6 research outputs found

    Game of Thrones: Fully Distributed Learning for Multi-Player Bandits

    We consider a multi-armed bandit game where N players compete for M arms over T turns. Each player has a different vector of expected rewards for the arms, and the instantaneous rewards are independent and identically distributed or Markovian. When two or more players choose the same arm, they all receive zero reward. Performance is measured by the expected sum of regrets, compared to the optimal assignment of arms to players. We assume that each player knows only her own actions and the reward she received each turn. Players cannot observe the actions of other players, and no communication between players is possible. We present a distributed algorithm and prove that it achieves an expected sum of regrets of near-O(log T). This is the first algorithm to achieve near order-optimal regret in this fully distributed scenario. All other works have assumed either that all players have the same vector of expected rewards or that communication between players is possible.

    Comment: A preliminary version was accepted to NIPS 2018. This extended paper, currently under review (submitted in September 2019), improves the regret bound to near-log(T), generalizes to unbounded and Markovian rewards, and has a much better convergence rate.
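    The collision reward model and the regret benchmark in this abstract can be illustrated with a minimal simulation sketch. The instance (3 players, 3 arms, the particular Bernoulli means, and the uniform-random play strategy) is invented for illustration and is not the paper's algorithm:

    ```python
    import itertools
    import random

    # Hypothetical instance: N players, M arms, Bernoulli rewards.
    # mu[i][j] = expected reward of arm j for player i (players differ).
    mu = [[0.9, 0.5, 0.1],
          [0.2, 0.8, 0.4],
          [0.3, 0.6, 0.7]]
    N, M = 3, 3

    # Optimal assignment of arms to players: maximize the sum of expected
    # rewards, one distinct arm per player (brute force is fine for small N).
    best = max(itertools.permutations(range(M), N),
               key=lambda a: sum(mu[i][a[i]] for i in range(N)))
    opt_value = sum(mu[i][best[i]] for i in range(N))

    def play_round(choices, rng):
        """Collision model: players choosing the same arm all get zero."""
        rewards = []
        for i, arm in enumerate(choices):
            if choices.count(arm) > 1:                  # collision
                rewards.append(0.0)
            else:                                       # Bernoulli(mu) draw
                rewards.append(float(rng.random() < mu[i][arm]))
        return rewards

    # Empirical sum of regrets over T turns for a naive strategy in which
    # every player picks an arm uniformly at random each turn.
    rng = random.Random(0)
    T = 1000
    total = 0.0
    for _ in range(T):
        choices = [rng.randrange(M) for _ in range(N)]
        total += sum(play_round(choices, rng))
    regret = opt_value * T - total
    ```

    The sum of regrets compares the realized total reward to what the best collision-free assignment would have earned in expectation; the paper's contribution is a distributed strategy that keeps this quantity near-O(log T) without any communication.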

    Sequential Decision Making under Uncertainty for Sensor Management in Mobile Robotics

    Sensor management refers to the control of the degrees of freedom in a sensing system. The objective of sensor management is to improve performance, e.g. by obtaining more accurate information or by achieving other operational goals. Sensor management is viewed as a sequential decision-making process, where decisions at any time are made conditional on the past decisions and measurement data. At the time of deciding a control action for a sensing system, the measurement data that will be obtained are unknown. Thus, informally speaking, a solution to a sensor management problem is a policy that determines which sensing action to undertake given the current information on the state of the process under investigation, contingent on any possible realisation of future measurement data outcomes.

    This thesis studies sensor management by framing the contingent planning problem in the partially observable Markov decision process (POMDP) framework. In particular, applications in mobile robotics are considered, where mobile robots are viewed as controllable sensor platforms.

    Based on earlier work on POMDP-based robot control, and distinguishing between the two cases of either exploiting or gathering information, we define four canonical sensor management problem types in mobile robotics. In each of the problem types, we exploit the structural properties of their inputs to improve the efficiency of applicable contingent planning algorithms.

    In particular, we consider sensor management problems for information gathering where the utility of the possible control policies is quantified by mutual information (MI). We identify the relationship between the POMDP formulation of an environment monitoring problem and another contingent planning problem known as a multi-armed bandit (MAB). In a robotic exploration task, we derive a novel approximation for MI.

    Through both simulation and real-world experiments in mobile robotics domains, we determine the applicability, advantages, and disadvantages of a POMDP-based approach to sensor management in mobile robotics.
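    The MI utility mentioned in this abstract can be illustrated with a toy myopic sensor-selection sketch. The two-state, two-action example and all probabilities below are invented for illustration; this is not the thesis's model or its MI approximation:

    ```python
    import math

    # Belief over a hidden binary state X (e.g. a map cell).
    prior = {"free": 0.5, "occupied": 0.5}

    # Observation likelihoods p(z | x) for two hypothetical sensing actions;
    # the "camera" is more discriminative than the "sonar".
    likelihoods = {
        "sonar":  {"free":     {"hit": 0.4, "miss": 0.6},
                   "occupied": {"hit": 0.6, "miss": 0.4}},
        "camera": {"free":     {"hit": 0.1, "miss": 0.9},
                   "occupied": {"hit": 0.9, "miss": 0.1}},
    }

    def mutual_information(prior, lik):
        """I(X; Z) = sum_{x,z} p(x) p(z|x) log2( p(z|x) / p(z) ), in bits."""
        # Marginal p(z) under the current belief.
        pz = {}
        for x, px in prior.items():
            for z, pzx in lik[x].items():
                pz[z] = pz.get(z, 0.0) + px * pzx
        mi = 0.0
        for x, px in prior.items():
            for z, pzx in lik[x].items():
                if pzx > 0:
                    mi += px * pzx * math.log2(pzx / pz[z])
        return mi

    # Greedy (myopic) sensor management: pick the action whose observation
    # is expected to reveal the most about the state.
    best_action = max(likelihoods,
                      key=lambda a: mutual_information(prior, likelihoods[a]))
    ```

    Here the greedy policy selects the more informative sensor; the thesis goes further by planning such choices over a horizon within the POMDP framework.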