15,887 research outputs found

    Shrewd Selection Speeds Surfing: Use Smart EXP3!

    Full text link
    In this paper, we explore the use of multi-armed bandit online learning techniques to solve distributed resource selection problems. As an example, we focus on the problem of network selection. Mobile devices often have several wireless networks at their disposal. While choosing the right network is vital for good performance, a decentralized solution remains a challenge. The impressive theoretical properties of multi-armed bandit algorithms, like EXP3, suggest that it should work well for this type of problem. Yet, its real-word performance lags far behind. The main reasons are the hidden cost of switching networks and its slow rate of convergence. We propose Smart EXP3, a novel bandit-style algorithm that (a) retains the good theoretical properties of EXP3, (b) bounds the number of switches, and (c) yields significantly better performance in practice. We evaluate Smart EXP3 using simulations, controlled experiments, and real-world experiments. Results show that it stabilizes at the optimal state, achieves fairness among devices and gracefully deals with transient behaviors. In real world experiments, it can achieve 18% faster download over alternate strategies. We conclude that multi-armed bandit algorithms can play an important role in distributed resource selection problems, when practical concerns, such as switching costs and convergence time, are addressed.Comment: Full pape

    Overcoming Exploration in Reinforcement Learning with Demonstrations

    Full text link
    Exploration in environments with sparse rewards has been a persistent problem in reinforcement learning (RL). Many tasks are natural to specify with a sparse reward, and manually shaping a reward function can result in suboptimal performance. However, finding a non-zero reward is exponentially more difficult with increasing task horizon or action dimensionality. This puts many real-world tasks out of practical reach of RL methods. In this work, we use demonstrations to overcome the exploration problem and successfully learn to perform long-horizon, multi-step robotics tasks with continuous control such as stacking blocks with a robot arm. Our method, which builds on top of Deep Deterministic Policy Gradients and Hindsight Experience Replay, provides an order of magnitude of speedup over RL on simulated robotics tasks. It is simple to implement and makes only the additional assumption that we can collect a small set of demonstrations. Furthermore, our method is able to solve tasks not solvable by either RL or behavior cloning alone, and often ends up outperforming the demonstrator policy.Comment: 8 pages, ICRA 201

    MAX-consensus in open multi-agent systems with gossip interactions

    Full text link
    We study the problem of distributed maximum computation in an open multi-agent system, where agents can leave and arrive during the execution of the algorithm. The main challenge comes from the possibility that the agent holding the largest value leaves the system, which changes the value to be computed. The algorithms must as a result be endowed with mechanisms allowing to forget outdated information. The focus is on systems in which interactions are pairwise gossips between randomly selected agents. We consider situations where leaving agents can send a last message, and situations where they cannot. For both cases, we provide algorithms able to eventually compute the maximum of the values held by agents.Comment: To appear in the proceedings of the 56th IEEE Conference on Decision and Control (CDC 17). 8 pages, 3 figure

    Local Water Storage Control for the Developing World

    Full text link
    Most cities in India do not have water distribution networks that provide water throughout the entire day. As a result, it is common for homes and apartment buildings to utilize water storage systems that are filled during a small window of time in the day when the water distribution network is active. However, these water storage systems do not have disinfection capabilities, and so long durations of storage (i.e., as few as four days) of the same water leads to substantial increases in the amount of bacteria and viruses in that water. This paper considers the stochastic control problem of deciding how much water to store each day in the system, as well as deciding when to completely empty the water system, in order to tradeoff: the financial costs of the water, the health costs implicit in long durations of storing the same water, the potential for a shortfall in the quantity of stored versus demanded water, and water wastage from emptying the system. To solve this problem, we develop a new Binary Dynamic Search (BiDS) algorithm that is able to use binary search in one dimension to compute the value function of stochastic optimal control problems with controlled resets to a single state and with constraints on the maximum time span in between resets of the system

    Neural Pattern Recognition on Multichannel Input Representation

    Full text link
    This article presents a new neural pattern recognition architecture on multichannel data representation. The architecture emploies generalized ART modules as building blocks to construct a supervised learning system generating recognition codes on channels dynamically selected in context using serial and parallel match trackings led by inter-ART vigilance signals.Sharp Corporation, Information Techology Research Laboratories, Nara, Japa

    Distributed ART Networks for Learning, Recognition, and Prediction

    Full text link
    Adaptive resonance theory (ART) models have been used for learning and prediction in a wide variety of applications. Winner-take-all coding allows these networks to maintain stable memories, but this type of code representation can cause problems such as category proliferation with fast learning and a noisy training set. A new class of ART models with an arbitrarily distributed code representation is outlined here. With winner-take-all coding, the unsupervised distributed ART model (dART) reduces to fuzzy ART and the supervised distributed ARTMAP model (dARTMAP) reduces to fuzzy ARTMAP. dART automatically apportions learned changes according to the degree of activation of each node, which permits fast as well as slow learning with compressed or distributed codes. Distributed ART models replace the traditional neural network path weight with a dynamic weight equal to the rectified difference between coding node activation and an adaptive threshold. Dynamic weights that project to coding nodes obey a distributed instar leaning law and those that originate from coding nodes obey a distributed outstar learning law. Inputs activate distributed codes through phasic and tonic signal components with dual computational properties, and a parallel distributed match-reset-search process helps stabilize memory.National Science Foundation (IRI 94-0 1659); Office of Naval Research (N00014-95-1-0409, N00014-95-0657
    • …
    corecore