Shrewd Selection Speeds Surfing: Use Smart EXP3!
In this paper, we explore the use of multi-armed bandit online learning
techniques to solve distributed resource selection problems. As an example, we
focus on the problem of network selection. Mobile devices often have several
wireless networks at their disposal. While choosing the right network is vital
for good performance, a decentralized solution remains a challenge. The
impressive theoretical properties of multi-armed bandit algorithms such as EXP3
suggest that they should work well for this type of problem. Yet, their
real-world performance lags far behind. The main reasons are the hidden cost of
switching networks and the slow rate of convergence. We propose Smart EXP3, a novel
bandit-style algorithm that (a) retains the good theoretical properties of
EXP3, (b) bounds the number of switches, and (c) yields significantly better
performance in practice. We evaluate Smart EXP3 using simulations, controlled
experiments, and real-world experiments. Results show that it stabilizes at the
optimal state, achieves fairness among devices and gracefully deals with
transient behaviors. In real-world experiments, it can achieve 18% faster
downloads than alternative strategies. We conclude that multi-armed bandit
algorithms can play an important role in distributed resource selection
problems, when practical concerns, such as switching costs and convergence
time, are addressed.
Comment: Full paper
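The abstract does not spell out Smart EXP3's modifications, but the EXP3 baseline it builds on is standard. Below is a minimal sketch of vanilla EXP3 (the exploration parameter `gamma` and rewards in [0, 1] are the usual assumptions for this algorithm, not details taken from the paper):

```python
import math
import random

def exp3(num_arms, gamma, reward_fn, rounds):
    """Vanilla EXP3: exponential weights over arms, mixed with uniform
    exploration, updated with importance-weighted reward estimates."""
    weights = [1.0] * num_arms
    picks = []
    for _ in range(rounds):
        total = sum(weights)
        # Mix the weight distribution with uniform exploration.
        probs = [(1 - gamma) * w / total + gamma / num_arms for w in weights]
        arm = random.choices(range(num_arms), weights=probs)[0]
        reward = reward_fn(arm)  # assumed to lie in [0, 1]
        # Importance weighting keeps the reward estimate unbiased.
        estimate = reward / probs[arm]
        weights[arm] *= math.exp(gamma * estimate / num_arms)
        picks.append(arm)
    return picks
```

In a network-selection setting, every change of the sampled arm means switching networks, which is exactly the hidden switching cost the paper targets.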
Overcoming Exploration in Reinforcement Learning with Demonstrations
Exploration in environments with sparse rewards has been a persistent problem
in reinforcement learning (RL). Many tasks are natural to specify with a sparse
reward, and manually shaping a reward function can result in suboptimal
performance. However, finding a non-zero reward is exponentially more difficult
with increasing task horizon or action dimensionality. This puts many
real-world tasks out of practical reach of RL methods. In this work, we use
demonstrations to overcome the exploration problem and successfully learn to
perform long-horizon, multi-step robotics tasks with continuous control such as
stacking blocks with a robot arm. Our method, which builds on top of Deep
Deterministic Policy Gradients and Hindsight Experience Replay, provides an
order-of-magnitude speedup over RL on simulated robotics tasks. It is simple
to implement and makes only the additional assumption that we can collect a
small set of demonstrations. Furthermore, our method is able to solve tasks not
solvable by either RL or behavior cloning alone, and often ends up
outperforming the demonstrator policy.
Comment: 8 pages, ICRA 201
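The only extra assumption the method makes is access to a small set of demonstrations. One common way to inject such a set into an off-policy learner (an illustrative sketch, not necessarily the paper's exact mechanism) is to reserve a fixed fraction of each replay batch for demonstration transitions:

```python
import random

def sample_batch(agent_buffer, demo_buffer, batch_size, demo_fraction=0.25):
    """Sample a training batch mixing agent experience with demonstration
    transitions. demo_fraction is a hypothetical tuning knob."""
    n_demo = min(int(batch_size * demo_fraction), len(demo_buffer))
    n_agent = batch_size - n_demo
    batch = random.sample(demo_buffer, n_demo)
    batch += random.sample(agent_buffer, min(n_agent, len(agent_buffer)))
    random.shuffle(batch)
    return batch
```

Because demonstration transitions appear in every batch, the learner keeps seeing non-zero rewards even before its own exploration finds any.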
MAX-consensus in open multi-agent systems with gossip interactions
We study the problem of distributed maximum computation in an open
multi-agent system, where agents can leave and arrive during the execution of
the algorithm. The main challenge comes from the possibility that the agent
holding the largest value leaves the system, which changes the value to be
computed. The algorithms must therefore be endowed with mechanisms to forget
outdated information. The focus is on systems in which interactions
are pairwise gossips between randomly selected agents. We consider situations
where leaving agents can send a last message, and situations where they cannot.
For both cases, we provide algorithms able to eventually compute the maximum of
the values held by agents.
Comment: To appear in the proceedings of the 56th IEEE Conference on Decision and Control (CDC 17). 8 pages, 3 figures
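A concrete way to realize "forgetting outdated information" is to age every estimate and discard it once it gets too old, falling back to the agent's own value. The TTL-style aging below is an illustrative assumption; the paper's actual mechanisms may differ:

```python
import random

def gossip_step(agents):
    """One pairwise gossip: two random agents compare estimates and both
    keep the larger one, together with its age."""
    i, j = random.sample(range(len(agents)), 2)
    a, b = agents[i], agents[j]
    if b["est"] > a["est"]:
        a["est"], a["age"] = b["est"], b["age"]
    elif a["est"] > b["est"]:
        b["est"], b["age"] = a["est"], a["age"]

def age_and_forget(agents, ttl):
    """Age all estimates; drop any estimate older than ttl so the value
    of a departed agent is eventually forgotten."""
    for agent in agents:
        if agent["est"] <= agent["own"]:
            # An agent's own value is always fresh.
            agent["est"], agent["age"] = agent["own"], 0
        else:
            agent["age"] += 1
            if agent["age"] > ttl:
                agent["est"], agent["age"] = agent["own"], 0
```

If the agent holding the maximum leaves, nobody refreshes that value, so every copy eventually ages out and the system can re-converge to the new maximum.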
Local Water Storage Control for the Developing World
Most cities in India do not have water distribution networks that provide
water throughout the entire day. As a result, it is common for homes and
apartment buildings to utilize water storage systems that are filled during a
small window of time in the day when the water distribution network is active.
However, these water storage systems do not have disinfection capabilities, and
so long durations of storage (as few as four days) of the same water
lead to substantial increases in the amount of bacteria and viruses in that
water. This paper considers the stochastic control problem of deciding how much
water to store each day in the system, as well as deciding when to completely
empty the water system, in order to trade off: the financial costs of the water,
the health costs implicit in long durations of storing the same water, the
potential for a shortfall in the quantity of stored versus demanded water, and
water wastage from emptying the system. To solve this problem, we develop a new
Binary Dynamic Search (BiDS) algorithm that is able to use binary search in one
dimension to compute the value function of stochastic optimal control problems
with controlled resets to a single state and with constraints on the maximum
time span between resets of the system.
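BiDS itself is not described in the abstract, but the structure of the underlying control problem (daily keep-or-empty decisions, with emptying resetting the system to a single state) can be illustrated with a toy finite-horizon dynamic program. The cost models `health_cost` and `empty_cost` below are hypothetical stand-ins for the trade-offs listed above:

```python
def plan_resets(horizon, health_cost, empty_cost):
    """Toy finite-horizon DP (not the paper's BiDS algorithm): the state
    is the age of the stored water; each day we either keep it (paying a
    health cost that grows with age) or empty and refill (paying a fixed
    wastage cost and resetting the age to zero)."""
    # V[t][age] = optimal cost-to-go from day t with water of given age.
    V = [[0.0] * (horizon + 1) for _ in range(horizon + 1)]
    policy = [[None] * (horizon + 1) for _ in range(horizon)]
    for t in range(horizon - 1, -1, -1):
        for age in range(t + 1):  # age can be at most t on day t
            keep = health_cost(age + 1) + V[t + 1][age + 1]
            empty = empty_cost + health_cost(0) + V[t + 1][0]
            V[t][age] = min(keep, empty)
            policy[t][age] = "keep" if keep <= empty else "empty"
    return V, policy
```

Per the abstract, BiDS exploits exactly this reset-to-a-single-state structure, using binary search in one dimension to compute the value function rather than a full sweep as above.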
Neural Pattern Recognition on Multichannel Input Representation
This article presents a new neural pattern recognition architecture for multichannel data representation. The architecture employs generalized ART modules as building blocks to construct a supervised learning system that generates recognition codes on channels dynamically selected in context, using serial and parallel match tracking led by inter-ART vigilance signals.
Sharp Corporation, Information Technology Research Laboratories, Nara, Japan
Distributed ART Networks for Learning, Recognition, and Prediction
Adaptive resonance theory (ART) models have been used for learning and prediction in a wide variety of applications. Winner-take-all coding allows these networks to maintain stable memories, but this type of code representation can cause problems such as category proliferation with fast learning and a noisy training set. A new class of ART models with an arbitrarily distributed code representation is outlined here. With winner-take-all coding, the unsupervised distributed ART model (dART) reduces to fuzzy ART and the supervised distributed ARTMAP model (dARTMAP) reduces to fuzzy ARTMAP. dART automatically apportions learned changes according to the degree of activation of each node, which permits fast as well as slow learning with compressed or distributed codes. Distributed ART models replace the traditional neural network path weight with a dynamic weight equal to the rectified difference between coding node activation and an adaptive threshold. Dynamic weights that project to coding nodes obey a distributed instar learning law, and those that originate from coding nodes obey a distributed outstar learning law. Inputs activate distributed codes through phasic and tonic signal components with dual computational properties, and a parallel distributed match-reset-search process helps stabilize memory.
National Science Foundation (IRI 94-01659); Office of Naval Research (N00014-95-1-0409, N00014-95-0657)
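The dynamic weight described above, the rectified difference between a coding node's activation and its adaptive threshold, is a one-line computation (variable names here are ours, paraphrasing the abstract):

```python
def dynamic_weight(activation, threshold):
    """Distributed ART dynamic weight: [activation - threshold]^+,
    the rectified difference between coding-node activation and an
    adaptive threshold."""
    return max(activation - threshold, 0.0)
```

Since the weight is zero whenever activation falls below the threshold, only sufficiently active coding nodes contribute, consistent with learned changes being apportioned by degree of activation.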