Sample-Efficient Model-Free Reinforcement Learning with Off-Policy Critics
Value-based reinforcement-learning algorithms provide state-of-the-art
results in model-free discrete-action settings, and tend to outperform
actor-critic algorithms. We argue that actor-critic algorithms are limited by
their need for an on-policy critic. We propose Bootstrapped Dual Policy
Iteration (BDPI), a novel model-free reinforcement-learning algorithm for
continuous states and discrete actions, with an actor and several off-policy
critics. Off-policy critics are compatible with experience replay, ensuring
high sample-efficiency, without the need for off-policy corrections. The actor,
by slowly imitating the average greedy policy of the critics, leads to
high-quality and state-specific exploration, which we compare to Thompson
sampling. Because the actor and critics are fully decoupled, BDPI is remarkably
stable, and unusually robust to its hyper-parameters. BDPI is significantly
more sample-efficient than Bootstrapped DQN, PPO, and ACKTR, on discrete,
continuous and pixel-based tasks. Source code:
https://github.com/vub-ai-lab/bdpi
Comment: Accepted at the European Conference on Machine Learning 2019 (ECML).
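The following is a minimal sketch of the actor update the abstract describes: each off-policy critic votes with its greedy action, and the actor slowly moves toward the critics' average greedy policy. The function names and the mixing-rate scheme are illustrative assumptions, not the authors' implementation (see the linked repository for that).

```python
import numpy as np

def greedy_policy(q_values, n_actions):
    """One-hot policy that selects a critic's greedy action."""
    policy = np.zeros(n_actions)
    policy[np.argmax(q_values)] = 1.0
    return policy

def actor_update(actor_probs, critic_qs, learning_rate=0.05):
    """Move the actor a small step toward the critics' average greedy policy.

    actor_probs: (n_actions,) current actor distribution for one state
    critic_qs:   list of (n_actions,) Q-value vectors, one per critic
    """
    n_actions = len(actor_probs)
    avg_greedy = np.mean([greedy_policy(q, n_actions) for q in critic_qs], axis=0)
    # Slow imitation: disagreement among critics keeps the mixture spread over
    # several actions, yielding the state-specific exploration the abstract
    # compares to Thompson sampling.
    new_probs = (1 - learning_rate) * actor_probs + learning_rate * avg_greedy
    return new_probs / new_probs.sum()

# Example: 3 actions, 4 critics that mostly (but not unanimously) prefer action 2.
actor = np.ones(3) / 3
critics = [np.array([0.1, 0.2, 0.9]), np.array([0.0, 0.5, 0.4]),
           np.array([0.2, 0.1, 0.8]), np.array([0.3, 0.2, 0.7])]
print(actor_update(actor, critics))
```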
Two-Timescale Learning Using Idiotypic Behaviour Mediation For A Navigating Mobile Robot
A combined Short-Term Learning (STL) and Long-Term Learning (LTL) approach to
solving mobile-robot navigation problems is presented and tested in both the
real and virtual domains. The LTL phase consists of rapid simulations that use
a Genetic Algorithm to derive diverse sets of behaviours, encoded as variable
sets of attributes, and the STL phase is an idiotypic Artificial Immune System.
Results from the LTL phase show that sets of behaviours develop very rapidly,
and significantly greater diversity is obtained when multiple autonomous
populations are used, rather than a single one. The architecture is assessed
under various scenarios, including removal of the LTL phase and switching off
the idiotypic mechanism in the STL phase. The comparisons provide substantial
evidence that the best option is the inclusion of both the LTL phase and the
idiotypic system. In addition, this paper shows that structurally different
environments can be used for the two phases without compromising
transferability.
Comment: 40 pages, 12 tables, Journal of Applied Soft Computing.
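A minimal sketch of the LTL phase described above, under assumed details: each behaviour is encoded as a fixed-length set of numeric attributes, multiple autonomous populations evolve in parallel (which the paper reports improves diversity), and the fitness function is a placeholder for the paper's rapid navigation simulations.

```python
import random

ATTRS = 6          # attributes per behaviour (assumption)
POP_SIZE = 20
GENERATIONS = 50

def random_behaviour():
    return [random.uniform(0.0, 1.0) for _ in range(ATTRS)]

def fitness(behaviour):
    # Placeholder: the real system scores behaviours in rapid robot simulations.
    return -sum((a - 0.5) ** 2 for a in behaviour)

def evolve_population(pop):
    pop = sorted(pop, key=fitness, reverse=True)
    survivors = pop[: POP_SIZE // 2]       # truncation selection
    children = []
    while len(survivors) + len(children) < POP_SIZE:
        p1, p2 = random.sample(survivors, 2)
        cut = random.randrange(1, ATTRS)
        child = p1[:cut] + p2[cut:]        # one-point crossover
        i = random.randrange(ATTRS)
        child[i] += random.gauss(0.0, 0.1) # Gaussian mutation on one attribute
        children.append(child)
    return survivors + children

# Multiple autonomous populations evolving independently.
populations = [[random_behaviour() for _ in range(POP_SIZE)] for _ in range(4)]
for _ in range(GENERATIONS):
    populations = [evolve_population(p) for p in populations]
```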
Skill learning and the evolution of social learning mechanisms
This research was supported by a grant from The John Templeton Foundation.

Background. Social learning is potentially advantageous, but evolutionary theory predicts that (i) its benefits may be self-limiting because social learning can lead to information parasitism, and (ii) these limitations can be mitigated via forms of selective copying. However, these findings arise from a functional approach in which learning mechanisms are not specified, and which assumes that social learning avoids the costs of asocial learning but does not produce information about the environment. Whether these findings generalize to all kinds of social learning remains to be established. Using a detailed multi-scale evolutionary model, we investigate the payoffs and information-production processes of specific social learning mechanisms (including local enhancement, stimulus enhancement and observational learning) and their evolutionary consequences in the context of skill learning in foraging groups.

Results. We find that local enhancement does not benefit foraging success, but could evolve as a side-effect of grouping. In contrast, stimulus enhancement and observational learning can be beneficial across a wide range of environmental conditions because they generate opportunities for new learning outcomes.

Conclusions. In contrast to much existing theory, we find that the functional outcomes of social learning are mechanism-specific. Social learning nearly always produces information about the environment, and does not always avoid the costs of asocial learning or support information parasitism. Our study supports work emphasizing the value of incorporating mechanistic detail in functional analyses.
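An illustrative toy, not the paper's multi-scale model: foragers acquire a skill either asocially or via stimulus enhancement, where seeing a group member interact with a resource raises the observer's chance of learning from it. All parameters and probabilities here are invented for illustration.

```python
import random

N_AGENTS = 30
STEPS = 200
P_ASOCIAL = 0.01            # baseline chance of learning alone (assumption)
ENHANCEMENT_BONUS = 0.05    # added chance near a skilled demonstrator (assumption)

def run(social):
    skilled = [False] * N_AGENTS
    skilled[0] = True       # one innovator seeds the skill
    for _ in range(STEPS):
        for i in range(N_AGENTS):
            if skilled[i]:
                continue
            p = P_ASOCIAL
            if social and skilled[random.randrange(N_AGENTS)]:
                # Stimulus enhancement: a random group member's interaction
                # with the resource creates a new learning opportunity.
                p += ENHANCEMENT_BONUS
            if random.random() < p:
                skilled[i] = True
    return sum(skilled)

print("asocial only:", run(social=False))
print("with stimulus enhancement:", run(social=True))
```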
Deep Reinforcement Learning Approach for Lagrangian Control: Improving Freeway Bottleneck Throughput Via Variable Speed Limit
Connected vehicles (CVs) will enable new applications to improve traffic flow. The focus of this dissertation is to investigate how reinforcement learning (RL) control of the variable speed limit (VSL) through CVs can be generalized to improve traffic flow at different freeway bottlenecks. Three different bottlenecks are investigated: a sag curve, where the change in gradient from negative to positive values reduces roadway capacity and causes congestion; a lane reduction, where three lanes merge into two and cause congestion; and an on-ramp, where an increase in demand on a multilane freeway causes a capacity drop.

An RL algorithm is developed and implemented in a simulation environment to control a VSL upstream of each bottleneck, manipulating the inflow of vehicles to minimize delays and increase throughput. CVs are assumed to receive VSL messages through Infrastructure-to-Vehicle (I2V) communication technologies. Asynchronous Advantage Actor-Critic (A3C) algorithms are developed for each bottleneck to determine optimal VSL policies. Through these RL control algorithms, the speeds of CVs are manipulated upstream of the bottleneck to avoid or minimize congestion. Various market penetration rates for CVs are considered in the simulations.

It is demonstrated that the RL algorithm adapts to stochastic arrivals of CVs, achieves significant improvements even at low CV market penetration rates, and finds solutions for all three bottlenecks. The results also show that the RL-based solutions outperform feedback-control-based solutions.
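A minimal sketch of the state-to-speed-limit control cycle described above. The TrafficSim interface, observation, and reward are hypothetical stand-ins for the dissertation's simulation environment, and the learner here is a simple epsilon-greedy tabular Q-learning stand-in rather than the A3C agent the dissertation actually uses.

```python
import numpy as np

SPEED_LIMITS = [40, 50, 60, 70, 80]    # km/h messages sent to CVs via I2V

class TrafficSim:
    """Hypothetical stand-in for the real traffic simulation."""
    def reset(self):
        return np.zeros(4)             # e.g. densities/speeds near the bottleneck
    def step(self, speed_limit):
        state = np.random.rand(4)
        reward = -state[0]             # e.g. negative delay at the bottleneck
        return state, reward, False

def discretize(state, bins=4):
    """Map a continuous observation to a hashable table key."""
    return tuple(np.minimum((state * bins).astype(int), bins - 1))

q_table = {}
env = TrafficSim()
state = env.reset()
for step in range(1000):
    q = q_table.setdefault(discretize(state), np.zeros(len(SPEED_LIMITS)))
    # Epsilon-greedy choice among discrete speed-limit actions.
    if np.random.rand() < 0.1:
        action = np.random.randint(len(SPEED_LIMITS))
    else:
        action = int(np.argmax(q))
    next_state, reward, done = env.step(SPEED_LIMITS[action])
    next_q = q_table.setdefault(discretize(next_state), np.zeros(len(SPEED_LIMITS)))
    # One-step Q-learning update (stand-in for the A3C actor-critic update).
    q[action] += 0.1 * (reward + 0.95 * next_q.max() - q[action])
    state = next_state
```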