Reuse of Neural Modules for General Video Game Playing
A general approach to knowledge transfer is introduced in which an agent
controlled by a neural network adapts how it reuses existing networks as it
learns in a new domain. Networks trained for a new domain can improve their
performance by routing activation selectively through previously learned neural
structure, regardless of how or for what it was learned. A neuroevolution
implementation of this approach is presented with application to
high-dimensional sequential decision-making domains. This approach is more
general than previous approaches to neural transfer for reinforcement learning.
It is domain-agnostic and requires no prior assumptions about the nature of
task relatedness or mappings. The method is analyzed in a stochastic version of
the Arcade Learning Environment, demonstrating that it improves performance in
some of the more complex Atari 2600 games, and that the success of transfer can
be predicted based on a high-level characterization of game dynamics.
Comment: Accepted at AAAI 1
CoRide: Joint Order Dispatching and Fleet Management for Multi-Scale Ride-Hailing Platforms
How to optimally dispatch orders to vehicles and how to trade off between
immediate and future returns are fundamental questions for a typical
ride-hailing platform. We model ride-hailing as a large-scale parallel ranking
problem and study the joint decision-making task of order dispatching and fleet
management in online ride-hailing platforms. This task brings unique challenges
in the following four aspects. First, to enable a huge number of vehicles
to act and learn efficiently and robustly, we treat each region cell as an
agent and build a multi-agent reinforcement learning framework. Second, to
coordinate the agents from different regions to achieve long-term benefits, we
leverage the geographical hierarchy of the region grids to perform hierarchical
reinforcement learning. Third, to deal with the heterogeneous and variant
action space for joint order dispatching and fleet management, we design the
action as the ranking weight vector to rank and select the specific order or
the fleet management destination in a unified formulation. Fourth, to support
a multi-scale ride-hailing platform, we conduct the decision-making process
in a hierarchical way, using a multi-head attention mechanism to
incorporate the impact of neighboring agents and capture the key agent at each
scale. The overall framework is named CoRide. Extensive experiments
on real-world data from multiple cities as well as analytic synthetic data
demonstrate that CoRide provides superior performance in terms of platform
revenue and user experience in the task of city-wide hybrid order dispatching
and fleet management over strong baselines.
Comment: CIKM 201
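The unified formulation described in this abstract, where the action is a ranking weight vector applied to both orders and fleet-management destinations, can be illustrated with a minimal sketch. The feature layout, the dot-product scoring rule, and the names `rank_candidates`, `features`, and `weight` are assumptions made for illustration; the abstract does not specify how CoRide computes scores.

```python
import numpy as np


def rank_candidates(weight, features):
    """Score each candidate by a dot product with the weight vector.

    Returns candidate indices ordered best-first, plus the raw scores.
    Dot-product scoring is an assumption, not the paper's stated method.
    """
    scores = features @ weight
    order = np.argsort(-scores, kind="stable")
    return order, scores


# Hypothetical candidate features: [price, pickup distance, is_reposition].
# Rows 0 and 1 stand for orders; row 2 stands for a fleet-management move.
features = np.array([
    [10.0, 2.0, 0.0],
    [6.0, 0.5, 0.0],
    [0.0, 1.0, 1.0],
])

# The agent's action: one weight per feature (hypothetical values).
weight = np.array([1.0, -2.0, 3.0])

order, scores = rank_candidates(weight, features)
```

Because orders and repositioning destinations share one feature space in this sketch, a single weight vector ranks both kinds of candidate together, which is the sense in which the action space is unified.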
Dopamine restores reward prediction errors in old age.
Senescence affects the ability to utilize information about the likelihood of rewards for optimal decision-making. Using functional magnetic resonance imaging in humans, we found that healthy older adults had an abnormal signature of expected value, resulting in an incomplete reward prediction error (RPE) signal in the nucleus accumbens, a brain region that receives rich input projections from substantia nigra/ventral tegmental area (SN/VTA) dopaminergic neurons. Structural connectivity between SN/VTA and striatum, measured by diffusion tensor imaging, was tightly coupled to inter-individual differences in the expression of this expected reward value signal. The dopamine precursor levodopa (L-DOPA) increased the task-based learning rate and task performance in some older adults to the level of young adults. This drug effect was linked to restoration of a canonical neural RPE. Our results identify a neurochemical signature underlying abnormal reward processing in older adults and indicate that this can be modulated by L-DOPA.
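The relationship between expected value, reward prediction error, and learning rate mentioned in this abstract can be sketched with the standard delta-rule (Rescorla-Wagner style) update. This is an assumption chosen for illustration: the abstract does not state which learning model the study fitted, and the function name, reward sequence, and learning rate below are hypothetical.

```python
def update_value(value, reward, learning_rate):
    """Apply one delta-rule update.

    The reward prediction error (RPE) is the obtained reward minus the
    expected value; the expectation moves toward the reward by a fraction
    of the RPE set by the learning rate.
    """
    rpe = reward - value
    return value + learning_rate * rpe, rpe


value = 0.0  # initial expected value (hypothetical)
for reward in [1.0, 1.0, 0.0]:  # hypothetical outcome sequence
    value, rpe = update_value(value, reward, learning_rate=0.5)
```

In this sketch a higher learning rate makes the expected value track recent outcomes more closely, and an omitted reward after a positive expectation produces a negative RPE.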