Search CORE

51,372 research outputs found

Reuse of Neural Modules for General Video Game Playing

Author: Braylan Alexander
Hollenbeck Mark
Meyerson Elliot
Miikkulainen Risto
Publication venue
Publication date: 04/12/2015
Field of study

A general approach to knowledge transfer is introduced in which an agent controlled by a neural network adapts how it reuses existing networks as it learns in a new domain. Networks trained for a new domain can improve their performance by routing activation selectively through previously learned neural structure, regardless of how or for what it was learned. A neuroevolution implementation of this approach is presented with application to high-dimensional sequential decision-making domains. This approach is more general than previous approaches to neural transfer for reinforcement learning. It is domain-agnostic and requires no prior assumptions about the nature of task relatedness or mappings. The method is analyzed in a stochastic version of the Arcade Learning Environment, demonstrating that it improves performance in some of the more complex Atari 2600 games, and that the success of transfer can be predicted based on a high-level characterization of game dynamics.Comment: Accepted at AAAI 1

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications

CoRide: Joint Order Dispatching and Fleet Management for Multi-Scale Ride-Hailing Platforms

Author: Guo Zilong
Jiao Yan
Jin Jiarui
Li Minne
Qin Zhiwei
Tang Xiaocheng
Wang Chenxi
Wang Jun
Wu Guobin
Ye Jieping
Zhang Weinan
Zhou Ming
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 30/08/2019
Field of study

How to optimally dispatch orders to vehicles and how to tradeoff between immediate and future returns are fundamental questions for a typical ride-hailing platform. We model ride-hailing as a large-scale parallel ranking problem and study the joint decision-making task of order dispatching and fleet management in online ride-hailing platforms. This task brings unique challenges in the following four aspects. First, to facilitate a huge number of vehicles to act and learn efficiently and robustly, we treat each region cell as an agent and build a multi-agent reinforcement learning framework. Second, to coordinate the agents from different regions to achieve long-term benefits, we leverage the geographical hierarchy of the region grids to perform hierarchical reinforcement learning. Third, to deal with the heterogeneous and variant action space for joint order dispatching and fleet management, we design the action as the ranking weight vector to rank and select the specific order or the fleet management destination in a unified formulation. Fourth, to achieve the multi-scale ride-hailing platform, we conduct the decision-making process in a hierarchical way where a multi-head attention mechanism is utilized to incorporate the impacts of neighbor agents and capture the key agent in each scale. The whole novel framework is named as CoRide. Extensive experiments based on multiple cities real-world data as well as analytic synthetic data demonstrate that CoRide provides superior performance in terms of platform revenue and user experience in the task of city-wide hybrid order dispatching and fleet management over strong baselines.Comment: CIKM 201

arXiv.org e-Print Archive

Crossref

UCL Discovery

Dopamine restores reward prediction errors in old age.

Senescence affects the ability to utilize information about the likelihood of rewards for optimal decision-making. Using functional magnetic resonance imaging in humans, we found that healthy older adults had an abnormal signature of expected value, resulting in an incomplete reward prediction error (RPE) signal in the nucleus accumbens, a brain region that receives rich input projections from substantia nigra/ventral tegmental area (SN/VTA) dopaminergic neurons. Structural connectivity between SN/VTA and striatum, measured by diffusion tensor imaging, was tightly coupled to inter-individual differences in the expression of this expected reward value signal. The dopamine precursor levodopa (L-DOPA) increased the task-based learning rate and task performance in some older adults to the level of young adults. This drug effect was linked to restoration of a canonical neural RPE. Our results identify a neurochemical signature underlying abnormal reward processing in older adults and indicate that this can be modulated by L-DOPA

St George's Online Research Archive

MPG.PuRe