A new Potential-Based Reward Shaping for Reinforcement Learning Agent
Potential-based reward shaping (PBRS) is a category of machine learning
methods that aims to improve the learning speed of a reinforcement learning
agent by extracting and utilizing extra knowledge while performing a task.
Transfer learning involves two steps: extracting knowledge from previously
learned tasks and transferring that knowledge to use it in a target task. The
latter step is well discussed in the literature, with various methods proposed
for it, while the former has been explored less. With this in mind, the type of
knowledge that is transferred is very important and can lead to considerable
improvement. In the literature on both transfer learning and potential-based
reward shaping, a subject that has never been addressed is the knowledge
gathered during the learning process itself. In this paper, we present a novel
potential-based reward shaping method that attempts to extract knowledge from
the learning process. The proposed method extracts knowledge from episodes'
cumulative rewards. It has been evaluated in the Arcade Learning Environment,
and the results indicate an improvement in the learning process for both
single-task and multi-task reinforcement learning agents.
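For readers unfamiliar with the term, the general form of potential-based
reward shaping can be sketched as follows. This is a minimal illustration of
the standard shaping term gamma * phi(s') - phi(s), not the method proposed in
the paper above: the abstract does not say how its potential is built from
episodes' cumulative rewards, so the grid world, the GOAL cell, and the
manhattan_potential function here are hypothetical stand-ins.

```python
from typing import Callable, Tuple

State = Tuple[int, int]  # hypothetical grid-world state: (x, y) cell


def shaped_reward(
    reward: float,
    state: State,
    next_state: State,
    potential: Callable[[State], float],
    gamma: float = 0.99,
    done: bool = False,
) -> float:
    """Return reward + gamma * phi(next_state) - phi(state).

    The potential of a terminal state is treated as zero, a common
    convention for episodic tasks.
    """
    next_phi = 0.0 if done else potential(next_state)
    return reward + gamma * next_phi - potential(state)


# Hypothetical potential (an assumption for illustration only):
# negative Manhattan distance to a fixed goal cell.
GOAL: State = (3, 3)


def manhattan_potential(state: State) -> float:
    x, y = state
    return -float(abs(GOAL[0] - x) + abs(GOAL[1] - y))


# One step toward the goal earns a shaping bonus; one step away is penalized.
toward = shaped_reward(0.0, (0, 0), (1, 0), manhattan_potential, gamma=1.0)
away = shaped_reward(0.0, (1, 0), (0, 0), manhattan_potential, gamma=1.0)
```

Because the shaping term is a difference of potentials, it telescopes along a
trajectory, which is why this family of methods can speed up learning without
changing which policies are optimal.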
Whole-Chain Recommendations
With the recent prevalence of Reinforcement Learning (RL), there has been
tremendous interest in developing RL-based recommender systems. In practical
recommendation sessions, users will sequentially access multiple scenarios,
such as the entrance pages and the item detail pages, and each scenario has its
specific characteristics. However, the majority of existing RL-based
recommender systems focus on optimizing one strategy for all scenarios or
separately optimizing each strategy, which could lead to sub-optimal overall
performance. In this paper, we study the recommendation problem with multiple
(consecutive) scenarios, i.e., whole-chain recommendations. We propose a
multi-agent RL-based approach (DeepChain), which can capture the sequential
correlation among different scenarios and jointly optimize multiple
recommendation strategies. To be specific, all recommender agents (RAs) share
the same memory of users' historical behaviors, and they work collaboratively
to maximize the overall reward of a session. Note that optimizing multiple
recommendation strategies jointly faces two challenges with existing
model-free RL models: (i) it requires huge amounts of user behavior data, and
(ii) the distribution of rewards (users' feedback) is extremely unbalanced. In
this paper, we introduce model-based RL techniques to reduce the training data
requirement and execute more accurate strategy updates. The experimental
results based on a real e-commerce platform demonstrate the effectiveness of
the proposed framework.
Comment: 29th ACM International Conference on Information and Knowledge
Managemen
Knowledge Transfer Between Robots with Similar Dynamics for High-Accuracy Impromptu Trajectory Tracking
In this paper, we propose an online learning approach that enables the
inverse dynamics model learned for a source robot to be transferred to a target
robot (e.g., from one quadrotor to another quadrotor with different mass or
aerodynamic properties). The goal is to leverage knowledge from the source
robot such that the target robot achieves high-accuracy trajectory tracking on
arbitrary trajectories from the first attempt with minimal data recollection
and training. Most existing approaches for multi-robot knowledge transfer are
based on post-analysis of datasets collected from both robots. In this work, we
study the feasibility of impromptu transfer of models across robots by learning
an error prediction module online. In particular, we analytically derive the
form of the mapping to be learned by the online module for exact tracking,
propose an approach for characterizing similarity between robots, and use these
results to analyze the stability of the overall system. The proposed approach
is illustrated in simulation and verified experimentally on two different
quadrotors performing impromptu trajectory tracking tasks, where the quadrotors
are required to accurately track arbitrary hand-drawn trajectories from the
first attempt.
Comment: European Control Conference (ECC) 201