Policy Optimization with Model-based Explorations
Model-free reinforcement learning methods such as the Proximal Policy
Optimization (PPO) algorithm have been successfully applied to complex
decision-making problems such as Atari games. However, these methods suffer
from high variance and high sample complexity. On the other hand, model-based
reinforcement learning methods that learn the transition dynamics are more
sample-efficient, but they often suffer from bias in the transition
estimation. How to make use of both model-based and model-free learning is a
central problem in reinforcement learning. In this paper, we present a new
technique to address the trade-off between exploration and exploitation, which
regards the difference between the model-free and model-based estimations as a
measure of exploration value. We apply this technique to the PPO algorithm
and arrive at a new policy optimization method, named Policy Optimization with
Model-based Explorations (POME). POME uses two components to predict the
actions' target values: a model-free one estimated by Monte-Carlo sampling,
and a model-based one that learns a transition model and predicts the value of
the next state. POME adds the error between these two target estimations as an
additional exploration value for each state-action pair, i.e., it encourages
the algorithm to explore states with larger target errors, which are hard to
estimate. We compare POME with PPO on Atari 2600 games, and the results show
that POME outperforms PPO on 33 out of 49 games.
Comment: Accepted at AAAI-1
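
To make the idea concrete, here is a minimal Python sketch of the exploration
bonus described above; the function and variable names are hypothetical and
this is an illustration of the discrepancy-as-bonus idea, not the authors'
implementation:

    def pome_style_bonus(reward, v_next_mc, v_next_model, gamma=0.99):
        """Exploration bonus for one state-action pair: the absolute
        discrepancy between a model-free target (Monte-Carlo estimate of the
        next state's value) and a model-based target (value of the next state
        predicted by a learned transition model)."""
        target_mf = reward + gamma * v_next_mc      # model-free target
        target_mb = reward + gamma * v_next_model   # model-based target
        return abs(target_mf - target_mb)           # large error => hard to estimate => explore

A state-action pair where the two targets disagree strongly receives a large
bonus, steering the policy toward regions the current estimates handle poorly.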
Efficient Deep Reinforcement Learning via Adaptive Policy Transfer
Transfer Learning (TL) has shown great potential to accelerate Reinforcement
Learning (RL) by leveraging prior knowledge from previously learned policies
for relevant tasks. Existing transfer approaches either explicitly compute the
similarity between tasks or select appropriate source policies to provide
guided exploration for the target task. However, how to directly optimize the
target policy by alternately utilizing knowledge from appropriate source
policies, without explicitly measuring similarity, remains an open question. In
this paper, we propose a novel Policy Transfer Framework (PTF) to accelerate RL
by taking advantage of this idea. Our framework learns when and which source
policy is best to reuse for the target policy, and when to terminate it, by
modeling multi-policy transfer as an option learning problem. PTF can be
easily combined with existing deep RL approaches. Experimental results show it
significantly accelerates the learning process and surpasses state-of-the-art
policy transfer methods in terms of learning efficiency and final performance,
in both discrete and continuous action spaces.
Comment: Accepted by IJCAI'202
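
A minimal sketch of the option-style reuse mechanism in Python follows; all
class and attribute names are hypothetical, and the selection and termination
rules are simplified relative to the paper's learned option values:

    import random

    class PolicyTransferSketch:
        """Treat each source policy as an option: a value per source decides
        which policy to reuse, and a termination probability decides when to
        stop following it and reselect."""
        def __init__(self, source_policies, epsilon=0.1):
            self.sources = source_policies            # callables: state -> action
            self.q = [0.0] * len(source_policies)     # value of reusing each source
            self.beta = [0.5] * len(source_policies)  # termination probabilities
            self.current = None
            self.epsilon = epsilon

        def act(self, state):
            # Terminate the current option stochastically, then (re)select a source.
            if self.current is None or random.random() < self.beta[self.current]:
                if random.random() < self.epsilon:
                    self.current = random.randrange(len(self.sources))
                else:
                    self.current = max(range(len(self.sources)), key=self.q.__getitem__)
            return self.sources[self.current](state)

In the paper both the option values and the termination probabilities are
learned; here they are plain tables to keep the control flow visible.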
A reinforcement learning based decision support system in textile manufacturing process
This paper introduces a reinforcement learning based decision support system
for the textile manufacturing process. An optimization problem for color
fading ozonation is discussed and set up as a Markov Decision Process (MDP) in
terms of the tuple {S, A, P, R}. Q-learning is used to train an agent through
interaction with the setup environment by accumulating the reward R. The
application results show that the proposed MDP model expresses the
optimization problem of the textile manufacturing process well; therefore,
the use of reinforcement learning to support decision making in this sector is
demonstrated to be applicable, with promising prospects.
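
For illustration, a minimal tabular Q-learning loop of the kind the paper
applies is sketched below; the `env` interface (reset, step, and a discrete
`actions` list) is an assumption for the sketch, not the authors' setup:

    import random
    from collections import defaultdict

    def q_learning(env, episodes=500, alpha=0.1, gamma=0.95, epsilon=0.1):
        """Tabular Q-learning over an MDP {S, A, P, R}; `env.reset()` returns a
        state and `env.step(action)` returns (next_state, reward, done)."""
        Q = defaultdict(float)
        for _ in range(episodes):
            s, done = env.reset(), False
            while not done:
                # epsilon-greedy action selection
                if random.random() < epsilon:
                    a = random.choice(env.actions)
                else:
                    a = max(env.actions, key=lambda act: Q[(s, act)])
                s2, r, done = env.step(a)
                # Q-learning update toward the bootstrapped target
                best_next = max(Q[(s2, act)] for act in env.actions)
                Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
                s = s2
        return Q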