66,067 research outputs found
Deep Reinforcement Learning with Feedback-based Exploration
Deep Reinforcement Learning has enabled the control of increasingly complex
and high-dimensional problems. However, the need of vast amounts of data before
reasonable performance is attained prevents its widespread application. We
employ binary corrective feedback as a general and intuitive manner to
incorporate human intuition and domain knowledge in model-free machine
learning. The uncertainty in the policy and the corrective feedback is combined
directly in the action space as probabilistic conditional exploration. As a
result, the greatest part of the otherwise ignorant learning process can be
avoided. We demonstrate the proposed method, Predictive Probabilistic Merging
of Policies (PPMP), in combination with DDPG. In experiments on continuous
control problems of the OpenAI Gym, we achieve drastic improvements in sample
efficiency, final performance, and robustness to erroneous feedback, both for
human and synthetic feedback. Additionally, we show solutions beyond the
demonstrated knowledge.Comment: 6 page
The Dreaming Variational Autoencoder for Reinforcement Learning Environments
Reinforcement learning has shown great potential in generalizing over raw
sensory data using only a single neural network for value optimization. There
are several challenges in the current state-of-the-art reinforcement learning
algorithms that prevent them from converging towards the global optima. It is
likely that the solution to these problems lies in short- and long-term
planning, exploration and memory management for reinforcement learning
algorithms. Games are often used to benchmark reinforcement learning algorithms
as they provide a flexible, reproducible, and easy to control environment.
Regardless, few games feature a state-space where results in exploration,
memory, and planning are easily perceived. This paper presents The Dreaming
Variational Autoencoder (DVAE), a neural network based generative modeling
architecture for exploration in environments with sparse feedback. We further
present Deep Maze, a novel and flexible maze engine that challenges DVAE in
partial and fully-observable state-spaces, long-horizon tasks, and
deterministic and stochastic problems. We show initial findings and encourage
further work in reinforcement learning driven by generative exploration.Comment: Best Student Paper Award, Proceedings of the 38th SGAI International
Conference on Artificial Intelligence, Cambridge, UK, 2018, Artificial
Intelligence XXXV, 201
Reinforcement Learning based NLP
In the field of Natural Language Processing (NLP), reinforcement learning (RL) has drawn attention as a viable method for training models. An agent is trained to interact with a linguistic environment in order to carry out a given task using RL- based NLP, and the agent learns from feedback in the form of rewards or penalties. This method has been effectively used for a variety of linguistic problems, including text summarization, conversation systems, and machine translation. Sequence-to- sequence Two common methods used in RL-based NLP are reinforcement learning and deep reinforcement learning. Sequence-to-sequence While deep reinforcement learning includes training a neural network to discover the optimum strategy for a language challenge, reinforcement learning (RL) trains a model to create a series of words or characters that most closely matches a goal sequence. In several linguistic challenges, RL-based NLP has demonstrated promising results and attained cutting-edge performance. There are still issues to be solved, such as the need for more effective exploration tactics, data scarcity, and sample efficiency. In summary, RL-based NLP represents a potential line of inquiry for NLP research in the future. This method outperforms more established NLP strategies in a variety of language problems and has the added benefit of being able to improve over time with user feedback. To further enhance RL-based NLP's effectiveness and increase its applicability to real-world settings, future research should concentrate on resolving the difficulties associated with this approach.Published By: Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP)
© Copyright: All rights reserved
The Dreaming Variational Autoencoder for Reinforcement Learning Environments
Reinforcement learning has shown great potential in generalizing over raw sensory data using only a single neural network for value optimization. There are several challenges in the current state-of-the-art reinforcement learning algorithms that prevent them from converging towards the global optima. It is likely that the solution to these problems lies in short- and long-term planning, exploration and memory management for reinforcement learning algorithms. Games are often used to benchmark reinforcement learning algorithms as they provide a flexible, reproducible, and easy to control environment. Regardless, few games feature a state-space where results in exploration, memory, and planning are easily perceived. This paper presents The Dreaming Variational Autoencoder (DVAE), a neural network based generative modeling architecture for exploration in environments with sparse feedback. We further present Deep Maze, a novel and flexible maze engine that challenges DVAE in partial and fully-observable state-spaces, long-horizon tasks, and deterministic and stochastic problems. We show initial findings and encourage further work in reinforcement learning driven by generative exploration.The Dreaming Variational Autoencoder for Reinforcement Learning EnvironmentsacceptedVersionNivå
Stabilizing reinforcement learning control: A modular framework for optimizing over all stable behavior
We propose a framework for the design of feedback controllers that combines
the optimization-driven and model-free advantages of deep reinforcement
learning with the stability guarantees provided by using the Youla-Kucera
parameterization to define the search domain. Recent advances in behavioral
systems allow us to construct a data-driven internal model; this enables an
alternative realization of the Youla-Kucera parameterization based entirely on
input-output exploration data. Perhaps of independent interest, we formulate
and analyze the stability of such data-driven models in the presence of noise.
The Youla-Kucera approach requires a stable "parameter" for controller design.
For the training of reinforcement learning agents, the set of all stable linear
operators is given explicitly through a matrix factorization approach.
Moreover, a nonlinear extension is given using a neural network to express a
parameterized set of stable operators, which enables seamless integration with
standard deep learning libraries. Finally, we show how these ideas can also be
applied to tune fixed-structure controllers.Comment: Preprint; 18 pages. arXiv admin note: text overlap with
arXiv:2304.0342
Rearrangement with Nonprehensile Manipulation Using Deep Reinforcement Learning
Rearranging objects on a tabletop surface by means of nonprehensile
manipulation is a task which requires skillful interaction with the physical
world. Usually, this is achieved by precisely modeling physical properties of
the objects, robot, and the environment for explicit planning. In contrast, as
explicitly modeling the physical environment is not always feasible and
involves various uncertainties, we learn a nonprehensile rearrangement strategy
with deep reinforcement learning based on only visual feedback. For this, we
model the task with rewards and train a deep Q-network. Our potential
field-based heuristic exploration strategy reduces the amount of collisions
which lead to suboptimal outcomes and we actively balance the training set to
avoid bias towards poor examples. Our training process leads to quicker
learning and better performance on the task as compared to uniform exploration
and standard experience replay. We demonstrate empirical evidence from
simulation that our method leads to a success rate of 85%, show that our system
can cope with sudden changes of the environment, and compare our performance
with human level performance.Comment: 2018 International Conference on Robotics and Automatio
Impact Makes a Sound and Sound Makes an Impact: Sound Guides Representations and Explorations
Sound is one of the most informative and abundant modalities in the real
world while being robust to sense without contacts by small and cheap sensors
that can be placed on mobile devices. Although deep learning is capable of
extracting information from multiple sensory inputs, there has been little use
of sound for the control and learning of robotic actions. For unsupervised
reinforcement learning, an agent is expected to actively collect experiences
and jointly learn representations and policies in a self-supervised way. We
build realistic robotic manipulation scenarios with physics-based sound
simulation and propose the Intrinsic Sound Curiosity Module (ISCM). The ISCM
provides feedback to a reinforcement learner to learn robust representations
and to reward a more efficient exploration behavior. We perform experiments
with sound enabled during pre-training and disabled during adaptation, and show
that representations learned by ISCM outperform the ones by vision-only
baselines and pre-trained policies can accelerate the learning process when
applied to downstream tasks.Comment: Accepted at IROS 202
Explore, Exploit or Listen: Combining Human Feedback and Policy Model to Speed up Deep Reinforcement Learning in 3D Worlds
We describe a method to use discrete human feedback to enhance the
performance of deep learning agents in virtual three-dimensional environments
by extending deep-reinforcement learning to model the confidence and
consistency of human feedback. This enables deep reinforcement learning
algorithms to determine the most appropriate time to listen to the human
feedback, exploit the current policy model, or explore the agent's environment.
Managing the trade-off between these three strategies allows DRL agents to be
robust to inconsistent or intermittent human feedback. Through experimentation
using a synthetic oracle, we show that our technique improves the training
speed and overall performance of deep reinforcement learning in navigating
three-dimensional environments using Minecraft. We further show that our
technique is robust to highly innacurate human feedback and can also operate
when no human feedback is given
- …