CURIOUS: Intrinsically Motivated Modular Multi-Goal Reinforcement Learning
In open-ended environments, autonomous learning agents must set their own
goals and build their own curriculum through an intrinsically motivated
exploration. They may consider a large diversity of goals, aiming to discover
what is controllable in their environments, and what is not. Because some goals
might prove easy and some impossible, agents must actively select which goal to
practice at any moment, to maximize their overall mastery on the set of
learnable goals. This paper proposes CURIOUS, an algorithm that leverages 1) a
modular Universal Value Function Approximator with hindsight learning to
achieve a diversity of goals of different kinds within a unique policy and 2)
an automated curriculum learning mechanism that biases the attention of the
agent towards goals maximizing the absolute learning progress. Agents focus
sequentially on goals of increasing complexity, and focus back on goals that
are being forgotten. Experiments conducted in a new modular-goal robotic
environment show the resulting developmental self-organization of a learning
curriculum, and demonstrate properties of robustness to distracting goals,
forgetting and changes in body properties. Comment: Accepted at ICML 201
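The curriculum mechanism described above can be illustrated with a minimal sketch. It assumes that competence on each goal is tracked as a sliding window of binary outcomes, that learning progress is the difference in success rate between the newer and older halves of that window, and that sampling probabilities mix a uniform term with a term proportional to absolute learning progress. The class name, the window size, and the `epsilon` mixing weight are illustrative assumptions, not details taken from the abstract.

```python
import random
from collections import deque


class LearningProgressSelector:
    """Hypothetical goal selector biased toward high absolute learning progress."""

    def __init__(self, n_goals, window=20, epsilon=0.2, seed=0):
        # One sliding window of recent outcomes (1.0 = success) per goal.
        self.histories = [deque(maxlen=window) for _ in range(n_goals)]
        self.epsilon = epsilon  # assumed weight of the uniform exploration term
        self.rng = random.Random(seed)

    def update(self, goal, success):
        self.histories[goal].append(1.0 if success else 0.0)

    def abs_learning_progress(self, goal):
        h = list(self.histories[goal])
        if len(h) < 2:
            return 0.0
        half = len(h) // 2
        older, recent = h[:half], h[half:]
        # The absolute value also flags goals whose competence is dropping,
        # i.e. goals that are being forgotten.
        return abs(sum(recent) / len(recent) - sum(older) / len(older))

    def probabilities(self):
        n = len(self.histories)
        lps = [self.abs_learning_progress(g) for g in range(n)]
        total = sum(lps)
        if total == 0:
            return [1.0 / n] * n  # no signal yet: sample uniformly
        return [self.epsilon / n + (1 - self.epsilon) * lp / total for lp in lps]

    def sample(self):
        probs = self.probabilities()
        return self.rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

Under these assumptions, a goal that is already mastered and a goal that is impossible both show zero learning progress, so most of the sampling mass goes to goals whose success rate is currently changing.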
Probabilistic movement primitives
Movement Primitives (MPs) are a well-established approach for representing modular
and re-usable robot movement generators. Many state-of-the-art robot learning
successes are based on MPs, due to their compact representation of the inherently
continuous and high dimensional robot movements. A major goal in robot learning
is to combine multiple MPs as building blocks in a modular control architecture
to solve complex tasks. To this effect, an MP representation has to allow for
blending between motions, adapting to altered task variables, and co-activating
multiple MPs in parallel. We present a probabilistic formulation of the MP concept
that maintains a distribution over trajectories. Our probabilistic approach
allows for the derivation of new operations which are essential for implementing
all aforementioned properties in one framework. In order to use such a trajectory
distribution for robot movement control, we analytically derive a stochastic feedback
controller which reproduces the given trajectory distribution. We evaluate
and compare our approach to existing methods on several simulated as well as
real robot scenarios.
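One common way to maintain a distribution over trajectories, assumed here rather than stated in the abstract, is a linear-Gaussian model: the position at phase z is a weighted sum of basis functions, and the weights follow a Gaussian. The sketch below handles a single degree of freedom with a diagonal weight covariance to stay short. The function names, the basis width, and the observation noise are illustrative assumptions.

```python
import math


def gaussian_basis(z, n_basis=5, width=0.05):
    """Normalized Gaussian basis functions evaluated at phase z in [0, 1]."""
    centers = [i / (n_basis - 1) for i in range(n_basis)]
    raw = [math.exp(-((z - c) ** 2) / (2.0 * width)) for c in centers]
    total = sum(raw)
    return [r / total for r in raw]


def trajectory_marginal(z, w_mean, w_var, obs_var=1e-4):
    """Mean and variance of the position at phase z.

    Assumes y(z) = phi(z)^T w + noise, with w ~ N(w_mean, diag(w_var)).
    """
    phi = gaussian_basis(z, n_basis=len(w_mean))
    mean = sum(p * m for p, m in zip(phi, w_mean))
    var = sum(p * p * v for p, v in zip(phi, w_var)) + obs_var
    return mean, var


def condition_on_via_point(z, y_target, w_mean, w_var, obs_var=1e-4):
    """Gaussian conditioning of the weight mean on a desired position at phase z.

    Only the updated mean is returned; the posterior covariance is no longer
    diagonal and is omitted from this sketch.
    """
    phi = gaussian_basis(z, n_basis=len(w_mean))
    predicted = sum(p * m for p, m in zip(phi, w_mean))
    innovation_var = obs_var + sum(p * p * v for p, v in zip(phi, w_var))
    gain = [v * p / innovation_var for p, v in zip(phi, w_var)]
    return [m + k * (y_target - predicted) for m, k in zip(w_mean, gain)]
```

Conditioning is one example of an operation that becomes available once the primitive is a probability distribution: adapting to an altered task variable such as a via-point reduces to a standard Gaussian update.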
Safety-Critical Learning of Robot Control with Temporal Logic Specifications
Reinforcement learning (RL) is a promising approach. However, its success in
real-world applications has been limited, because ensuring safe exploration and
facilitating adequate exploitation is a challenge for controlling robotic
systems with unknown models and measurement uncertainties. The learning problem
becomes even more difficult for complex tasks over continuous state and action spaces. In
this paper, we propose a learning-based robotic control framework consisting of
several aspects: (1) we leverage Linear Temporal Logic (LTL) to express complex
tasks over infinite horizons that are translated to a novel automaton
structure; (2) we detail an innovative reward scheme for LTL satisfaction with
a probabilistic guarantee. Then, by applying a reward shaping technique, we
develop a modular policy-gradient architecture exploiting the benefits of the
automaton structure to decompose overall tasks and enhance the performance of
learned controllers; (3) by incorporating Gaussian Processes (GPs) to estimate
the uncertain dynamic systems, we synthesize a model-based safe exploration
during the learning process using Exponential Control Barrier Functions (ECBFs)
that generalize to systems with high-order relative degrees; (4) to further
improve the efficiency of exploration, we utilize the properties of LTL
automata and ECBFs to propose a safe guiding process. Finally, we demonstrate
the effectiveness of the framework via several robotic environments. We show an
ECBF-based modular deep RL algorithm that achieves near-perfect success rates
and safety guarding with high-probability confidence during training. Comment: Under Review. arXiv admin note: text overlap with arXiv:2102.1285
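To make the role of an exponential control barrier function concrete, here is a toy sketch on a known double integrator (acceleration equals the control input) with a position limit. This is not the paper's framework, which estimates uncertain dynamics with Gaussian Processes. The system, the gains `k0` and `k1`, and the function names are all illustrative assumptions. The barrier h = x_max - x has relative degree two, so the ECBF condition h'' + k1 h' + k0 h >= 0 becomes an upper bound on the control.

```python
def ecbf_filter(x, v, u_nominal, x_max=1.0, k0=4.0, k1=4.0):
    """Clip a nominal acceleration so the position stays below x_max.

    For x'' = u and h = x_max - x:  h' = -v,  h'' = -u.
    ECBF condition: h'' + k1*h' + k0*h >= 0  =>  u <= k0*(x_max - x) - k1*v.
    The assumed gains place both poles of s^2 + k1*s + k0 at s = -2.
    """
    u_bound = k0 * (x_max - x) - k1 * v
    return min(u_nominal, u_bound)


def simulate(u_nominal=2.0, steps=2000, dt=0.01, x_max=1.0):
    """Explicit-Euler rollout of a nominal controller that pushes toward the limit."""
    x, v = 0.0, 0.0
    max_x = x
    for _ in range(steps):
        u = ecbf_filter(x, v, u_nominal, x_max=x_max)
        # Both updates use the state from the start of the step.
        x, v = x + dt * v, v + dt * u
        max_x = max(max_x, x)
    return x, max_x
```

In this toy setting the filter leaves the nominal command untouched while it is safe and starts braking as the state approaches the limit. A learned policy could play the role of `u_nominal`.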
Towards Informed Exploration for Deep Reinforcement Learning
In this thesis, we discuss various techniques for improving exploration in deep reinforcement learning. We begin with a brief review of reinforcement learning (RL) and the fundamental exploration vs. exploitation trade-off. Then we review how deep RL has improved upon classical RL and summarize six categories of the latest exploration methods for deep RL, in order of increasing usage of prior information. We then explore representative works in three categories and discuss their strengths and weaknesses. The first category, represented by Soft Q-learning, uses regularization to encourage exploration. The second category, represented by count-based exploration via hashing, maps states to hash codes for counting and assigns higher exploration bonuses to less-encountered states. The third category utilizes hierarchy and is represented by a modular architecture for RL agents that play StarCraft II. Finally, we conclude that exploration by prior knowledge is a promising research direction and suggest topics of potential impact.
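The count-based category can be sketched as follows. The abstract only says that states are mapped to hash codes for counting. The random-projection sign hash and the bonus of the form beta / sqrt(count) used below are assumptions chosen for illustration, as are the class name and default parameters.

```python
import math
import random
from collections import defaultdict


class HashCountBonus:
    """Hypothetical exploration bonus from visit counts over hashed states."""

    def __init__(self, state_dim, n_bits=16, beta=0.1, seed=0):
        rng = random.Random(seed)
        # Fixed random projection; the sign of each projected coordinate gives one bit.
        self.projection = [
            [rng.gauss(0.0, 1.0) for _ in range(state_dim)] for _ in range(n_bits)
        ]
        self.beta = beta
        self.counts = defaultdict(int)

    def hash_code(self, state):
        return tuple(
            1 if sum(a * s for a, s in zip(row, state)) >= 0 else 0
            for row in self.projection
        )

    def bonus(self, state):
        """Increment the count of the state's hash bucket and return its bonus."""
        code = self.hash_code(state)
        self.counts[code] += 1
        return self.beta / math.sqrt(self.counts[code])
```

In use, the bonus would be added to the environment reward at each step, so that states falling into rarely visited buckets look more attractive to the agent. Fewer bits merge more states into the same bucket, which trades resolution for generalization.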
- …