Exploiting Symmetry and Heuristic Demonstrations in Off-policy Reinforcement Learning for Robotic Manipulation
Reinforcement learning shows significant potential for automatically
building control policies in numerous domains, but it suffers from low sample
efficiency when applied to robot manipulation tasks due to the curse of
dimensionality. To facilitate the learning of such tasks, prior knowledge or
heuristics that exploit inherent simplifications can markedly improve learning
performance. This paper defines and incorporates the natural symmetry present
in physical robotic environments. Sample-efficient policies are then trained
by exploiting expert demonstrations in symmetric environments through a
combination of reinforcement learning and behavior cloning, which gives the
off-policy learning process a diverse yet compact initialization. Furthermore,
the paper presents a rigorous framework for this recent concept and explores
its scope in robot manipulation tasks. The proposed method is validated on two
point-to-point reaching tasks with an industrial arm, with and without an
obstacle, in a simulation study. Demonstrations are generated by a PID
controller that tracks linear joint-space trajectories, with hard-coded
temporal logic producing interim midpoints. The results show the effect of the
number of demonstrations and quantify the contribution of behavior cloning,
illustrating the improvement achievable in model-free reinforcement learning
for common manipulation tasks. A comparison between the proposed method and a
conventional off-policy reinforcement learning algorithm indicates its
advantage in learning performance and its potential value for applications.
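The abstract does not specify the learning algorithm or the exact loss. As a minimal sketch of the two ingredients it names, mirroring demonstrations across a workspace symmetry and blending a behavior-cloning term into an off-policy actor update, the Python fragment below assumes a deterministic-actor (DDPG-style) setting and a hypothetical sign-flip symmetry operator `MIRROR`; all names, shapes, and the 6-DoF layout are illustrative, not the authors' implementation.

```python
import numpy as np

# Hypothetical symmetry operator for a 6-DoF arm whose workspace is
# mirror-symmetric about a vertical plane: selected coordinates flip sign.
MIRROR = np.diag([1.0, -1.0, 1.0, -1.0, 1.0, -1.0])

def mirror_transition(s, a, r, s_next):
    """Map one demonstrated transition (s, a, r, s') to its symmetric
    counterpart, doubling the effective demonstration set at no extra cost."""
    return MIRROR @ s, MIRROR @ a, r, MIRROR @ s_next

def actor_loss(q_values, policy_actions, demo_actions, demo_mask, bc_weight):
    """Combined objective: a policy-gradient term (maximize the critic's
    value) plus a behavior-cloning MSE term applied only to the
    demonstration samples selected by demo_mask."""
    rl_term = -np.mean(q_values)
    sq_err = ((policy_actions - demo_actions) ** 2).sum(axis=1)
    bc_term = (sq_err * demo_mask).sum() / max(demo_mask.sum(), 1.0)
    return rl_term + bc_weight * bc_term
```

Here `bc_weight` plays the role of the behavior-cloning magnitude that the study varies: a large weight anchors the policy to the PID demonstrations early on, while a small weight lets the reinforcement signal dominate as learning progresses.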
Exploiting Intrinsic Stochasticity of Real-Time Simulation to Facilitate Robust Reinforcement Learning for Robot Manipulation
Simulation is essential for reinforcement learning (RL) before deployment
in the real world, especially for safety-critical applications such as robot
manipulation. Conventionally trained RL agents are sensitive to discrepancies
between the simulation and the real world, known as the sim-to-real gap.
Domain randomization, a technique used to bridge this gap, is limited to
imposing heuristically randomized models. We investigate the intrinsic
stochasticity of real-time simulation (RT-IS) in off-the-shelf simulation
software and its potential to improve the robustness of RL methods and the
performance of domain randomization. First, we conduct analytical studies to
measure the correlation of RT-IS with the utilization of the computer hardware
and to validate its comparability with the natural stochasticity of a physical
robot. We then apply RT-IS in the training of an RL agent. Simulation and
physical experiment results verify the feasibility and applicability of RT-IS
for designing robust RL agents for robot manipulation tasks. The RT-IS-powered
robust RL agent outperforms conventional RL agents on robots with modeling
uncertainties; it requires less heuristic randomization and achieves better
generalizability than conventional domain-randomization-powered agents. Our
findings provide a new perspective on the sim-to-real problem in practical
applications such as robot manipulation tasks.
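The abstract describes the RT-IS mechanism only at a high level. A minimal sketch of one way to harvest it, assuming a gym-style environment that exposes a mutable integration timestep `dt` (a hypothetical attribute, not a standard API): each simulator step is timed against the wall clock, so fluctuations in hardware load surface as a variable effective timestep that perturbs the dynamics without hand-designed parameter ranges.

```python
import time
import numpy as np

class RTISWrapper:
    """Illustrative wrapper that couples simulation stepping to wall-clock
    time, so background hardware load induces natural timestep jitter."""

    def __init__(self, env, nominal_dt=0.02):
        self.env = env                  # assumed to expose a mutable env.dt
        self.nominal_dt = nominal_dt
        self._last = None

    def reset(self):
        self._last = time.perf_counter()
        return self.env.reset()

    def step(self, action):
        now = time.perf_counter()
        elapsed = now - self._last if self._last is not None else self.nominal_dt
        self._last = now
        # Clip so a scheduling hiccup cannot destabilize the integrator.
        self.env.dt = float(np.clip(elapsed, 0.5 * self.nominal_dt,
                                    2.0 * self.nominal_dt))
        return self.env.step(action)
```

Training on a machine with varying background load then exposes the agent to a distribution of effective timesteps, which stands in for the heuristically chosen parameter ranges of conventional domain randomization.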