Eligibility Propagation to Speed up Time Hopping for Reinforcement Learning
A mechanism called Eligibility Propagation is proposed to speed up the Time
Hopping technique used for faster Reinforcement Learning in simulations.
Eligibility Propagation gives Time Hopping abilities similar to those that
eligibility traces give conventional Reinforcement Learning. It
propagates values from one state to all of its temporal predecessors using a
state transitions graph. Experiments on a simulated biped crawling robot
confirm that Eligibility Propagation accelerates the learning process more than
3 times.
Comment: 7 page
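The backward propagation over a state transitions graph can be illustrated with a minimal sketch. The abstract does not state the update rule, so this assumes deterministic transitions, a tabular Q-function, and an optimistic rule that only ever raises a predecessor's value. The names `propagate`, `q`, and `predecessors` are hypothetical, and this is not the authors' implementation.

```python
from collections import deque


def propagate(q, predecessors, start, gamma=0.9, tol=1e-6):
    """Push the value of `start` back to all recorded temporal predecessors.

    q:            dict state -> dict action -> value (assumed tabular Q-function)
    predecessors: dict state -> list of (prev_state, action, reward) edges,
                  i.e. the state transitions graph stored in reverse
    Returns the number of Q-entries that were raised.
    """
    queue = deque([start])
    updates = 0
    while queue:
        state = queue.popleft()
        # Value of a state = best known action value (0.0 if nothing is known).
        best = max(q.get(state, {}).values(), default=0.0)
        for prev, action, reward in predecessors.get(state, ()):
            target = reward + gamma * best
            entry = q.setdefault(prev, {})
            # Assumption: only propagate improvements larger than `tol`,
            # which also guarantees termination on cyclic graphs.
            if target > entry.get(action, 0.0) + tol:
                entry[action] = target
                updates += 1
                queue.append(prev)
    return updates


# Toy chain s0 -> s1 -> s2, where s2 already has a known value of 1.0.
q = {"s2": {"stay": 1.0}}
predecessors = {
    "s2": [("s1", "fwd", 0.0)],
    "s1": [("s0", "fwd", 0.0)],
}
propagate(q, predecessors, "s2", gamma=0.5)
print(q["s1"]["fwd"], q["s0"]["fwd"])  # 0.5 0.25
```

In this sketch a single call updates every predecessor reachable from the changed state, which is the role that eligibility traces play along one continuous trajectory.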
Fast and Continuous Foothold Adaptation for Dynamic Locomotion through CNNs
Legged robots can outperform wheeled machines for most navigation tasks
across unknown and rough terrains. For such tasks, visual feedback is a
fundamental asset to provide robots with terrain-awareness. However, robust
dynamic locomotion on difficult terrains with real-time performance guarantees
remains a challenge. We present here a real-time, dynamic foothold adaptation
strategy based on visual feedback. Our method adjusts the landing position of
the feet in a fully reactive manner, using only on-board computers and sensors.
The correction is computed and executed continuously along the swing phase
trajectory of each leg. To efficiently adapt the landing position, we implement
a self-supervised foothold classifier based on a Convolutional Neural Network
(CNN). Our method computes up to 200 times faster than the full-blown
heuristics. Our goal is to react to visual stimuli from the
environment, bridging the gap between blind reactive locomotion and purely
vision-based planning strategies. We assess the performance of our method on
the dynamic quadruped robot HyQ, executing static and dynamic gaits (at speeds
up to 0.5 m/s) in both simulated and real scenarios; the benefit of safe
foothold adaptation is clearly demonstrated by the overall robot behavior.
Comment: 9 pages, 11 figures. Accepted to RA-L + ICRA 2019, January 201
Learning Image-Conditioned Dynamics Models for Control of Under-actuated Legged Millirobots
Millirobots are a promising robotic platform for many applications due to
their small size and low manufacturing costs. Legged millirobots, in
particular, can provide increased mobility in complex environments and improved
scaling of obstacles. However, controlling these small, highly dynamic, and
underactuated legged systems is difficult. Hand-engineered controllers can
sometimes control these legged millirobots, but they have difficulties with
dynamic maneuvers and complex terrains. We present an approach for controlling
a real-world legged millirobot that is based on learned neural network models.
Using less than 17 minutes of data, our method can learn a predictive model of
the robot's dynamics that can enable effective gaits to be synthesized on the
fly for following user-specified waypoints on a given terrain. Furthermore, by
leveraging expressive, high-capacity neural network models, our approach allows
for these predictions to be directly conditioned on camera images, endowing the
robot with the ability to predict how different terrains might affect its
dynamics. This enables sample-efficient and effective learning for locomotion
of a dynamic legged millirobot on various terrains, including gravel, turf,
carpet, and styrofoam. Experiment videos can be found at
https://sites.google.com/view/imageconddy
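As a rough illustration of how a learned dynamics model could be used to pick actions that follow a waypoint, the sketch below uses random-shooting planning. The abstract does not name the planner, so random shooting is an assumption here; `plan_first_action` and `toy_model` are hypothetical names, and `toy_model` is a one-dimensional stand-in for a learned neural network model.

```python
import random


def plan_first_action(model, state, waypoint, horizon=3, samples=1000, rng=None):
    """Random-shooting planner (assumed): sample action sequences, roll each
    one out through `model`, and return the first action of the cheapest one."""
    rng = rng or random.Random(0)
    best_cost, best_action = float("inf"), None
    for _ in range(samples):
        actions = [rng.uniform(-1.0, 1.0) for _ in range(horizon)]
        s, cost = state, 0.0
        for a in actions:
            s = model(s, a)            # predicted next state
            cost += abs(s - waypoint)  # distance to the waypoint at each step
        if cost < best_cost:
            best_cost, best_action = cost, actions[0]
    return best_action


def toy_model(position, action):
    """Hypothetical stand-in for a learned dynamics model: 1-D integrator."""
    return position + action


action = plan_first_action(toy_model, state=0.0, waypoint=5.0)
print(action)  # a positive step, i.e. towards the waypoint
```

Only the first action is executed before replanning from the newly observed state, which is what lets a controller of this kind react on the fly.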
Vision-based reinforcement learning using approximate policy iteration
A major issue for reinforcement learning (RL) applied to robotics is the time
required to learn a new skill. While RL has been used to learn mobile robot
control in many simulated domains, applications involving learning on real
robots are still relatively rare. In this paper, the Least-Squares Policy
Iteration (LSPI) reinforcement learning algorithm and a new model-based
algorithm, Least-Squares Policy Iteration with Prioritized Sweeping (LSPI+),
are implemented on a mobile robot to acquire new skills quickly and
efficiently. LSPI+ combines the benefits of LSPI and prioritized sweeping,
which uses all previous experience to focus the computational effort on the
most "interesting" or dynamic parts of the state space. The proposed
algorithms are tested on a household vacuum cleaner robot for learning a
docking task using vision as the only sensor modality. In experiments these
algorithms are compared to other model-based and model-free RL algorithms. The
results show that the number of trials required to learn the docking task is
significantly reduced using LSPI compared to the other RL algorithms
investigated, and that LSPI+ further improves on the performance of LSPI.
Fault Tolerant Free Gait and Footstep Planning for Hexapod Robot Based on Monte-Carlo Tree
Legged robots can pass through complex field environments by selecting gaits
and discrete footholds carefully. Traditional methods plan gait and foothold
separately and treat each as a single-step optimization. However, this
approach leads to poor passability in sparse-foothold environments. This
paper proposes a coordinated planning method for hexapod robots that
treats gait and foothold planning as a sequence optimization problem
and treats the harshness of the environment as a leg
fault. The Monte Carlo tree search (MCTS) algorithm is used to optimize the
entire sequence. Two methods, FastMCTS and SlidingMCTS, are proposed to address
some defects of standard MCTS when applied to legged robot
planning. The proposed planning algorithm incorporates the fault-tolerant gait
method to improve passability. Finally, the method is compared with
other planning methods in experiments on terrains with different densities of
footholds and on artificially designed challenging terrain.
All results show that the proposed method dramatically
improves the hexapod robot's ability to pass through sparse-foothold
environments.
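For readers unfamiliar with MCTS, the sketch below shows the UCB1 selection rule commonly used in its selection step. The abstract does not say which selection rule FastMCTS or SlidingMCTS use, so UCB1 is an assumption; the function names and the gait labels are hypothetical.

```python
import math


def ucb1(total_value, visits, parent_visits, c=math.sqrt(2)):
    """UCB1 score: mean value plus an exploration bonus.
    Unvisited children score infinity so that each is tried at least once."""
    if visits == 0:
        return float("inf")
    return total_value / visits + c * math.sqrt(math.log(parent_visits) / visits)


def select_child(children, parent_visits, c=math.sqrt(2)):
    """children: dict name -> (total_value, visits). Returns the best name."""
    return max(children, key=lambda name: ucb1(*children[name], parent_visits, c))


# Hypothetical candidate gait steps with accumulated rollout statistics.
children = {"tripod": (6.0, 10), "wave": (1.0, 2)}
print(select_child(children, parent_visits=12))         # wave (rarely tried)
print(select_child(children, parent_visits=12, c=0.0))  # tripod (pure exploitation)
```

The exploration constant `c` trades off revisiting well-scoring branches against trying rarely visited ones, which matters when the candidate sequence of gaits and footholds is long.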