Fast Damage Recovery in Robotics with the T-Resilience Algorithm
Damage recovery is critical for autonomous robots that need to operate for a
long time without assistance. Most current methods are complex and costly
because they require anticipating each potential damage in order to have a
contingency plan ready. As an alternative, we introduce the T-Resilience
algorithm, a new algorithm that allows robots to quickly and autonomously
discover compensatory behaviors in unanticipated situations. This algorithm
equips the robot with a self-model and discovers new behaviors by learning to
avoid those that perform differently in the self-model and in reality. Our
algorithm thus does not identify the damaged parts but it implicitly searches
for efficient behaviors that do not use them. We evaluate the T-Resilience
algorithm on a hexapod robot that needs to adapt to leg removal, broken legs
and motor failures; we compare it to stochastic local search, policy gradient
and the self-modeling algorithm proposed by Bongard et al. The behavior of the
robot is assessed on-board thanks to an RGB-D sensor and a SLAM algorithm. Using
only 25 tests on the robot and an overall running time of 20 minutes,
T-Resilience consistently leads to substantially better results than the other
approaches.
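The core idea of the abstract above, preferring behaviors whose outcome in the self-model matches reality, can be illustrated with a deliberately small sketch. This is not the T-Resilience algorithm itself: the nearest-neighbor gap estimate, the scalar behavior encoding, the threshold `max_gap`, and all numbers are assumptions made up for illustration.

```python
def predicted_gap(behavior, tested):
    """Estimate the self-model/reality gap from the nearest tested behavior.

    `tested` holds (behavior, measured_gap) pairs from trials on the robot.
    """
    nearest = min(tested, key=lambda pair: abs(pair[0] - behavior))
    return nearest[1]


def select_behavior(candidates, self_model, tested, max_gap=0.1):
    """Pick the best behavior in the self-model among those expected to transfer."""
    trusted = [b for b in candidates if predicted_gap(b, tested) <= max_gap]
    return max(trusted, key=self_model) if trusted else None


# Hypothetical setup: a behavior is a number in [0, 1] saying how much the
# gait relies on one leg. The (undamaged) self-model rewards using that leg.
def self_model(behavior):
    return 0.5 + 0.5 * behavior


# Made-up trial results: the more the gait uses the leg, the larger the gap
# between predicted and observed performance (the leg is damaged).
tested = [(0.0, 0.0), (0.5, 0.3), (1.0, 0.6)]
candidates = [i / 10 for i in range(11)]

selected = select_behavior(candidates, self_model, tested)
print(selected)
```

The selection never identifies the damaged leg explicitly. It only avoids the region of behaviors where the self-model has been observed to be unreliable, which is the intuition the abstract describes.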
MOTION CONTROL SIMULATION OF A HEXAPOD ROBOT
This thesis addresses hexapod robot motion control. Insect morphology and locomotion patterns inform the design of a robotic model, and motion control is achieved via trajectory planning and bio-inspired principles. Additionally, deep learning and multi-agent reinforcement learning are employed to train the robot motion control strategy, with leg coordination achieved using a multi-agent deep reinforcement learning framework. The thesis makes the following contributions:
First, research on legged robots is synthesized, with a focus on hexapod robot motion control. Insect anatomy analysis informs the hexagonal robot body and three-joint single robotic leg design, which is assembled using SolidWorks. Different gaits are studied and compared, and robot leg kinematics are derived and experimentally verified, culminating in a three-legged gait for motion control.
Second, an animal-inspired approach employs a central pattern generator (CPG) control unit based on the Hopf oscillator, enabling motion control tasks such as stable walking and climbing in complex environments. The robot's motion is quantitatively evaluated in terms of displacement change and body pitch angle.
Third, a value function decomposition algorithm, QPLEX, is applied to hexapod robot motion control. The QPLEX architecture treats each leg as a separate agent with a local control module that is trained using reinforcement learning. QPLEX outperforms decentralized approaches, achieving coordinated rhythmic gaits and increased robustness on uneven terrain. The significance of terrain curriculum learning is assessed, with QPLEX demonstrating superior stability and faster convergence.
The foot-end trajectory planning method enables robot motion control through inverse kinematic solutions but has limited generalization capabilities for diverse terrains. The animal-inspired CPG-based method offers a versatile control strategy but is constrained to core aspects. In contrast, the multi-agent deep reinforcement learning-based approach affords adaptable motion strategy adjustments, rendering it a superior control policy. These methods can be combined to develop a customized robot motion control policy for specific scenarios.
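The Hopf oscillator mentioned in the second contribution can be sketched as follows. This is a minimal illustration rather than the thesis's controller: it integrates the standard Hopf equations, dx/dt = alpha (mu - r^2) x - omega y and dy/dt = alpha (mu - r^2) y + omega x with r^2 = x^2 + y^2, using explicit Euler steps. The parameter names and values (`mu`, `omega`, `alpha`, `dt`) are assumptions chosen for the example.

```python
import math


def hopf_step(x, y, dt, mu=1.0, omega=2.0 * math.pi, alpha=10.0):
    """Advance one Hopf oscillator by a single explicit Euler step."""
    r2 = x * x + y * y
    dx = alpha * (mu - r2) * x - omega * y
    dy = alpha * (mu - r2) * y + omega * x
    return x + dt * dx, y + dt * dy


def simulate(steps=20000, dt=0.001, x0=0.1, y0=0.0, **params):
    """Integrate from (x0, y0) and return the list of visited states."""
    x, y = x0, y0
    trajectory = []
    for _ in range(steps):
        x, y = hopf_step(x, y, dt, **params)
        trajectory.append((x, y))
    return trajectory


# Starting near the origin, the state spirals out to a stable limit cycle
# whose radius is close to sqrt(mu), regardless of the initial condition.
trajectory = simulate()
final_radius = math.hypot(*trajectory[-1])
print(round(final_radius, 2))
```

The stable limit cycle is what makes this oscillator attractive for a CPG: the rhythmic output recovers its amplitude after a perturbation. In a gait controller, one oscillator output per leg would typically be mapped to joint targets, with phase offsets between legs setting the gait; that mapping is not shown here.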
Jointly Learning to Construct and Control Agents using Deep Reinforcement Learning
The physical design of a robot and the policy that controls its motion are
inherently coupled, and should be determined according to the task and
environment. In an increasing number of applications, data-driven and
learning-based approaches, such as deep reinforcement learning, have proven
effective at designing control policies. For most tasks, the only way to
evaluate a physical design with respect to such control policies is
empirical--i.e., by picking a design and training a control policy for it.
Since training these policies is time-consuming, it is computationally
infeasible to train separate policies for all possible designs as a means to
identify the best one. In this work, we address this limitation by introducing
a method that performs simultaneous joint optimization of the physical design
and control network. Our approach maintains a distribution over designs and
uses reinforcement learning to optimize a control policy to maximize expected
reward over the design distribution. We give the controller access to design
parameters to allow it to tailor its policy to each design in the distribution.
Throughout training, we shift the distribution towards higher-performing
designs, eventually converging to a design and control policy that are jointly
optimal. We evaluate our approach in the context of legged locomotion, and
demonstrate that it discovers novel designs and walking gaits, outperforming
baselines in both performance and efficiency.
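The notion of maintaining a distribution over designs and shifting it towards higher-performing ones can be sketched with a toy example. Several things here are assumptions and not the paper's method: the design is a single hypothetical parameter (`leg_length`), `toy_reward` stands in for training and evaluating a control policy, and a simple elite-averaging update (in the style of the cross-entropy method) replaces the reinforcement-learning update described in the abstract.

```python
import random


def toy_reward(leg_length):
    """Hypothetical stand-in for the return a trained policy achieves on a design."""
    return -(leg_length - 0.7) ** 2


def optimize_design(iterations=30, samples=50, elites=10, seed=0):
    """Shift a Gaussian over one design parameter towards high-reward designs."""
    rng = random.Random(seed)
    mean, std = 0.2, 0.3  # initial design distribution (assumed values)
    for _ in range(iterations):
        designs = [rng.gauss(mean, std) for _ in range(samples)]
        best = sorted(designs, key=toy_reward, reverse=True)[:elites]
        mean = sum(best) / len(best)  # move towards the best sampled designs
        std = max(0.9 * std, 0.01)    # narrow the distribution gradually
    return mean, std


final_mean, final_std = optimize_design()
print(round(final_mean, 2), round(final_std, 3))
```

In the approach described above, the controller also receives the design parameters as input so that one policy can serve every sampled design; the sketch omits the policy entirely and only shows the distribution narrowing onto a single design.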