
    A contact-implicit direct trajectory optimization scheme for the study of legged maneuverability

    For legged robots to move safely in unpredictable environments, they need to be manoeuvrable, but transient motions such as acceleration, deceleration and turning have received little research attention compared to constant-speed gait. They are difficult to study for two reasons: firstly, the way they are executed is highly sensitive to factors such as morphology and traction, and secondly, they can be dangerous, especially when executed rapidly or from high speeds. These challenges make such motions an ideal topic for study by simulation, as this allows all variables to be precisely controlled and puts no human, animal or robotic subjects at risk. Trajectory optimization is a promising method for simulating these manoeuvres, because it allows complete motion trajectories to be generated when neither the input actuation nor the output motion is known. Furthermore, it produces solutions that optimize a given objective, such as minimizing the distance required to stop, or the effort exerted by the actuators throughout the motion. It has consequently become a popular technique for high-level motion planning in robotics, and for studying locomotion in biomechanics. In this dissertation, we present a novel approach to studying motion with trajectory optimization, by viewing it more as “trajectory generation” – a means of generating large quantities of synthetic data that can illuminate the differences between successful and unsuccessful motion strategies when studied in aggregate. One distinctive feature of this approach is the focus on whole-body models, which capture the specific morphology of the subject, rather than the highly simplified “template” models that are typically used. Another is the use of “contact-implicit” methods, which allow an appropriate footfall sequence to be discovered, rather than requiring that it be defined upfront. Although contact-implicit methods are not novel, they are not widely used, as they are computationally demanding and unnecessary when studying comparatively predictable constant-speed locomotion. The second section of this dissertation describes innovations in the formulation of these trajectory optimization problems as nonlinear programming problems (NLPs). This “direct” approach allows the problems to be solved by general-purpose, open-source algorithms, making the technique accessible to scientists without the specialized applied mathematics knowledge otherwise required. The design of the NLP has a significant impact on the accuracy of the result, the quality of the solution (with respect to the final value of the objective function), and the time required to solve the problem.
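
    A minimal sketch of the kind of contact-implicit direct transcription described above, assuming a toy problem (a point mass dropped onto flat ground) rather than the whole-body models used in the dissertation: the gap, velocity and contact force at every knot point are all decision variables, the dynamics are enforced as equality constraints, and the complementarity condition between gap and force is penalised so that the touchdown time is discovered by the solver rather than prescribed. The horizon, time step and the use of SciPy's SLSQP solver are illustrative assumptions, not the formulation developed in the thesis.

        # Contact-implicit direct transcription sketch (illustrative assumptions throughout).
        import numpy as np
        from scipy.optimize import minimize

        N, h, m, g = 25, 0.04, 1.0, 9.81       # knot points, step [s], mass [kg], gravity

        def unpack(x):
            q = x[:N + 1]                       # heights (gap to the ground)
            v = x[N + 1:2 * (N + 1)]            # vertical velocities
            lam = x[2 * (N + 1):]               # contact forces, one per interval
            return q, v, lam

        def objective(x):
            q, v, lam = unpack(x)
            # Penalised complementarity: gap and force should not both be positive.
            return np.sum(q[1:] * lam) + 1e-4 * np.sum(lam ** 2)

        def dynamics_residual(x):
            q, v, lam = unpack(x)
            res = []
            for k in range(N):                  # semi-implicit Euler transcription
                v_next = v[k] + h * (lam[k] / m - g)
                q_next = q[k] + h * v_next
                res += [v[k + 1] - v_next, q[k + 1] - q_next]
            res += [q[0] - 1.0, v[0] - 0.0]     # initial drop from 1 m at rest
            return np.array(res)

        x0 = np.concatenate([np.ones(N + 1), np.zeros(N + 1), np.zeros(N)])
        bounds = [(0.0, None)] * (N + 1) + [(None, None)] * (N + 1) + [(0.0, None)] * N
        sol = minimize(objective, x0, bounds=bounds, method="SLSQP",
                       constraints=[{"type": "eq", "fun": dynamics_residual}],
                       options={"maxiter": 500})
        q, v, lam = unpack(sol.x)
        print("touchdown at knot", int(np.argmax(lam > 1e-3)), "; final height %.3f m" % q[-1])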

    Stochastic Model Predictive Control via Fixed Structure Policies

    In this work, the model predictive control problem is extended to include not only open-loop control sequences but also state-feedback control laws, by directly optimizing the parameters of a control policy. Additionally, continuous cost functions are developed to allow the control policy to be trained to make discrete decisions, something typically handled with model-free learning algorithms. This general control policy encompasses a wide class of functions and allows the optimization to occur both online and offline, while adding robustness to unmodelled dynamics and outside disturbances. General formulations covering nonlinear discrete-time dynamics and abstract cost functions are developed for both deterministic and stochastic problems. Analytical solutions are derived for linear cases and compared to existing theory, such as the classical linear quadratic regulator. It is shown that, under certain assumptions, there exists a finite horizon over which a constant linear state-feedback control law will stabilize a nonlinear system around the origin. Several control policy architectures are used to regulate the cart-pole system in deterministic and stochastic settings, and neural network-based policies are trained to analyze and intercept bodies following stochastic projectile motion.
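
    As a minimal illustration of optimizing a fixed-structure policy directly (not the author's formulation), the sketch below treats a constant linear state-feedback gain as the decision variable of a finite-horizon quadratic rollout cost and compares the result against the classical infinite-horizon LQR gain. The double-integrator model, horizon and cost weights are assumptions chosen for illustration.

        # Fixed-structure policy optimization vs. classical LQR (illustrative setup).
        import numpy as np
        from scipy.optimize import minimize
        from scipy.linalg import solve_discrete_are

        dt = 0.1
        A = np.array([[1.0, dt], [0.0, 1.0]])     # discrete double integrator
        B = np.array([[0.5 * dt ** 2], [dt]])
        Q, R = np.diag([10.0, 1.0]), np.array([[0.1]])
        x0, H = np.array([1.0, 0.0]), 100          # initial state, rollout horizon

        def rollout_cost(k_flat):
            K = k_flat.reshape(1, 2)
            x, cost = x0.copy(), 0.0
            for _ in range(H):                     # closed-loop rollout under u = -K x
                u = -K @ x
                cost += x @ Q @ x + u @ R @ u
                x = A @ x + B @ u
            return cost

        K_opt = minimize(rollout_cost, np.array([1.0, 1.0]),
                         method="Nelder-Mead").x.reshape(1, 2)

        # Classical infinite-horizon LQR gain for comparison.
        P = solve_discrete_are(A, B, Q, R)
        K_lqr = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)

        print("policy-optimized K:", np.round(K_opt, 3))
        print("LQR gain         K:", np.round(K_lqr, 3))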

    Online Optimization-based Gait Adaptation of Quadruped Robot Locomotion

    Quadruped robots have demonstrated extensive capabilities for traversing complex and unstructured environments. Optimization-based techniques have given significant impetus to research on legged locomotion: by designing the cost function and the constraints, we can guarantee the feasibility of a motion and impose high-level locomotion tasks, e.g., tracking of a reference velocity. This allows a generic planning approach, without the need to tailor a specific motion to each terrain as in heuristic approaches. In this context, Model Predictive Control (MPC) can compensate for model inaccuracies and external disturbances, thanks to high-frequency replanning. The main objective of this dissertation is to develop a Nonlinear MPC (NMPC)-based locomotion framework for quadruped robots. The aim is to obtain an algorithm which can be extended to different robots and gaits; in addition, I sought to remove some assumptions commonly made in the literature, e.g., a heuristic reference generator and a user-defined gait sequence. The starting point of my work is the definition of the Optimal Control Problem to generate feasible trajectories for the Center of Mass. It is descriptive enough to capture the linear and angular dynamics of the robot as a whole. A simplified model (the Single Rigid Body Dynamics model) is used for the system dynamics, while a novel cost term maximizes leg mobility to improve robustness in the presence of non-flat terrain. In addition, to test the approach on the real robot, I dedicated particular effort to implementing both a heuristic reference generator and an interface for the controller, and to integrating them into the controller framework developed previously by other team members. As a second contribution of my work, I extended the locomotion framework to deal with a trot gait. In particular, I generalized the reference generator to be based on optimization. Exploiting the Linear Inverted Pendulum model, this new module can deal with the underactuation of the trot when only two legs are in contact with the ground, endowing the NMPC with physically informed reference trajectories to be tracked. In addition, the reference velocities are used to correct the heuristic footholds, obtaining contact locations coherent with the motion of the base even though they are not directly optimized. The model used by the NMPC receives the gait sequence as input, so in the last part of my work I developed an online multi-contact planner and integrated it into the MPC framework. Using a machine learning approach, the planner computes the best feasible option, even in complex environments, in a few milliseconds, by ranking online a set of discrete options for footholds, i.e., which leg to move and where to step. To train the network, I designed a novel function, evaluated offline, which considers the value of the NMPC cost and robustness/stability metrics for each option. These methods have been validated with simulations and experiments over the three years of this work. I tested the NMPC on the Hydraulically actuated Quadruped robot (HyQ) of the IIT’s Dynamic Legged Systems lab, performing omni-directional motions on flat terrain and stepping on a pallet (both static and relocated during the motion) with a crawl gait. The trajectory replanning is performed at high frequency, and visual information about the terrain is included to traverse uneven terrain. A Unitree Aliengo quadruped robot is used to execute experiments with the trot gait. The optimization-based reference generator allows the robot to reach a fixed goal and recover from external pushes without modifying the structure of the NMPC. Finally, simulations with the Solo robot are performed to validate the neural network-based contact planning. The robot successfully traverses complex scenarios, e.g., stepping stones, with both walk and trot gaits, choosing the footholds online. The achieved results improve the robustness and the performance of quadruped locomotion. High-frequency replanning, reaching a fixed goal, recovering after a push, and automatic selection of footholds could help robots accomplish tasks important to humans, for example providing support in a disaster-response scenario or inspecting an unknown environment. In the future, the contact planning will be transferred to the real hardware. Possible developments include the optimization of gait timings, i.e., stance and swing durations, and a framework which allows automatic transitions between gaits.
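
    A minimal sketch of how a Linear Inverted Pendulum rollout can supply physically informed Center of Mass references and a velocity-based foothold correction, in the spirit of the optimization-based reference generator described above. This is not the author's implementation; the parameters and the simple capture-point-style correction rule are assumptions for illustration.

        # LIP-based reference rollout and foothold correction (illustrative parameters).
        import numpy as np

        g, h_com, dt, steps = 9.81, 0.45, 0.01, 40
        omega = np.sqrt(g / h_com)                  # LIP natural frequency

        def lip_rollout(x0, xd0, p_foot):
            """Integrate the 1-D LIP dynamics  xdd = omega^2 * (x - p_foot)."""
            xs, x, xd = [], x0, xd0
            for _ in range(steps):
                xdd = omega ** 2 * (x - p_foot)
                xd += xdd * dt
                x += xd * dt
                xs.append(x)
            return np.array(xs)

        def corrected_foothold(p_heuristic, v_ref, v_meas, k_v=0.1):
            """Shift a heuristic foothold using the reference/measured velocity error
            (a common capture-point-style correction; the gain k_v is an assumption)."""
            return p_heuristic + k_v * (v_meas - v_ref)

        p_foot = corrected_foothold(p_heuristic=0.05, v_ref=0.30, v_meas=0.35)
        com_ref = lip_rollout(x0=0.0, xd0=0.30, p_foot=p_foot)
        print("CoM reference after %.1f s: %.3f m" % (steps * dt, com_ref[-1]))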

    Feynman-Kac Numerical Techniques for Stochastic Optimal Control

    Three significant advancements are proposed for improving numerical methods in the solution of forward-backward stochastic differential equations (FBSDEs) appearing in the Feynman-Kac representation of the value function in stochastic optimal control (SOC) problems. First, we propose a novel characterization of FBSDE estimators as either on-policy or off-policy, highlighting the intuition that the distribution over which value functions are approximated should, to some extent, match the distribution that the policies generate. Second, two novel numerical estimators are proposed for improving the accuracy of single-timestep updates. In the case of LQR problems, we demonstrate both in theory and in numerical simulation that our estimators result in near machine-precision accuracy, in contrast to previously proposed methods that can diverge on the same problems. Third, we propose a new method for accelerating the global convergence of FBSDE methods. By repeated use of the Girsanov change of probability measures, it is demonstrated how a McKean-Markov branched sampling method can be utilized for the forward integration pass, as long as the controlled drift terms are appropriately compensated in the backward integration pass. Subsequently, a numerical approximation of the value function is obtained by solving a series of function approximation problems backwards in time along the edges of a space-filling tree.
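
    A minimal sketch of the Feynman-Kac representation that underlies these FBSDE estimators, assuming a toy setting: the value of a fixed (zero-control) policy for a scalar linear SDE with quadratic running cost is estimated by averaging the cost accumulated along forward Euler-Maruyama sample paths and checked against the closed-form value. The dynamics, cost and sample sizes are illustrative; the on-policy/off-policy estimators proposed in the thesis are considerably more involved.

        # Monte Carlo Feynman-Kac value estimate vs. closed form (illustrative setting).
        import numpy as np

        a, sigma, q = -0.5, 0.4, 1.0      # dx = a x dt + sigma dW, running cost q x^2
        x0, T, n_steps, n_paths = 1.0, 2.0, 200, 20000
        dt = T / n_steps
        rng = np.random.default_rng(0)

        # Forward pass: simulate paths and accumulate the running cost (Feynman-Kac).
        x = np.full(n_paths, x0)
        cost = np.zeros(n_paths)
        for _ in range(n_steps):
            cost += q * x ** 2 * dt
            x += a * x * dt + sigma * np.sqrt(dt) * rng.standard_normal(n_paths)

        # Closed-form value of the same integral, using E[x_t^2] for the linear SDE.
        e2aT = np.exp(2 * a * T)
        v_exact = (q * x0 ** 2 * (e2aT - 1) / (2 * a)
                   + q * sigma ** 2 / (2 * a) * ((e2aT - 1) / (2 * a) - T))
        print("Monte Carlo V(x0) = %.4f   closed form = %.4f" % (cost.mean(), v_exact))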

    Advanced Strategies for Robot Manipulators

    Amongst robotic systems, robot manipulators have proven to be of increasing importance and are widely adopted to substitute for humans in repetitive and/or hazardous tasks. Modern manipulators have complicated designs and are required to perform increasingly precise, crucial and critical tasks, so simple traditional control methods are no longer sufficient, and advanced control strategies that account for special constraints need to be established. Although groundbreaking research has been carried out in this realm, there are still many novel aspects which have yet to be explored.

    Adaptive control of compliant robots with Reservoir Computing

    In modern society, robots are increasingly used to handle dangerous, repetitive and/or heavy tasks with high precision. Because of the nature of these tasks, being dangerous, requiring high precision or simply repetitive, robots are usually constructed with high-torque motors and sturdy materials, which makes them dangerous for humans to handle. In a car-manufacturing company, for example, a large cage is placed around the robot’s workspace to prevent humans from entering its vicinity. In the last few decades, efforts have been made to improve human-robot interaction. The movement of robots is often characterized as not being smooth and clearly dividable into sub-movements, which makes it rather unpredictable for humans. There is therefore an opportunity to improve the motion generation of robots to enhance human-robot interaction. One interesting research direction is that of imitation learning, in which human motions are recorded and demonstrated to the robot. Although the robot is able to reproduce such movements, they cannot be generalized to other situations. Therefore, a dynamical system approach is proposed in which the recorded motions are embedded into the dynamics of the system. Shaping these nonlinear dynamics according to the recorded motions allows the dynamical system to generalize beyond the demonstrations. As a result, the robot can generate motions for situations not included in the recorded human demonstrations. In this dissertation, a Reservoir Computing approach is used to create a dynamical system in which such demonstrations are embedded. Reservoir Computing systems are Recurrent Neural Network-based approaches that are trained efficiently by adjusting only the readout connections and keeping all other connections of the network unchanged at their initial, randomly chosen values. Although they have previously been used to embed periodic motions, here they are extended to embed discrete motions, or both. This work describes how such a motion pattern-generating system is built, investigates the nature of the underlying dynamics and evaluates its robustness in the face of perturbations. Additionally, a dynamical system approach to obstacle avoidance is proposed that is based on vector fields in the presence of repellers. This technique can be used to extend the motion abilities of the robot without the need to change the trained Motion Pattern Generator (MPG), and it can therefore be applied in real time to any system that generates a movement trajectory. Assume that the MPG system is implemented on an industrial robotic arm, similar to the ones used in a car factory. Even though the obstacle avoidance strategy presented is able to modify the generated motion of the robot’s gripper so that it avoids obstacles, it does not guarantee that other parts of the robot cannot collide with a human. To prevent this, engineers have started to use advanced control algorithms that measure the amount of torque applied to the robot, which allows the robot to be aware of external perturbations. However, it turns out that, even with fast control loops, the adaptation to compensate for a sudden perturbation is too slow to prevent high interaction forces. To reduce such forces, researchers started to use mechanical elements that are passively compliant (e.g., springs) and lightweight flexible materials to construct robots. Although such compliant robots are much safer and inherently more energy efficient, their control becomes much harder. Most control approaches use model information about the robot (e.g., weight distribution and shape); however, when constructing a compliant robot it is hard to determine the dynamics of these materials. Therefore, a model-free adaptive control framework is proposed that assumes no prior knowledge about the robot. By interacting with the robot it learns an inverse robot model that is used as the controller, and the more it interacts, the better the control becomes. Appropriately, this framework is called the Inverse Modeling Adaptive (IMA) control framework. I have evaluated the IMA controller’s tracking ability on several tasks, investigating its model independence and stability. Furthermore, I have shown its fast learning ability and performance comparable to controllers designed for specific tasks. Given both the MPG and IMA controllers, it is possible to improve the interactability of a compliant robot in a human-friendly environment. When the robot is to perform human-like motions for a large set of tasks, we would need to demonstrate motion examples of all these tasks. However, biological research concerning the motion generation of animals and humans has revealed that a limited set of motion patterns, called motion primitives, are modulated and combined to generate the advanced motor/motion skills that humans and animals exhibit. Inspired by these findings, I investigate whether a single motion primitive can indeed be modulated to achieve a desired motion behavior. A proof of concept is presented through some elementary experiments in which an MPG is controlled by an IMA controller. Furthermore, a general hierarchy is introduced that describes how a robot can be controlled in a biology-inspired manner. I also investigated how motion primitives can be combined to produce a desired motion; however, I was unable to get more advanced implementations to work, and the results of some simple experiments are presented in the appendix. Another approach I investigated assumes that the primitives themselves are undefined. Instead, only a high-level description is given, which specifies that every primitive should on average contribute equally, while still allowing a single primitive to specialize in a part of the motion generation. Without defining the behavior of a primitive, only a set of untrained IMA controllers is used, each of which represents a single primitive. As a result of the high-level heuristic description, the task space is tiled into sub-regions in an unsupervised manner, resulting in controllers that indeed each represent a part of the motion generation. I have applied this Modular Architecture with Control Primitives (MACOP) to an inverse kinematic learning task and investigated the emerged primitives. Thanks to the tiling of the task space, it becomes possible to control redundant systems, because redundant solutions can be spread over several control primitives. Within each sub-region of the task space, a specific control primitive is more accurate than in other regions, allowing the task complexity to be distributed over several less complex tasks. Finally, I extend the use of an IMA controller, which is a tracking controller, to the control of under-actuated systems. By using a sample-based planning algorithm it becomes possible to explore the system dynamics and plan a path to a desired state. Afterwards, MACOP is used to incorporate feedback and to learn the control commands corresponding to the planned state-space trajectory, even if that trajectory contains errors. As a result, under-actuated control of a cart-pole system was achieved. Furthermore, I presented the concept of a simulation-based control framework that allows the learning of the system dynamics, planning and feedback control to proceed iteratively and simultaneously.
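
    A minimal echo-state-network sketch of the Reservoir Computing principle the dissertation builds on: the random recurrent weights are left untouched and only the linear readout is trained, here with ridge regression on a one-step-ahead prediction of a rhythmic target signal. The reservoir size, spectral radius and target pattern are illustrative assumptions, not the configuration used in the thesis.

        # Echo state network with a trained readout only (illustrative configuration).
        import numpy as np

        rng = np.random.default_rng(1)
        n_res, spectral_radius, washout, ridge = 300, 0.9, 100, 1e-6

        # Target "motion": a simple rhythmic trajectory; we fit a one-step-ahead readout
        # (running the trained network autonomously is out of scope for this sketch).
        t = np.linspace(0, 8 * np.pi, 2000)
        target = np.sin(t) + 0.3 * np.sin(3 * t)

        # Fixed random reservoir, rescaled to the desired spectral radius.
        W = rng.standard_normal((n_res, n_res))
        W *= spectral_radius / np.max(np.abs(np.linalg.eigvals(W)))
        W_in = rng.uniform(-0.5, 0.5, size=n_res)

        # Drive the reservoir with the signal and collect states (teacher forcing).
        states = np.zeros((len(target), n_res))
        x = np.zeros(n_res)
        for k in range(1, len(target)):
            x = np.tanh(W @ x + W_in * target[k - 1])
            states[k] = x

        # Train only the readout: ridge regression from states to the next sample.
        X, y = states[washout:-1], target[washout + 1:]
        W_out = np.linalg.solve(X.T @ X + ridge * np.eye(n_res), X.T @ y)
        print("one-step prediction RMSE: %.4f" % np.sqrt(np.mean((X @ W_out - y) ** 2)))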

    Resilience-enhanced control reconfiguration for autonomous systems

    Unmanned systems keep replacing manned systems in a paradigm shift. According to Unmanned Autonomous Systems (UAS) market forecast reports, the UAS market value is expected to grow two to three times higher within ten years. Considering the economic impact of UAS applications on job markets and component manufacturing industries, the UAS market value may well exceed what is predicted in those reports. However, regulations have limited the effective utilization of UAS due to safety concerns, and these restrictive regulations significantly delay the potential usefulness of civilian and commercial UAVs. According to Unmanned Aerial Vehicle (UAV) incident reports, mechanical failures turn out to be one of the top causes of incidents, aside from human error. Technically, it is impossible to avoid every fault or failure in any system; however, it can be possible to save a faulty system if the faults are treated properly. In this regard, this research has reviewed the state-of-the-art techniques for improving system safety in the presence of a critical fault mode. Promising concepts are resilience engineering and Active Fault Tolerant Control Systems (AFTCS). Resilience engineering has focused more on system design and resilience assessment methods, while AFTCS mainly contributes to fast and stable operating-point recovery without considering long-term system performance or mission success. Prognostics-enhanced reconfigurable control frameworks have proposed online prognosis for Remaining Useful Life (RUL) prediction within the control scheme, but do not address comprehensive mission-capability trade-offs. The objective of this study is to design a resilience-enhanced reconfigurable control framework for unmanned autonomous systems in the presence of a critical fault mode during operation. The proposed resilience-enhanced reconfigurable control framework is composed of three fundamental modules: 1) immediate performance recovery by Model Predictive Control (MPC) and Differential Dynamic Programming (DDP) approaches, 2) long-term mission-capability trade-offs by an optimization routine, and 3) situational awareness by a particle filtering-based fault diagnosis and Case-Based Reasoning (CBR). A critical development of this thesis is the introduction of an adaptation parameter in the MPC formulation (Module 1) and an optimization process to find an optimal value for that adaptation parameter (Module 2). Module 3 enables long-term mission-capability reasoning when a new fault growth pattern is observed. In order to test the efficacy of the proposed framework, an under-actuated hovercraft is introduced as a testbed, with insulation degradation of an electrical thrust motor as the critical fault mode. The experiments explore the effect of the adaptation parameter on long-term mission capabilities and identify the necessity of proper trade-offs. Further experiments investigate the efficacy of each module and of the integrated framework. The experimental results show that the adaptation parameter adjusts the control strategy so that mission capabilities are optimized while vulnerable long-term mission capabilities are recovered. The integrated framework improves the probability of mission success in the presence of a critical fault mode. Lastly, as a generalization of the design process for the resilience-enhanced reconfigurable control framework, a design methodology suggests a step-by-step design procedure. The assumptions of the research have guided the required steps and the limitations of the proposed framework.
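
    A minimal sketch of the role such an adaptation parameter can play inside an MPC cost, assuming a toy double-integrator plant: the parameter alpha trades off immediate tracking performance against actuator usage, a stand-in for preserving the remaining useful life of a degrading thrust motor. The model, horizon and cost structure are illustrative assumptions, not the framework's actual formulation.

        # MPC with an adaptation parameter weighting performance vs. actuator usage
        # (illustrative toy model, not the thesis formulation).
        import numpy as np
        from scipy.optimize import minimize

        dt, H = 0.1, 20
        A = np.array([[1.0, dt], [0.0, 1.0]])
        B = np.array([[0.5 * dt ** 2], [dt]])

        def mpc_step(x, x_ref, alpha):
            """Solve one finite-horizon problem; return the first control input."""
            def cost(u_seq):
                xk, c = x.copy(), 0.0
                for u in u_seq:
                    xk = A @ xk + B.flatten() * u
                    c += alpha * np.sum((xk - x_ref) ** 2) + (1.0 - alpha) * u ** 2
                return c
            u_opt = minimize(cost, np.zeros(H), method="L-BFGS-B").x
            return u_opt[0]

        x_init, x_ref = np.array([0.0, 0.0]), np.array([1.0, 0.0])
        for alpha in (0.95, 0.5):                 # aggressive vs. conservative strategy
            xk, effort = x_init.copy(), 0.0
            for _ in range(60):                   # receding-horizon simulation
                u = mpc_step(xk, x_ref, alpha)
                xk = A @ xk + B.flatten() * u
                effort += u ** 2 * dt
            print("alpha=%.2f  final pos %.3f  control effort %.3f" % (alpha, xk[0], effort))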

    A Survey on Physics Informed Reinforcement Learning: Review and Open Problems

    The inclusion of physical information in machine learning frameworks has revolutionized many application areas. This involves enhancing the learning process by incorporating physical constraints and adhering to physical laws. In this work, we explore the utility of this idea for reinforcement learning applications. We present a thorough review of the literature on incorporating physics information, also known as physics priors, into reinforcement learning approaches, commonly referred to as physics-informed reinforcement learning (PIRL). We introduce a novel taxonomy with the reinforcement learning pipeline as the backbone to classify existing works, compare and contrast them, and derive crucial insights. Existing works are analyzed with regard to the representation/form of the governing physics modeled for integration, their specific contribution to the typical reinforcement learning architecture, and their connection to the underlying reinforcement learning pipeline stages. We also identify the core learning architectures and physics incorporation biases (i.e., observational, inductive and learning) of existing PIRL approaches and use them to further categorize the works for better understanding and adaptation. By providing a comprehensive perspective on the implementation of the physics-informed capability, the taxonomy presents a cohesive approach to PIRL. It identifies the areas where this approach has been applied, as well as the gaps and opportunities that exist. Additionally, the taxonomy sheds light on unresolved issues and challenges, which can guide future research. This nascent field holds great potential for enhancing reinforcement learning algorithms by increasing their physical plausibility, precision, data efficiency, and applicability in real-world scenarios.
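
    As a minimal illustration of one way physics priors can enter the reinforcement learning pipeline (a learning bias realised as reward shaping, in the taxonomy's terms), the sketch below augments a task reward with a penalty on the residual of a known physical law, here energy conservation for an undamped pendulum. The environment, weights and the shaping scheme itself are illustrative assumptions rather than a method drawn from the surveyed papers.

        # Physics-informed reward shaping via an energy-conservation residual
        # (illustrative assumption, not a method from the surveyed literature).
        import numpy as np

        g, l, m = 9.81, 1.0, 1.0

        def energy(theta, theta_dot):
            """Total mechanical energy of an undamped pendulum."""
            return 0.5 * m * (l * theta_dot) ** 2 + m * g * l * (1.0 - np.cos(theta))

        def shaped_reward(state, next_state, task_reward, w_phys=0.1):
            """Task reward minus a penalty on violating energy conservation
            (valid only for the torque-free, frictionless case assumed here)."""
            residual = energy(*next_state) - energy(*state)
            return task_reward - w_phys * residual ** 2

        # Example: a transition that roughly respects the physics vs. one that injects energy.
        s = (0.3, 0.0)
        print(shaped_reward(s, (0.29, -0.17), task_reward=1.0))   # roughly energy-consistent
        print(shaped_reward(s, (0.50,  2.00), task_reward=1.0))   # physically implausible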