23 research outputs found

An Echo state model of non-Markovian reinforcement learning

    Get PDF
    Department Head: Dale H. Grit. 2008 Spring. Includes bibliographical references (pages 137-142). There exists a growing need for intelligent, autonomous control strategies that operate in real-world domains. Theoretically, the state-action space must exhibit the Markov property in order for reinforcement learning to be applicable. Empirical evidence, however, suggests that reinforcement learning also applies to domains where the state-action space is only approximately Markovian, as is the case for the overwhelming majority of real-world domains. These domains, termed non-Markovian reinforcement learning domains, raise a unique set of practical challenges. The reconstruction dimension required to approximate a Markovian state-space is unknown a priori and can potentially be large. Further, the spatial complexity of local function approximation of the reinforcement learning domain grows exponentially with the reconstruction dimension. Parameterized dynamic systems alleviate both embedding length and state-space dimensionality concerns by reconstructing an approximate Markovian state-space via a compact, recurrent representation. Yet this representation exacts a cost: modeling reinforcement learning domains via adaptive, parameterized dynamic systems is characterized by instability, slow convergence, and high computational or spatial training complexity. The objectives of this research are to demonstrate a stable, convergent, accurate, and scalable model of non-Markovian reinforcement learning domains. These objectives are fulfilled via fixed point analysis of the dynamics underlying the reinforcement learning domain and the Echo State Network, a class of parameterized dynamic system. Understanding models of non-Markovian reinforcement learning domains requires understanding the interactions between learning domains and their models. Fixed point analysis of the Mountain Car Problem reinforcement learning domain, for both local and nonlocal function approximations, suggests a close relationship between the locality of the approximation and the number and severity of bifurcations of the fixed point structure. This research suggests the likely cause of this relationship: reinforcement learning domains exist within a dynamic feature space in which trajectories are analogous to states. The fixed point structure maps dynamic space onto state-space. This explanation suggests two testable hypotheses. First, reinforcement learning is sensitive to state-space locality because states cluster as trajectories in time rather than in space. Second, models using trajectory-based features should exhibit good modeling performance and few changes in fixed point structure. Analysis of the performance of lookup table, feedforward neural network, and Echo State Network (ESN) models on the Mountain Car Problem reinforcement learning domain confirms these hypotheses. The ESN is a large, sparse, randomly generated, unadapted recurrent neural network, which adapts a linear projection of the target domain onto the hidden layer. Modeling results on reinforcement learning domains show the ESN achieves performance comparable to lookup table and neural network architectures on the Mountain Car Problem with minimal changes to fixed point structure. Also, the ESN achieves lookup-table-caliber performance when modeling Acrobot, a four-dimensional control problem, but is less successful modeling the lower-dimensional Modified Mountain Car Problem.
These performance discrepancies are attributed to the ESN's excellent ability to represent complex short-term dynamics, and its inability to consolidate long temporal dependencies into a static memory. Without memory consolidation, reinforcement learning domains exhibiting attractors with multiple dynamic scales are unlikely to be well modeled via ESN. To mitigate this problem, a simple ESN memory consolidation method is presented and tested for stationary dynamic systems. These results indicate the potential to improve modeling performance in reinforcement learning domains via memory consolidation.
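    A minimal sketch of the Echo State Network idea described above (a fixed, sparse, randomly generated reservoir with only a linear readout trained) is given below. It assumes numpy and a ridge-regression readout; the sizes, sparsity and spectral radius are illustrative defaults, not the thesis's actual configuration.

    import numpy as np

    rng = np.random.default_rng(0)
    n_in, n_res = 2, 200                      # e.g. (position, velocity) inputs, 200 reservoir units
    sparsity, spectral_radius = 0.05, 0.9     # typical illustrative ESN hyperparameters

    # Fixed, randomly generated, sparse recurrent weights (never adapted).
    W = rng.uniform(-1, 1, (n_res, n_res)) * (rng.random((n_res, n_res)) < sparsity)
    W *= spectral_radius / max(abs(np.linalg.eigvals(W)))   # scale for the echo state property
    W_in = rng.uniform(-1, 1, (n_res, n_in))

    def run_reservoir(U):
        """Drive the reservoir with an input sequence U of shape (T, n_in)."""
        x, states = np.zeros(n_res), []
        for u in U:
            x = np.tanh(W_in @ u + W @ x)
            states.append(x.copy())
        return np.array(states)

    def train_readout(U, Y, ridge=1e-6):
        """Only the linear readout is adapted (ridge regression on the reservoir states)."""
        X = run_reservoir(U)
        return np.linalg.solve(X.T @ X + ridge * np.eye(n_res), X.T @ Y)

    # Usage: W_out = train_readout(inputs, targets); predictions = run_reservoir(inputs) @ W_out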

    Intelligent model-based control of complex multi-link mechanisms

    Get PDF
    A complex underactuated multi-link mechanism is a system whose number of control inputs is smaller than the dimension of its configuration space. The ability to control such a system through the manipulation of its natural dynamics would allow for the design of more energy-efficient machines with the ability to achieve smooth motions similar to those found in the natural world. This research aims to understand the complex nature of the Robogymnast, a triple-link underactuated pendulum built at Cardiff University with the purpose of studying the behaviour of non-linear systems, and to understand the challenges in developing its control system. A mathematical model of the robot was derived from the Euler-Lagrange equations. The design of the control system was based on the discrete-time linear model around the downward position and a sampling time of 2.5 milliseconds. Firstly, Invasive Weed Optimization (IWO) was used to optimize the swing-up motion of the robot by determining the optimum values of the parameters that control the input signals of the Robogymnast's two motors. The values obtained from IWO were then applied in both simulation and experiment. The results showed that the swing-up motion of the Robogymnast from the stable downward position to the inverted configuration was successfully achieved. Secondly, due to the complex nature and nonlinearity of the Robogymnast, a novel approach of modelling the Robogymnast using a multi-layered Elman neural network (ENN) was proposed. The ENN model was then tested with various inputs and its outputs were analysed. The results showed that the ENN model was capable of providing a better representation of the actual system than the mathematical model. Thirdly, IWO was used to investigate the optimum Q values of the Linear Quadratic Regulator (LQR) for inverted balance control of the Robogymnast, i.e. to obtain the optimal Q values required by the LQR to maintain the Robogymnast in an upright configuration. Two fitness criteria were investigated: cost function J and settling time T. A controller was developed using the values obtained from each fitness criterion. The results showed that LQRT performed faster but LQRJ was capable of stabilizing the Robogymnast from larger deflection angles. Finally, fitness criteria J and T were used simultaneously to obtain the optimal Q values for the LQR. For this purpose, two multi-objective optimization methods based on the IWO, namely the Weighted Criteria Method IWO (WCMIWO) and the Fuzzy Logic IWO Hybrid (FLIWOH), were developed. Two LQR controllers were first developed using the parameters obtained from the two optimization methods. The same process was then repeated with disturbance applied to the Robogymnast states to develop another two LQR controllers. The response of the controllers was then tested in different scenarios using simulation and their performance was evaluated. The results showed that all four controllers were able to balance the Robogymnast, with the fastest settling time achieved by WCMIWO with disturbance, followed in ascending order by FLIWOH with disturbance, FLIWOH, and WCMIWO.
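    As a hedged illustration of the LQR tuning step described above, the sketch below computes a discrete-time LQR gain for candidate Q and R weights and scores it with a toy settling-time fitness, the kind of evaluation an IWO-style search could call repeatedly. It assumes numpy and scipy are available; the model matrices are placeholders, not the Robogymnast's linearised dynamics.

    import numpy as np
    from scipy.linalg import solve_discrete_are

    def dlqr_gain(A, B, Q, R):
        """Discrete-time LQR gain K for u = -K x, minimising sum(x'Qx + u'Ru)."""
        P = solve_discrete_are(A, B, Q, R)
        return np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)

    def settling_steps(A, B, Q, R, x0, tol=1e-3, horizon=2000):
        """Toy fitness: closed-loop steps until the state norm first drops below tol."""
        K = dlqr_gain(A, B, Q, R)
        x = np.asarray(x0, dtype=float)
        for k in range(horizon):
            if np.linalg.norm(x) < tol:
                return k
            x = (A - B @ K) @ x
        return horizon

    # An IWO-style optimizer would evaluate settling_steps (or a quadratic cost J)
    # over candidate diagonal Q matrices and keep the best-performing candidates.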

    Efficient Learning with Subgoals and Gaussian Process

    Full text link
    This thesis demonstrates how data efficiency in reinforcement learning can be improved through the use of subgoals and Gaussian processes. Data efficiency is extremely important in a range of problems in which gathering additional data is expensive. This tends to be the case in most problems that involve actual interactions with the physical world, such as a robot kicking a ball, an autonomous vehicle driving or a drone manoeuvring. State-of-the-art data efficiency is achieved on several well-researched problems. The systems that achieve this learn Gaussian process state transition models of the problem. The model-based learner system uses the state transition model to learn the action to take in each state. The subgoal planner makes use of the state transition model to build an explicit plan to solve the problem. The subgoal planner is improved through the use of learned subgoals to aid navigation of the problem space. The resource-managed learner balances the costs of computation against the value of selecting better experiments in order to improve data efficiency. An active learning system is used to estimate the value of the experiments in terms of how much they may improve the current solution. This is compared to an estimate of how much better an experiment found by expending additional computation will be, along with the costs of performing that computation. A theoretical framework around the use of subgoals in problem solving is presented. This framework provides insights into when and why subgoals are effective, along with avenues for future research. This includes a detailed proposal for a system built on the subgoal theory framework, intended to make full use of subgoals to create an effective reinforcement learning system.
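    The sketch below illustrates, under stated assumptions, a Gaussian process state transition model of the kind described above: it fits a GP to observed (state, action, next state) triples and returns mean predictions with uncertainty. It uses scikit-learn; the kernel choice and the decision to model state deltas are illustrative, not necessarily the thesis's exact setup.

    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF, WhiteKernel

    def fit_transition_model(states, actions, next_states):
        """states: (N, ds), actions: (N, da), next_states: (N, ds)."""
        X = np.hstack([states, actions])
        kernel = RBF(length_scale=np.ones(X.shape[1])) + WhiteKernel(noise_level=1e-3)
        gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
        gp.fit(X, next_states - states)           # model the state change (delta)
        return gp

    def predict_next(gp, state, action):
        """Mean next state plus the GP's predictive standard deviation."""
        x = np.hstack([state, action]).reshape(1, -1)
        delta, std = gp.predict(x, return_std=True)
        return state + delta.ravel(), std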

    Modeling, analysis and control of robot-object nonsmooth underactuated Lagrangian systems: A tutorial overview and perspectives

    Get PDF
    So-called robot-object Lagrangian systems consist of a class of nonsmooth underactuated complementarity Lagrangian systems with a specific structure: an "object" and a "robot". Only the robot is actuated. The object dynamics can thus be controlled only through the action of the contact Lagrange multipliers, which represent the interaction forces between the robot and the object. Juggling, walking, running and hopping machines, robotic systems that manipulate objects, tapping and pushing systems, kinematic chains with joint clearance, crawling and climbing robots, some cable-driven manipulators, and some circuits with set-valued nonsmooth components belong to this class. This article aims at presenting their main features, then reviewing many application examples which belong to the robot-object class, and finally surveying the main tools and control strategies which have been proposed in the Automatic Control and Robotics literature. Some comments and open issues conclude the article.
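    To make the robot-object structure concrete, the toy sketch below simulates a one-dimensional actuated robot mass that can push an unactuated object mass through a unilateral contact, so the object is driven only via the contact multiplier lambda, in the spirit of the complementarity condition 0 <= gap, 0 <= lambda, gap * lambda = 0. The masses, the inelastic impact rule and the explicit time stepping are illustrative assumptions, not a scheme advocated in the article.

    import numpy as np

    m_r, m_o, dt = 1.0, 0.5, 1e-3        # robot mass, object mass, time step (illustrative)

    def step(x_r, v_r, x_o, v_o, u):
        """One explicit step; the object (on the right) is moved only by the contact force lambda."""
        v_r += u / m_r * dt               # only the robot is actuated
        lam = 0.0
        gap = x_o - x_r                   # unilateral contact when the robot reaches the object
        if gap <= 0.0 and v_r > v_o:
            # Inelastic impact: equalise velocities and recover the equivalent contact force.
            v_common = (m_r * v_r + m_o * v_o) / (m_r + m_o)
            lam = m_o * (v_common - v_o) / dt
            v_r = v_o = v_common
        return x_r + v_r * dt, v_r, x_o + v_o * dt, v_o, lam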

    Model learning for trajectory tracking of robot manipulators

    Get PDF
    Model-based controllers have drastically improved robot performance, increasing task accuracy while reducing control effort. Nevertheless, all this was realized with a very strong assumption: the exact knowledge of the physical properties of both the robot and the environment that surrounds it. This assertion is often misleading: in fact, modern robots are modeled in a very approximate way and, more importantly, the environment is almost never static and completely known. Even for very simple systems, such as robot manipulators, these assumptions are too strong and must be relaxed. Many methods have been developed which, exploiting previous experience, are able to refine the nominal model: from classic identification techniques to more modern machine-learning-based approaches. The topic of this thesis is the investigation of these data-driven techniques in the context of robot control for trajectory tracking. The first two chapters provide preliminary knowledge on both model-based controllers, used in robotics to ensure precise trajectory tracking, and model learning techniques. The following three chapters present the novelties introduced by the author with respect to the state of the art: three works with the same premise (an inaccurate system model) and an identical goal (accurate trajectory tracking control), but with small differences according to the specific platform of application (fully actuated, underactuated, and redundant robots). In all the considered architectures, an online learning scheme has been introduced to correct the nominal feedback linearization control law. The method was first introduced in the literature to cope with fully actuated systems, showing its efficacy in the accurate tracking of joint-space trajectories even with an inaccurate dynamic model. The main novelty of the technique was the use of only kinematic information, instead of torque measurements (in general very noisy), to retrieve and compensate the dynamic mismatches online. The method was then extended to underactuated robots. This new architecture was composed of an online learning correction of the controller, acting on the actuated part of the system (the nominal partial feedback linearization), and an offline planning phase, required to obtain a trajectory that is dynamically feasible also for the zero dynamics of the system. The scheme was iterative: after each trial, according to the collected information, both phases were improved and then repeated until the task was achieved. Also in this case the method showed its capability, both in numerical simulations and in real experiments on a robotic platform. Finally, the method was applied to redundant systems: unlike the previous cases, here the task consisted in the accurate tracking of a Cartesian end-effector trajectory. Although in principle very similar to the fully actuated case, the presence of redundancy drastically slowed the convergence of the learning machinery, worsening the performance. To cope with this, a redundancy resolution was proposed that, exploiting an approximation of the learning algorithm (Gaussian process regression), locally maximizes the information gain and thus selects the most convenient self-motion for the system; moreover, all of this requires only the solution of a quadratic programming problem.
Also in this case the method performed well, achieving accurate online tracking while reducing both the control effort and the joint velocities, thus obtaining a natural behaviour. The thesis concludes with summary considerations on the proposed approach and with possible future directions of research.
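    The sketch below gives a minimal, hypothetical version of the scheme described above: a nominal computed-torque (feedback linearization) law augmented with an online-learned residual driven only by kinematic tracking error, with no torque measurements. M_hat and n_hat stand for the nominal (inaccurate) inertia matrix and nonlinear terms supplied by the user; the simple gradient-style update is illustrative and not the thesis's actual learning algorithm.

    import numpy as np

    class LearnedComputedTorque:
        def __init__(self, n_joints, Kp=100.0, Kd=20.0, lr=0.05):
            self.Kp, self.Kd, self.lr = Kp, Kd, lr
            self.correction = np.zeros(n_joints)      # learned torque residual

        def control(self, q, dq, q_des, dq_des, ddq_des, M_hat, n_hat):
            e, de = q_des - q, dq_des - dq
            v = ddq_des + self.Kd * de + self.Kp * e  # outer-loop reference acceleration
            tau = M_hat(q) @ v + n_hat(q, dq) + self.correction
            # Online update driven purely by kinematic tracking error (no torque sensing).
            self.correction += self.lr * (self.Kp * e + self.Kd * de)
            return tau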

    Advances in Reinforcement Learning

    Get PDF
    Reinforcement Learning (RL) is a very dynamic area in terms of theory and application. This book brings together many different aspects of current research on several fields associated with RL, which has been growing rapidly, producing a wide variety of learning algorithms for different applications. Its 24 chapters cover a very broad variety of topics in RL and their application in autonomous systems. A set of chapters provides a general overview of RL, while other chapters focus mostly on applications of RL paradigms: Game Theory, Multi-Agent Theory, Robotics, Networking Technologies, Vehicular Navigation, Medicine and Industrial Logistics.

    Design of high-performance legged robots: A case study on a hopping and balancing robot

    Get PDF
    The availability and capabilities of present-day technology suggest that legged robots should be able to physically outperform their biological counterparts. This thesis revolves around the philosophy that the observed opposite is caused by over-complexity in legged robot design, which is believed to substantially suppress design for high performance. In this dissertation a design philosophy is elaborated with a focus on simple but high-performance design. This philosophy is governed by various key points, including holistic design, technology-inspired design, machine and behaviour co-design, and design at the performance envelope. The design philosophy also focuses on improving progress in robot design, which is inevitably complicated by the aspiration for high performance. It includes an approach of iterative design by trial and error, which is believed to accelerate robot design through experience. This thesis mainly focuses on the case study of Skippy, a fully autonomous monopedal balancing and hopping robot. Skippy is maximally simple in having only two actuators, which is the minimum number of actuators required to control a robot in 3D. Despite its simplicity, it is challenged with a versatile set of high-performance activities, ranging from balancing to reaching record jump heights, to surviving crashes from several metres and getting up unaided after a crash, while being built from off-the-shelf technology. This thesis has contributed to the detailed mechanical design of Skippy and its optimisations that abide by the design philosophy, and has resulted in a robust and realistic design that is able to reach a record jump height of 3.8 m. Skippy is also an example of iterative design through trial and error, which has led to the successful design and creation of the balancing-only precursor Tippy. High-performance balancing has been successfully demonstrated on Tippy, using a recently developed balancing algorithm that combines the objective of tracking a desired position command with balancing, as required for preparing hopping motions. This thesis has furthermore contributed several ideas and theories on Skippy's road to completion, which are also useful for designing other high-performance robots. These contributions include (1) the introduction of an actuator design criterion to maximize the physical balance recovery of a simple balancing machine, (2) a generalization of the centre of percussion for the placement of components that are sensitive to shock, and (3) algebraic modelling of a non-linear compression spring with high gravimetric energy density and a regressive stress-strain profile. The activities performed and the results achieved have proven to be valuable; however, they have also delayed the actual creation of Skippy itself. A possible explanation is that Skippy's requirements and objectives were too ambitious, so that many complications were encountered in the decision-making process of the iterative design strategy, involving trade-offs between exercising trial and error, elaborate simulation studies and the development of the above-mentioned new theories. Nevertheless, from (1) the resulting realistic design of Skippy, (2) the successful creation and demonstrations of Tippy and (3) the contributed theories for high-performance robot design, it can be concluded that the adopted design philosophy has been generally successful.
Through the case-study design project of the hopping and balancing robot Skippy, it is shown that proper design for high physical performance (1) can indeed lead to a robot design that is capable of physically outperforming humans and animals and (2) is already very challenging for a robot that is intended to be very simple.
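    Contribution (2) above builds on the classical centre of percussion; as a small illustration, the helper below computes the classical quantity for a pivoted rigid body, the point where an impact produces no reactive shock at the pivot, which is why it is a natural reference for placing shock-sensitive components. The generalization developed in the thesis is not reproduced here, and the numbers are illustrative.

    def centre_of_percussion(mass, d_com, inertia_com):
        """Distance from the pivot to the centre of percussion of a pivoted rigid body.
        mass [kg], d_com: pivot-to-centre-of-mass distance [m], inertia_com: about the COM [kg m^2]."""
        return d_com + inertia_com / (mass * d_com)

    # Example: a 2 kg link with its COM 0.3 m from the pivot and I_com = 0.06 kg m^2
    # has its centre of percussion at 0.3 + 0.06 / (2 * 0.3) = 0.4 m from the pivot.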

    Bio-inspired robotic control in underactuation: principles for energy efficacy, dynamic compliance interactions and adaptability.

    Get PDF
    Biological systems achieve energy-efficient and adaptive behaviours through extensive autologous and exogenous compliant interactions. Active dynamic compliance is created and enhanced from the musculoskeletal system (joint space) to the external environment (task space) amongst the underactuated motions. Underactuated systems with viscoelastic properties are similar to these biological systems, in that their self-organisation and overall tasks must be achieved by coordinating the subsystems and dynamically interacting with the environment. One important question to raise is: how can we design control systems to achieve efficient locomotion while adapting to dynamic conditions as living systems do? In this thesis, a trajectory planning algorithm is developed for underactuated microrobotic systems with bio-inspired self-propulsion and viscoelastic properties, to achieve synchronized motion in an energy-efficient, adaptive and analysable manner. The geometry of the state space of the systems is explicitly utilized, such that a synchronization of the generalized coordinates is achieved in terms of geometric relations along the desired motion trajectory. As a result, the complexity of the internal dynamics is substantially reduced, the dynamic couplings are explicitly characterised, and the underactuated dynamics are projected onto a hyper-manifold. Following such a reduction and characterization, we arrive at mappings of system compliance and integrable second-order dynamics associated with the passive degrees of freedom. As such, the issue of trajectory planning is converted into convenient nonlinear geometric analysis and optimal trajectory parameterization. Solutions of the reduced dynamics and the geometric relations can be obtained through an optimal motion trajectory generator. The theoretical background of the proposed approach is presented with rigorous analysis and developed in detail for a particular example. Experimental studies are conducted to verify the effectiveness of the proposed method. Towards compliant interactions with the environment, accurate modelling or prediction of nonlinear friction forces is a nontrivial and challenging task. Frictional instabilities typically need to be eliminated or compensated through efficiently designed controllers. In this work, a prediction and analysis framework is designed for the self-propelled vibro-driven system, whose locomotion relies greatly on dynamic interactions with nonlinear friction. This thesis proposes a combined physics-based and analytical approach, such that the non-reversible characteristics of the static-friction, presliding and pure-sliding regimes are revealed, and the frictional limit boundaries are identified. Nonlinear dynamic analysis and simulation results demonstrate good capture of experimentally observed frictional characteristics, quenching of friction-induced vibrations, and satisfaction of energy requirements. The thesis also presents detailed studies on trajectory tracking. Control schemes are designed and extended for a class of underactuated systems with concrete consideration of uncertainties and disturbances. They include a collocated partial feedback control scheme, and an adaptive variable structure control scheme with an elaborately designed auxiliary control variable. More generally, adaptive control schemes using neural networks are designed to ensure trajectory tracking.
    The theoretical background of these methods is presented with rigorous analysis and developed in detail for particular examples. The schemes promote the use of linear filters in the control input to improve system robustness. Asymptotic stability and convergence of the system dynamics to time-varying reference trajectories are shown by means of Lyapunov synthesis.
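    As an illustration of the collocated partial feedback control scheme mentioned above, the sketch below applies standard collocated partial feedback linearisation: the passive accelerations are eliminated from the actuated-dynamics block so that a torque imposing a desired acceleration on the actuated coordinates can be computed. The inertia matrix M and the Coriolis/gravity vector h are assumed to come from a model, and the partitioning and outer-loop gains are illustrative.

    import numpy as np

    def collocated_pfl(M, h, ddqa_des, n_act):
        """Torque imposing a desired acceleration on the actuated coordinates.
        M: full inertia matrix, h: Coriolis + gravity vector, ordered [actuated, passive]."""
        Maa, Mau = M[:n_act, :n_act], M[:n_act, n_act:]
        Mua, Muu = M[n_act:, :n_act], M[n_act:, n_act:]
        ha, hu = h[:n_act], h[n_act:]
        Muu_inv = np.linalg.inv(Muu)
        # Effective dynamics of the actuated block after eliminating the passive accelerations.
        M_eff = Maa - Mau @ Muu_inv @ Mua
        h_eff = ha - Mau @ Muu_inv @ hu
        return M_eff @ ddqa_des + h_eff

    # A PD outer loop would set ddqa_des = ddqa_ref + Kd * (dqa_ref - dqa) + Kp * (qa_ref - qa).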