
    Biped dynamic walking using reinforcement learning

    This thesis presents a study of biped dynamic walking using reinforcement learning. A hardware biped robot was built; it uses low-gear-ratio DC motors to allow free leg movement. The Self-Scaling Reinforcement Learning algorithm was developed to handle reinforcement learning in continuous action domains. A new learning architecture was designed to solve complex control problems. It is composed of modules consisting of simple controllers and small neural networks, and it allows easy incorporation of modules that represent new knowledge or new requirements for the desired task. Control experiments were carried out using a simulator and the physical biped. The biped learned dynamic walking on flat surfaces without any prior knowledge of its dynamic model.
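
    The abstract does not give the Self-Scaling Reinforcement Learning update itself; the sketch below is only a minimal illustration of reinforcement learning with a continuous action, using a Gaussian-exploration actor whose noise scale adapts to a running reward baseline. All names and constants are assumptions for illustration, not taken from the thesis.

        import numpy as np

        class SelfScalingActor:
            """Gaussian-exploration actor for a continuous action; the noise
            scale adapts to how rewards compare with a running baseline."""
            def __init__(self, n_features, sigma=0.5, lr=0.05):
                self.w = np.zeros(n_features)   # linear policy weights
                self.sigma = sigma              # exploration scale, adapted online
                self.lr = lr
                self.baseline = 0.0             # running average reward

            def act(self, features):
                mean = self.w @ features
                return mean + self.sigma * np.random.randn()

            def update(self, features, action, reward):
                delta = reward - self.baseline          # reinforcement signal
                self.baseline += 0.1 * delta            # track average reward
                mean = self.w @ features
                # Move the action mean toward better-than-expected actions.
                self.w += self.lr * delta * (action - mean) * features
                # Shrink exploration after success, widen it after failure.
                self.sigma *= 0.99 if delta > 0 else 1.01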

    Intelligent approaches in locomotion - a review


    Adaptive PI Hermite neural control for MIMO uncertain nonlinear systems

    This paper presents an adaptive PI Hermite neural control (APIHNC) system for multi-input multi-output (MIMO) uncertain nonlinear systems. The proposed APIHNC system is composed of a neural controller and a robust compensator. The neural controller uses a three-layer Hermite neural network (HNN) to mimic an ideal controller online, and the robust compensator is designed to eliminate the effect of the approximation error introduced by the neural controller on system stability in the Lyapunov sense. Moreover, a proportional-integral learning algorithm is derived to speed up the convergence of the tracking error. Finally, the proposed APIHNC system is applied to an inverted double pendulum and a two-link robotic manipulator. Simulation results verify that the proposed APIHNC system can achieve high-precision tracking performance. It should be emphasized that the proposed APIHNC system is straightforward to use in real-time applications.
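
    As a hedged illustration of the two ingredients named in the abstract, a Hermite basis and a proportional-integral adaptation law, the sketch below builds a single-input functional-link network on Hermite polynomials and updates its weights with a PI-style rule. The structure and gains are assumptions for illustration, not the paper's controller.

        import numpy as np

        def hermite_basis(x, order):
            """Physicists' Hermite polynomials H_0..H_{order-1} via the
            recurrence H_{n+1}(x) = 2x*H_n(x) - 2n*H_{n-1}(x)."""
            H = np.empty(order)
            H[0] = 1.0
            if order > 1:
                H[1] = 2.0 * x
            for n in range(1, order - 1):
                H[n + 1] = 2.0 * x * H[n] - 2.0 * n * H[n - 1]
            # A Gaussian envelope keeps the basis bounded for large |x|.
            return H * np.exp(-0.5 * x * x)

        class HermiteNet:
            """Single-input Hermite functional-link approximator with a
            PI-style weight update (proportional plus integral error)."""
            def __init__(self, order=6, kp=0.05, ki=0.005):
                self.w = np.zeros(order)
                self.order, self.kp, self.ki = order, kp, ki
                self.e_int = 0.0    # integral of the tracking error

            def forward(self, x):
                return self.w @ hermite_basis(x, self.order)

            def train_step(self, x, target):
                phi = hermite_basis(x, self.order)
                err = target - self.w @ phi
                self.e_int += err
                # PI adaptation: the integral term speeds convergence.
                self.w += (self.kp * err + self.ki * self.e_int) * phi
                return err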

    Learning control of bipedal dynamic walking robots with neural networks

    Thesis (Elec.E.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1998. Includes bibliographical references (p. 90-94). Stability and robustness are two important performance requirements for a dynamic walking robot. Learning and adaptation can improve both, and this thesis explores such an adaptation capability through the use of neural networks. Three neural network models (BP, CMAC, and RBF networks) are studied. The RBF network is chosen as the best, despite its weakness at covering high-dimensional input spaces. To overcome this problem, a self-organizing data-clustering scheme is explored. This system is applied successfully to a biped walking robot in a supervised learning mode. Generalized Virtual Model Control (GVMC) is also proposed in this thesis; it is inspired by a biomechanical model of locomotion and is an extension of ordinary Virtual Model Control. Instead of adding virtual impedance components to the biped skeletal system in virtual Cartesian space, GVMC uses adaptation to approximately reconstruct the dynamics of the biped. The effectiveness of these approaches is demonstrated both theoretically and experimentally (in simulation). By Jianjuen Hu.
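
    A minimal sketch of the combination the abstract describes, an RBF network whose centres are placed by self-organizing clustering of the data, is given below. The online nearest-centre update stands in for whatever clustering scheme the thesis actually uses; names and rates are illustrative.

        import numpy as np

        class SelfOrganizingRBF:
            """RBF network whose centres follow the data via online
            nearest-centre clustering, with output weights fit by LMS."""
            def __init__(self, n_centers, dim, width=1.0, lr=0.1):
                self.c = np.random.randn(n_centers, dim)  # centres
                self.w = np.zeros(n_centers)              # output weights
                self.width, self.lr = width, lr

            def _phi(self, x):
                d2 = ((self.c - x) ** 2).sum(axis=1)
                return np.exp(-d2 / (2.0 * self.width ** 2))

            def predict(self, x):
                return self.w @ self._phi(x)

            def train_step(self, x, target):
                # Self-organizing step: pull the nearest centre toward x,
                # so centres concentrate where the data actually lives.
                k = np.argmin(((self.c - x) ** 2).sum(axis=1))
                self.c[k] += 0.05 * (x - self.c[k])
                # Supervised step: LMS update of the output weights.
                phi = self._phi(x)
                err = target - self.w @ phi
                self.w += self.lr * err * phi
                return err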

    Goal-Based Control and Planning in Biped Locomotion Using Computational Intelligence Methods

    This work explores the application of neural fields to dynamical control tasks in the domain of biped walking. In a first approximation, a controller architecture that uses 1D neural fields is proposed. This architecture is evaluated on the stability problem for the cart-and-pole inverted pendulum, used as a simplified biped walking model. The neural field controller, parameterized both manually and using an evolutionary algorithm (EA), is compared to a controller architecture based on a recurrent neural network (RNN), also parameterized by an EA. The non-evolved neural field controller performs better than the RNN controller; the evolved neural field controller performs better still, and is able to recover quickly from worst-case initial conditions. An extended control and planning architecture using 2D neural fields is then developed and applied to the simple biped walking (SBW) problem, using a set of optimal control-parameter values previously found with an EA. The resulting optimal neural field controller is compared to the linear controller proposed by Wisse et al. and to a table-lookup controller using the same optimal parameters. While being an active control strategy, the controllers proposed here for the SBW problem approach Passive Dynamic Walking (PDW) more closely than previous works, by diminishing the cumulative control action.
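
    The field dynamics underlying such a controller are the standard Amari equation, so a hedged 1D discretization is sketched below; the kernel shape, gains, and peak-position readout are illustrative assumptions, not the parameters found by the EA in this work.

        import numpy as np

        def mexican_hat(dx, a_exc=2.0, s_exc=0.5, a_inh=1.0, s_inh=1.5):
            """Lateral excitation with surround inhibition."""
            return (a_exc * np.exp(-dx ** 2 / (2 * s_exc ** 2))
                    - a_inh * np.exp(-dx ** 2 / (2 * s_inh ** 2)))

        class NeuralField1D:
            """Discretized Amari field: tau*du/dt = -u + W f(u) + S + h."""
            def __init__(self, n=101, x_min=-5.0, x_max=5.0, tau=0.1, h=-1.0):
                self.x = np.linspace(x_min, x_max, n)
                self.dx = self.x[1] - self.x[0]
                self.u = np.full(n, h)                  # field activation
                self.tau, self.h = tau, h
                # Precompute the interaction matrix w(x - x').
                self.W = mexican_hat(self.x[:, None] - self.x[None, :])

            def step(self, stimulus, dt=0.01):
                f = 1.0 / (1.0 + np.exp(-self.u))       # firing-rate function
                du = -self.u + self.W @ f * self.dx + stimulus + self.h
                self.u += dt / self.tau * du
                # Read the control action out as the peak location.
                return self.x[np.argmax(self.u)]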

    A sensory-based adaptive walking control algorithm for variable speed biped robot gaits

    A balance scheme for handling variable-speed gaits was implemented on an experimental biped. The control scheme used pre-planned but adaptive motion sequences in combination with closed-loop reactive control. CMAC neural networks were responsible for the adaptive control of side-to-side and front-to-back balance, and the biped's performance improved with neural network training. The biped was able to walk with variable-speed gaits and to change gait speed on the fly. The slower gait speeds required statically balanced walking, while the faster speeds required dynamically balanced walking; it was not necessary to distinguish between the two balance modes within the controller. Following training, the biped was able to walk with continuous motion on flat, non-slippery surfaces at forward progression velocities in the range of 21 cm/min to 72 cm/min, with average stride lengths of 6.5 cm.
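
    The abstract relies on CMAC networks without describing them. Under the usual tile-coding formulation of CMAC (several offset grids over the input, one active weight per grid per query), a minimal sketch follows; it is illustrative only and does not reproduce the experimental controller.

        import numpy as np

        class CMAC:
            """Tile-coding CMAC: n_tilings offset grids over [lo, hi]^dim;
            each query activates exactly one weight per tiling."""
            def __init__(self, n_tilings=8, tiles=16, dim=2,
                         lo=-1.0, hi=1.0, lr=0.1):
                self.n_tilings, self.tiles, self.dim = n_tilings, tiles, dim
                self.lo, self.span = lo, hi - lo
                self.w = np.zeros((n_tilings, tiles ** dim))
                self.lr = lr

            def _active(self, x):
                """Index of the active cell in each tiling."""
                idx = []
                for t in range(self.n_tilings):
                    offset = t / self.n_tilings * self.span / self.tiles
                    cells = np.floor((x - self.lo + offset)
                                     / self.span * self.tiles)
                    cells = np.clip(cells, 0, self.tiles - 1).astype(int)
                    idx.append(np.ravel_multi_index(
                        tuple(cells), (self.tiles,) * self.dim))
                return idx

            def predict(self, x):
                return sum(self.w[t, i] for t, i in enumerate(self._active(x)))

            def train_step(self, x, target):
                # Spread the correction evenly across the active tilings.
                err = target - self.predict(x)
                for t, i in enumerate(self._active(x)):
                    self.w[t, i] += self.lr / self.n_tilings * err
                return err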

    Reinforcement Learning Algorithms in Humanoid Robotics


    BIPED LOCOMOTION: STABILITY, ANALYSIS AND CONTROL


    Value Function Estimation in Optimal Control via Takagi-Sugeno Models and Linear Programming

    This Thesis employs dynamic programming and reinforcement learning techniques to obtain optimal policies for controlling nonlinear systems with discrete and continuous states and actions. Initially, a review of the basic concepts of dynamic programming and reinforcement learning is carried out for systems with a finite number of states. After that, the extension of these techniques to systems with a large number of states, or with continuous states, is analysed using approximation functions. The contributions of the Thesis are:
    - A combined identification/Q-function fitting methodology, involving identification of a Takagi-Sugeno model, computation of (sub)optimal controllers from Linear Matrix Inequalities, and the subsequent data-based fitting of the Q-function via monotonic optimisation.
    - A methodology for learning controllers using approximate dynamic programming via linear programming (ADP-LP), which makes the ADP-LP approach work in practical control applications with continuous state and input spaces. The methodology estimates lower and upper bounds of the optimal value function through functional approximators, and provides guidelines for data and regressor regularisation in order to obtain satisfactory results while avoiding unbounded or ill-conditioned solutions.
    - A methodology of approximate dynamic programming via linear programming that obtains a better approximation of the optimal value function in a specific region of the state space. A policy is learned gradually, using data available only in the exploration region; exploration progressively enlarges the learning region until a converged policy is obtained.
    This work was supported by the National Department of Higher Education, Science, Technology and Innovation of Ecuador (SENESCYT), and by the Spanish Ministry of Economy and the European Union, grant DPI2016-81002-R (AEI/FEDER, UE). The author also received a grant for a predoctoral stay under the Programa de Becas Iberoamérica-Santander Investigación 2018 of Santander Bank. Díaz Iza, HP. (2020). Value Function Estimation in Optimal Control via Takagi-Sugeno Models and Linear Programming [Tesis doctoral]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/139135
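
    For the LP view of approximate dynamic programming named in the contributions, the exact finite-state form is well known: maximising sum(V) subject to the Bellman inequalities is attained at the optimal value function, and replacing the tabular V with an approximator V(s) ≈ θᵀφ(s) keeps the program linear in θ. A hedged tabular sketch follows; it is the textbook construction, not the thesis's continuous-state method, and the upper-bound construction the abstract mentions is not reproduced here.

        import numpy as np
        from scipy.optimize import linprog

        def lp_value_function(P, c, gamma):
            """Exact LP form of the Bellman equation for a finite,
            cost-minimising MDP.
            P: (A, S, S) transition matrices; c: (S, A) stage costs.
            max sum(V) s.t. V(s) <= c(s, a) + gamma * E[V(s') | s, a]
            for all (s, a) is attained at the optimal value function."""
            A, S, _ = P.shape
            obj = -np.ones(S)                       # linprog minimises
            # One inequality per (s, a): (I - gamma*P[a]) V <= c[:, a].
            G = np.vstack([np.eye(S) - gamma * P[a] for a in range(A)])
            h = np.concatenate([c[:, a] for a in range(A)])
            res = linprog(obj, A_ub=G, b_ub=h,
                          bounds=[(None, None)] * S, method="highs")
            return res.x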