Search CORE

393 research outputs found

Muscle synergies in neuroscience and robotics: from input-space to task-space perspectives

Author: Bastien Berret
Cristiano Alessandro
Francesco Nori
Ioannis Delis
Stefano Panzeri
Publication venue: 'Frontiers Media SA'
Publication date: 01/01/2013

In this paper we review the works related to muscle synergies that have been carried out in neuroscience and control engineering. In particular, we refer to the hypothesis that the central nervous system (CNS) generates desired muscle contractions by combining a small number of predefined modules, called muscle synergies. We provide an overview of the methods that have been employed to test the validity of this scheme, and we show how the concept of muscle synergy has been generalized for the control of artificial agents. The comparison between these two lines of research, in particular their different goals and approaches, is instrumental to explain the computational implications of the hypothesized modular organization. Moreover, it clarifies the importance of assessing the functional role of muscle synergies: although these basic modules are defined at the level of muscle activations (input-space), they should result in the effective accomplishment of the desired task. This requirement is not always explicitly considered in experimental neuroscience, as muscle synergies are often estimated solely by analyzing recorded muscle activities. We suggest that synergy extraction methods should explicitly take into account task execution variables, thus moving from a perspective purely based on input-space to one grounded on task-space as well.

Crossref

Directory of Open Access Journals

Frontiers - Publisher Connector

PubMed Central

Enlighten

Model-based contextual policy search for data-efficient generalization of robot skills

Author: Kupcsik A
Deisenroth MP
Peters J
Poh LA
Vadakkepat P
Neumann G
Publication venue: 'Elsevier BV'
Publication date: 01/01/2014

In robotics, lower-level controllers are typically used to make the robot solve a specific task in a fixed context. For example, the lower-level controller can encode a hitting movement while the context defines the target coordinates to hit. However, in many learning problems the context may change between task executions. To adapt the policy to a new context, we utilize a hierarchical approach by learning an upper-level policy that generalizes the lower-level controllers to new contexts. A common approach to learn such upper-level policies is to use policy search. However, the majority of current contextual policy search approaches are model-free and require a high number of interactions with the robot and its environment. Model-based approaches are known to significantly reduce the number of robot experiments; however, current model-based techniques cannot be applied straightforwardly to the problem of learning contextual upper-level policies. They rely on specific parametrizations of the policy and the reward function, which are often unrealistic in the contextual policy search formulation. In this paper, we propose a novel model-based contextual policy search algorithm that is able to generalize lower-level controllers and is data-efficient. Our approach is based on learned probabilistic forward models and information-theoretic policy search. Unlike current algorithms, our method does not require any assumption on the parametrization of the policy or the reward function. We show on complex simulated robotic tasks and in a real robot experiment that the proposed learning framework speeds up the learning process by up to two orders of magnitude in comparison to existing methods, while learning high-quality policies.

University of Lincoln Institutional Repository

TUbiblio

Crossref

UCL Discovery

Spiral - Imperial College Digital Repository

MPG.PuRe

Model-based contextual policy search for data-efficient generalization of robot skills

Author: Deisenroth MP
Kupcsik A
Neumann G
Peters J
Poh LA
Vadakkepat P
Publication venue: ELSEVIER SCIENCE BV
Publication date: 01/06/2017

UCL Discovery

A model of conceptual bootstrapping in human cognition

Author: Bramley Neil R.
Lucas Christopher G.
Zhao Bonan
Publication date: 16/10/2023

Edinburgh Research Explorer

A Survey on Policy Search for Robotics

Author: Deisenroth MP
Neumann G
Peters J
Publication venue: 'Now Publishers'
Publication date: 01/01/2011

Policy search is a subfield in reinforcement learning which focuses on finding good parameters for a given policy parametrization. It is well suited for robotics as it can cope with high-dimensional state and action spaces, one of the main challenges in robot learning. We review recent successes of both model-free and model-based policy search in robot learning. Model-free policy search is a general approach to learn policies based on sampled trajectories. We classify model-free methods based on their policy evaluation strategy, policy update strategy, and exploration strategy and present a unified view on existing algorithms. Learning a policy is often easier than learning an accurate forward model, and, hence, model-free methods are more frequently used in practice. However, for each sampled trajectory, it is necessary to interact with the robot, which can be time consuming and challenging in practice. Model-based policy search addresses this problem by first learning a simulator of the robot’s dynamics from data. Subsequently, the simulator generates trajectories that are used for policy learning. For both model-free and model-based policy search methods, we review their respective properties and their applicability to robotic systems.

* Both authors contributed equally.
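As a rough illustration of what model-free policy search over a fixed policy parametrization can look like, the sketch below searches for a single feedback gain on a toy one-dimensional system. Everything in it is an assumption made for illustration: the system `x' = x + u`, the cost weights, the names `rollout` and `policy_search`, and the cross-entropy-style update, which is swapped in as a simple stand-in and is not taken from the survey.

```python
import random


def rollout(gain, x0=1.0, steps=20):
    """Return (negative cost) of the linear policy u = -gain * x on x' = x + u."""
    x, total = x0, 0.0
    for _ in range(steps):
        u = -gain * x
        total -= x * x + 0.1 * u * u  # hypothetical quadratic cost
        x = x + u
    return total


def policy_search(iterations=30, samples=20, elite=5, seed=0):
    """Cross-entropy-style search over the policy parameter (illustrative only)."""
    rng = random.Random(seed)
    mean, std = 0.0, 1.0
    for _ in range(iterations):
        # Exploration: sample candidate parameters around the current mean.
        candidates = [rng.gauss(mean, std) for _ in range(samples)]
        # Policy evaluation: rank candidates by the return of one rollout each.
        candidates.sort(key=rollout, reverse=True)
        best = candidates[:elite]
        # Policy update: refit the sampling distribution to the elite set.
        mean = sum(best) / elite
        variance = sum((g - mean) ** 2 for g in best) / elite
        std = max(1e-3, variance ** 0.5)
    return mean


learned_gain = policy_search()
```

Each iteration here corresponds to real interactions with the system, which is the cost that model-based variants try to reduce by generating the rollouts from a learned simulator instead.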

University of Lincoln Institutional Repository

TUbiblio

Crossref

Spiral - Imperial College Digital Repository

MPG.PuRe

Non-parametric Models and Contextual Policy Search for More Efficient Robot Skill Generalization

Author: ANDRAS GABOR KUPCSIK
Publication date: 21/01/2014

Ph.D., Doctor of Philosophy

ScholarBank@NUS

Combining reinforcement learning and optimal control for the control of nonlinear dynamical systems

Author: Abramova Ekaterina
Publication venue: Computing, Imperial College London
Publication date: 01/03/2016

This thesis presents a novel hierarchical learning framework, Reinforcement Learning Optimal Control, for controlling nonlinear dynamical systems with continuous states and actions. The adapted approach mimics the neural computations that allow our brain to bridge across the divide between symbolic action-selection and low-level actuation control by operating at two levels of abstraction. First, current findings demonstrate that at the level of limb coordination human behaviour is explained by linear optimal feedback control theory, where cost functions match energy and timing constraints of tasks. Second, humans learn cognitive tasks involving learning symbolic level action selection, in terms of both model-free and model-based reinforcement learning algorithms. We postulate that the ease with which humans learn complex nonlinear tasks arises from combining these two levels of abstraction. The Reinforcement Learning Optimal Control framework learns the local task dynamics from naive experience using an expectation maximization algorithm for estimation of linear dynamical systems and forms locally optimal Linear Quadratic Regulators, producing continuous low-level control. A high-level reinforcement learning agent uses these available controllers as actions and learns how to combine them in state space, while maximizing a long term reward. The optimal control costs form training signals for the high-level symbolic learner. The algorithm demonstrates that a small number of locally optimal linear controllers can be combined in a smart way to solve global nonlinear control problems and forms a proof-of-principle of how the brain may bridge the divide between low-level continuous control and high-level symbolic action selection. It competes in terms of computational cost and solution quality with state-of-the-art control, which is illustrated with solutions to benchmark problems.
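To make the notion of a locally optimal Linear Quadratic Regulator more concrete, here is a minimal sketch of the scalar discrete-time case. It covers only the low-level LQR step, not the thesis's framework: the system coefficients, cost weights, and the function name `scalar_lqr_gain` are hypothetical, and the fixed-point Riccati iteration is one simple way to obtain the gain.

```python
def scalar_lqr_gain(a, b, q, r, iterations=200):
    """Iterate the scalar discrete-time Riccati equation for x' = a*x + b*u
    with stage cost q*x^2 + r*u^2, and return (feedback gain, cost-to-go p)."""
    p = q
    for _ in range(iterations):
        p = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
    gain = a * b * p / (r + b * b * p)  # optimal control is u = -gain * x
    return gain, p


# Hypothetical open-loop unstable system (|a| > 1).
a, b = 1.2, 1.0
gain, p = scalar_lqr_gain(a, b, q=1.0, r=1.0)
closed_loop = a - b * gain  # closed-loop dynamics x' = closed_loop * x
```

In a hierarchical scheme of the kind the abstract describes, several such controllers, each valid near its own linearisation point, would be the actions available to the high-level learner.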

Spiral - Imperial College Digital Repository

Learning Parameterized Skills

Author: Castro da Silva Bruno
Publication venue: ScholarWorks@UMass Amherst
Publication date: 17/03/2015

One of the defining characteristics of human intelligence is the ability to acquire and refine skills. Skills are behaviors for solving problems that an agent encounters often—sometimes in different contexts and situations—throughout its lifetime. Identifying important problems that recur and retaining their solutions as skills allows agents to more rapidly solve novel problems by adjusting and combining their existing skills. In this thesis we introduce a general framework for learning reusable parameterized skills. Reusable skills are parameterized procedures that—given a description of a problem to be solved—produce appropriate behaviors or policies. They can be sequentially and hierarchically combined with other skills to produce progressively more abstract and temporally extended behaviors. We identify three major challenges involved in the construction of such skills. First, an agent should be capable of solving a small number of problems and generalizing these experiences to construct a single reusable skill. The skill should be capable of producing appropriate behaviors even when applied to yet unseen variations of a problem. We introduce a method for estimating properties of the lower-dimensional manifold on which problem solutions lie. This allows for the construction of unified models for predicting policies from task parameters. Secondly, the agent should be able to identify when a skill can be hierarchically decomposed into specialized sub-skills. We observe that the policy manifold may be composed of disjoint, piecewise-smooth charts, each one encoding solutions for a subclass of problems. Identifying and modeling sub-skills allows for the aggregation of related behaviors into single, more abstract skills. Finally, the agent should be able to actively select on which problems to practice in order to more rapidly become competent in a skill. Thoughtful and deliberate practice is one of the defining characteristics of human expert performance. 
By carefully choosing on which problems to practice, the agent might more rapidly construct a skill that performs well over a wide range of problems. We address these challenges via a general framework for skill acquisition. We evaluate it on simulated decision-problems and on a physical humanoid robot, and demonstrate that it allows for the efficient and active construction of reusable skills.

ScholarWorks@UMass Amherst

Value Function Estimation in Optimal Control via Takagi-Sugeno Models and Linear Programming

Author: Díaz Iza Henry Paúl
Publication venue: 'Universitat Politecnica de Valencia'
Publication date: 23/03/2020

This thesis employs dynamic programming and reinforcement learning techniques to obtain optimal policies for controlling nonlinear systems with discrete and continuous states and actions. Initially, the basic concepts of dynamic programming and reinforcement learning are reviewed for systems with a finite number of states. The extension of these techniques to systems with a large number of states, or to continuous-state systems, is then analysed using approximation functions. The contributions of the thesis are:
- A combined identification/Q-function fitting methodology, which involves identification of a Takagi-Sugeno model, computation of (sub)optimal controllers from Linear Matrix Inequalities, and the subsequent data-based fitting of the Q-function via monotonic optimisation.
- A methodology for learning controllers using approximate dynamic programming via linear programming (ADP-LP). The methodology makes the ADP-LP approach workable in practical control applications with continuous state and input spaces. It estimates a lower bound and an upper bound of the optimal value function through functional approximators. Guidelines are provided for data and regressor regularisation in order to obtain satisfactory results while avoiding unbounded or ill-conditioned solutions.
- A methodology of approximate dynamic programming via linear programming that obtains a better approximation of the optimal value function in a specific region of the state space. The methodology proposes to gradually learn a policy using data available only in the exploration region. The exploration progressively increases the learning region until a converged policy is obtained.

This work was supported by the National Department of Higher Education, Science, Technology and Innovation of Ecuador (SENESCYT), and the Spanish Ministry of Economy and European Union, grant DPI2016-81002-R (AEI/FEDER, UE). The author also received the grant for a predoctoral stay, Programa de Becas Iberoamérica-Santander Investigación 2018, of the Santander Bank.

Díaz Iza, HP. (2020). Value Function Estimation in Optimal Control via Takagi-Sugeno Models and Linear Programming [Doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/139135

Crossref

RiuNet

A computational framework of human causal generalization

Author: Zhao Bonan
Publication venue: The University of Edinburgh
Publication date: 11/04/2023

How do people decide how general a causal relationship is, in terms of the entities or situations it applies to? How can people make these difficult judgments in a fast, efficient way? To address these questions, I designed a novel online experiment interface that systematically measures how people generalize causal relationships, and developed a computational modeling framework that combines program induction (about the hidden causal laws) with non-parametric category inference (about their domains of influence) to account for unique patterns in human causal generalization. In particular, by introducing adaptor grammars to standard Bayesian-symbolic models, this framework formalizes conceptual bootstrapping as a general online inference algorithm that gives rise to compositional causal concepts. Chapter 2 investigates one-shot causal generalization, where I find that participants’ inferences are shaped by the order of the generalization questions they are asked. Chapter 3 looks into few-shot cases, and finds an asymmetry in the formation of causal categories: participants preferentially identify causal laws with features of the agent objects rather than recipients, but this asymmetry disappears when visual cues to causal agency are challenged. The proposed modeling approach can explain both the generalization-order effect and the causal asymmetry, outperforming a naïve Bayesian account while providing a computationally plausible mechanism for real-world causal generalization. Chapter 4 further extends this framework with adaptor grammars, using a dynamic conceptual repertoire that is enriched over time, allowing the model to cache and later reuse elements of earlier insights. This model predicts systematically different learned concepts when the same evidence is processed in different orders, and across four experiments people’s learning outcomes indeed closely resembled this model’s, differing significantly from alternative accounts.

Edinburgh Research Archive