    A survey on policy search algorithms for learning robot controllers in a handful of trials

    Most policy search algorithms require thousands of training episodes to find an effective policy, which is often infeasible with a physical robot. This survey article focuses on the extreme other end of the spectrum: how can a robot adapt with only a handful of trials (a dozen) and a few minutes? By analogy with the word "big-data", we refer to this challenge as "micro-data reinforcement learning". We show that a first strategy is to leverage prior knowledge on the policy structure (e.g., dynamic movement primitives), on the policy parameters (e.g., demonstrations), or on the dynamics (e.g., simulators). A second strategy is to create data-driven surrogate models of the expected reward (e.g., Bayesian optimization) or the dynamical model (e.g., model-based policy search), so that the policy optimizer queries the model instead of the real system. Overall, all successful micro-data algorithms combine these two strategies by varying the kind of model and prior knowledge. The current scientific challenges essentially revolve around scaling up to complex robots (e.g., humanoids), designing generic priors, and optimizing the computing time. Comment: 21 pages, 3 figures, 4 algorithms, accepted at IEEE Transactions on Robotic
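
    The second strategy in this abstract (query a data-driven surrogate of the expected reward instead of the real system) can be illustrated with a toy sketch. Everything here is an assumption made for illustration: `real_reward` is a hypothetical one-parameter stand-in for a robot trial, and the surrogate is a simple kernel-weighted average with a distance-based exploration bonus, a much cruder substitute for the Gaussian-process models typically used in Bayesian optimization. It is not an algorithm from the survey.

```python
import math

def real_reward(theta):
    """Hypothetical stand-in for one costly trial on the real system."""
    return math.exp(-(theta - 0.7) ** 2 / 0.02)

def surrogate(theta, data, width=0.1):
    """Kernel-weighted mean of observed rewards plus an exploration bonus."""
    weights = [math.exp(-((theta - t) ** 2) / (2 * width ** 2)) for t, _ in data]
    total = sum(weights)
    if total > 1e-12:
        mean = sum(w * r for w, (_, r) in zip(weights, data)) / total
    else:
        mean = 0.0  # no nearby data: fall back to a neutral estimate
    bonus = 1.0 - max(weights)  # grows with distance from every tried parameter
    return mean + bonus

def micro_data_search(n_trials=12, grid_size=201):
    """Spend only n_trials real evaluations; all other queries hit the surrogate."""
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    data = [(0.0, real_reward(0.0))]  # first trial at an arbitrary initial guess
    while len(data) < n_trials:
        theta = max(grid, key=lambda t: surrogate(t, data))  # cheap model queries
        data.append((theta, real_reward(theta)))             # one real trial
    best = max(data, key=lambda d: d[1])
    return best, len(data)

if __name__ == "__main__":
    (theta, reward), trials = micro_data_search()
    print(f"best theta={theta:.3f} reward={reward:.3f} after {trials} trials")
```

    The optimizer scans 201 candidate parameters per iteration against the surrogate but touches `real_reward` only a dozen times, which is the budget the abstract refers to as "a handful of trials".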

    Developing agile motor skills on virtual and real humanoids

    Demonstrating strength and agility on virtual and real humanoids has been an important goal in computer graphics and robotics. However, developing physics-based controllers for various agile motor skills requires a tremendous amount of prior knowledge and manual labor due to the complex mechanisms of the motor skills. The focus of the dissertation is to develop a set of computational tools to expedite the design process of physics-based controllers that can execute a variety of agile motor skills on virtual and real humanoids. Instead of designing controllers directly on real humanoids, this dissertation develops appropriate theories and models in virtual simulation and systematically transfers the solutions to hardware systems. The algorithms and frameworks in this dissertation span various topics from specific physics-based controllers to general learning frameworks. We first present an online algorithm for controlling falling and landing motions of virtual characters. The proposed algorithm is effective and efficient enough to generate falling motions for a wide range of arbitrary initial conditions in real time. Next, we present a robust falling strategy for real humanoids that can manage a wide range of perturbations by planning the optimal contact sequences. We then introduce an iterative learning framework, inspired by human learning techniques, for easily designing various agile motions. The proposed framework is followed by novel algorithms to efficiently optimize control parameters for the target tasks, especially when they have many constraints or parameterized goals. Finally, we introduce an iterative approach for exporting simulation-optimized control policies to robot hardware that reduces the number of hardware experiments, which are costly and labor-intensive. Ph.D

    Motor control and strategy discovery for physically simulated characters

    In physics-based character animation, motions are realized through control of simulated characters along with their interactions with the virtual environment. In this thesis, we study the problem of character control on two levels: joint-level motor control, which transforms control signals to joint torques, and high-level motion control, which outputs joint-level control signals given the current state of the character and the environment and the task objective. We propose a Modified Articulated-Body Algorithm (MABA) which achieves stable proportional-derivative (PD) low-level motor control with better theoretical time complexity, practical efficiency and stability than prior implementations. We further propose a high-level motion control framework based on deep reinforcement learning (DRL) which enables the discovery of appropriate motion strategies without human demonstrations to complete a task objective. To facilitate the learning of realistic human motions, we propose a Pose Variational Autoencoder (P-VAE) to constrain the DRL actions to a subspace of natural poses. Our learning framework can be further combined with a sample-efficient Bayesian Diversity Search (BDS) algorithm and novel policy seeking to discover diverse strategies for tasks with multiple modes, such as various athletic jumping tasks
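
    For readers unfamiliar with joint-level PD control, the textbook control law that maps a target joint angle to a torque can be sketched as follows. This is plain PD control on a hypothetical unit-inertia, single-degree-of-freedom joint with illustrative gains; it is not the MABA or the stable PD formulation proposed in the thesis.

```python
def simulate_pd(q_target, kp=100.0, kd=20.0, dt=0.001, steps=3000):
    """Drive a unit-inertia 1-DoF joint from rest at q=0 toward q_target.

    kp and kd are illustrative gains (critically damped for unit inertia).
    """
    q, qd = 0.0, 0.0  # joint angle and joint velocity
    for _ in range(steps):
        torque = kp * (q_target - q) - kd * qd  # PD law: spring toward target, damp velocity
        qd += torque * dt  # unit inertia, so acceleration equals torque
        q += qd * dt       # semi-implicit Euler: position uses the updated velocity
    return q, qd

if __name__ == "__main__":
    q, qd = simulate_pd(1.0)
    print(f"final angle={q:.6f}, final velocity={qd:.6f}")
```

    With large gains or large time steps this explicit scheme becomes unstable, which is the kind of limitation that motivates work on stable PD formulations.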

    Computer-Assisted Choreography of Multi-Person Motion

    Thesis (Ph.D.) -- Seoul National University Graduate School, College of Engineering, Department of Electrical and Computer Engineering, August 2017. Jehee Lee. Choreographing motion is the process of converting written stories or messages into the real movement of actors. In performances or movies, directors spend considerable time and effort on it because it is the primary factor on which audiences concentrate. If multiple actors exist in the scene, choreography becomes more challenging. The fundamental difficulty is that the coordination between actors must be precisely adjusted. Spatio-temporal coordination is the first requirement that must be satisfied, and causality and mood are further important aspects of coordination. Directors use several assistant tools, such as storyboards or roughly crafted 3D animations, which can visualize the flow of movements, to organize ideas or to explain them to actors. However, these tools are difficult to use because artistry and considerable training effort are required. They also cannot give suggestions or feedback. Finally, the amount of manual labor increases exponentially as the number of actors increases. In this thesis, we propose computational approaches to choreographing the motion of multiple actors. The ultimate goal is to enable novice users to easily generate motions of multiple actors without substantial effort. We first show an approach to generate motions for shadow theatre, where actors should carefully collaborate to achieve the same goal. The results are comparable to ones made by professional actors. Next, we present an interactive animation system for pre-visualization, where users exploit an intuitive graphical interface for scene description. Given a description, the system can generate motions for the characters in the scene that match the description. Finally, we propose two controller designs (combining regression with trajectory optimization, and evolutionary deep reinforcement learning) for physically simulated actors, which guarantee the physical validity of the resultant motions.
    Contents:
    Chapter 1 Introduction
    Chapter 2 Background: 2.1 Motion Generation Technique (2.1.1 Motion Editing and Synthesis for Single-Character; 2.1.2 Motion Editing and Synthesis for Multi-Character; 2.1.3 Motion Planning; 2.1.4 Motion Control by Reinforcement Learning; 2.1.5 Pose/Motion Estimation from Incomplete Information; 2.1.6 Diversity on Resultant Motions); 2.2 Authoring System (2.2.1 System using High-level Input; 2.2.2 User-interactive System); 2.3 Shadow Theatre (2.3.1 Shadow Generation; 2.3.2 Shadow for Artistic Purpose; 2.3.3 Viewing Shadow Theatre as Collages/Mosaics of People); 2.4 Physics-based Controller Design (2.4.1 Controllers for Various Characters; 2.4.2 Trajectory Optimization; 2.4.3 Sampling-based Optimization; 2.4.4 Model-Based Controller Design; 2.4.5 Direct Policy Learning; 2.4.6 Deep Reinforcement Learning for Control)
    Chapter 3 Motion Generation for Shadow Theatre: 3.1 Overview; 3.2 Shadow Theatre Problem (3.2.1 Problem Definition; 3.2.2 Approaches of Professional Actors); 3.3 Discovery of Principal Poses (3.3.1 Optimization Formulation; 3.3.2 Optimization Algorithm); 3.4 Animating Principal Poses (3.4.1 Initial Configuration; 3.4.2 Optimization for Motion Generation); 3.5 Experimental Results (3.5.1 Implementation Details; 3.5.2 Animation; 3.5.3 3D Fabrication); 3.6 Discussion
    Chapter 4 Interactive Animation System for Pre-visualization: 4.1 Overview; 4.2 Graphical Scene Description; 4.3 Candidate Scene Generation (4.3.1 Connecting Paths; 4.3.2 Motion Cascade; 4.3.3 Motion Selection For Each Cycle; 4.3.4 Cycle Ordering; 4.3.5 Generalized Paths and Cycles; 4.3.6 Motion Editing); 4.4 Scene Ranking (4.4.1 Ranking Criteria; 4.4.2 Scene Ranking Measures); 4.5 Scene Refinement; 4.6 Experimental Results; 4.7 Discussion
    Chapter 5 Physics-based Design and Control: 5.1 Overview; 5.2 Combining Regression with Trajectory Optimization (5.2.1 Simulation and Motor Skills; 5.2.2 Control Adaptation; 5.2.3 Control Parameterization; 5.2.4 Efficient Construction; 5.2.5 Experimental Results; 5.2.6 Discussion); 5.3 Example-Guided Control by Deep Reinforcement Learning (5.3.1 System Overview; 5.3.2 Initial Policy Construction; 5.3.3 Evolutionary Deep Q-Learning; 5.3.4 Experimental Results; 5.3.5 Discussion)
    Chapter 6 Conclusion: 6.1 Contribution; 6.2 Future Work
    Abstract (in Korean)
    Docto

    Probabilistic Models of Motor Production

    N. Bernstein defined the ability of the central nervous system (CNS) to control many degrees of freedom of a physical body, with all its redundancy and flexibility, as the main problem in motor control. He pointed out that man-made mechanisms usually have one, sometimes two degrees of freedom (DOF); when the number of DOF increases further, it becomes prohibitively hard to control them. The brain, however, seems to perform such control effortlessly. He suggested a way the brain might deal with it: when a motor skill is being acquired, the brain artificially limits the degrees of freedom, leaving only one or two. As the skill level increases, the brain gradually "frees" the previously fixed DOF, applying control when needed and in directions which have to be corrected, eventually arriving at a control scheme where all the DOF are "free". This approach of reducing the dimensionality of motor control remains relevant even today. One of the possible solutions to Bernstein's problem is the hypothesis of motor primitives (MPs): small building blocks that constitute complex movements and facilitate motor learning and task completion. Just like in the visual system, having a homogeneous hierarchical architecture built of similar computational elements may be beneficial. When studying an object as complicated as the brain, it is important to define at which level of detail one works and which questions one aims to answer. David Marr suggested three levels of analysis: 1. computational, analysing which problem the system solves; 2. algorithmic, questioning which representation the system uses and which computations it performs; 3. implementational, finding how such computations are performed by neurons in the brain. In this thesis we stay at the first two levels, seeking the basic representation of motor output. In this work we present a new model of motor primitives that comprises multiple interacting latent dynamical systems, and give it a full Bayesian treatment. 
Modelling within the Bayesian framework, in my opinion, must become the new standard in hypothesis testing in neuroscience. Only the Bayesian framework gives us guarantees when dealing with the inevitable plethora of hidden variables and uncertainty. The special type of coupling of dynamical systems we proposed, based on the Product of Experts, has many natural interpretations in the Bayesian framework. If the dynamical systems run in parallel, it yields Bayesian cue integration. If they are organized hierarchically due to serial coupling, we get hierarchical priors over the dynamics. If one of the dynamical systems represents sensory state, we arrive at sensory-motor primitives. The compact representation that follows from the variational treatment allows learning of a motor primitives library. With the primitives learned separately, a combined motion can be represented as a matrix of coupling values. We performed a set of experiments to compare different models of motor primitives. In a series of 2-alternative forced choice (2AFC) experiments, participants discriminated between natural and synthesised movements, thus running a graphics Turing test. When available, the Bayesian model score predicted the naturalness of the perceived movements. For simple movements, like walking, Bayesian model comparison and psychophysics tests indicate that one dynamical system is sufficient to describe the data. For more complex movements, like walking and waving, motion can be better represented as a set of coupled dynamical systems. We also experimentally confirmed that Bayesian treatment of model learning on motion data is superior to the simple point estimate of latent parameters. Experiments with non-periodic movements show that they do not benefit from more complex latent dynamics, despite having high kinematic complexity. By having fully Bayesian models, we could quantitatively disentangle the influence of motion dynamics and pose on the perception of naturalness. 
We confirmed that rich and correct dynamics is more important than the kinematic representation. There are numerous further directions of research. In the models we devised for multiple parts, even though the latent dynamics was factorized into a set of interacting systems, the kinematic parts were completely independent. Thus, interaction between the kinematic parts could be mediated only by the latent dynamics interactions. A more flexible model would allow a dense interaction on the kinematic level too. Another important problem relates to the representation of time in Markov chains. Discrete-time Markov chains form an approximation to continuous dynamics. As the time step is assumed to be fixed, we face the problem of time step selection. Time is also not an explicit parameter in Markov chains. This prohibits explicit optimization of time as a parameter and reasoning (inference) about it. For example, in optimal control, boundary conditions are usually set at exact time points, which is not an ecological scenario, where time is usually a parameter of optimization. Making time an explicit parameter in dynamics may alleviate this
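
    The link this abstract draws between a Product of Experts and Bayesian cue integration can be made concrete in the simplest case: multiplying one-dimensional Gaussian densities yields a Gaussian whose precision is the sum of the experts' precisions and whose mean is the precision-weighted average of their means. The sketch below shows only this standard identity; the function name is hypothetical, and the thesis's coupled latent dynamical systems are far richer than scalar Gaussians.

```python
def product_of_gaussians(experts):
    """Combine (mean, variance) experts into the normalized product Gaussian.

    Returns (mean, variance) of the product density.
    """
    precision = sum(1.0 / var for _, var in experts)         # precisions add
    mean = sum(m / var for m, var in experts) / precision    # precision-weighted mean
    return mean, 1.0 / precision

if __name__ == "__main__":
    # Two equally reliable cues disagree; the product settles halfway
    # and is more certain than either cue alone.
    print(product_of_gaussians([(0.0, 1.0), (2.0, 1.0)]))
```

    A more reliable expert (smaller variance) pulls the combined estimate toward its own mean, which is the behaviour usually described as cue integration.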

    Benchmarking Deep Reinforcement Learning for Continuous Control

    Recently, researchers have made significant progress combining the advances in deep learning for learning feature representations with reinforcement learning. Some notable examples include training agents to play Atari games based on raw pixel data and to acquire advanced manipulation skills using raw sensory inputs. However, it has been difficult to quantify progress in the domain of continuous control due to the lack of a commonly adopted benchmark. In this work, we present a benchmark suite of continuous control tasks, including classic tasks like cart-pole swing-up, tasks with very high state and action dimensionality such as 3D humanoid locomotion, tasks with partial observations, and tasks with hierarchical structure. We report novel findings based on the systematic evaluation of a range of implemented reinforcement learning algorithms. Both the benchmark and reference implementations are released at https://github.com/rllab/rllab in order to facilitate experimental reproducibility and to encourage adoption by other researchers. Comment: 14 pages, ICML 201

    Bootstrapping of parameterized skills through hybrid optimization in task and policy spaces

    Queißer J, Steil JJ. Bootstrapping of parameterized skills through hybrid optimization in task and policy spaces. Frontiers in Robotics and AI. 2018;5:49. Modern robotic applications place high demands on the adaptation of actions with respect to variance in a given task. Reinforcement learning is able to optimize for these changing conditions, but relearning from scratch is hardly feasible due to the high number of required rollouts. We propose a parameterized skill that generalizes to new actions for changing task parameters, which is encoded as a meta-learner that provides parameters for task-specific dynamic motion primitives. Our work shows that utilizing parameterized skills for initialization of the optimization process leads to more effective incremental task learning. In addition, we introduce a hybrid optimization method that combines a fast coarse optimization on a manifold of policy parameters with a fine-grained parameter search in the unrestricted space of actions. The proposed algorithm reduces the number of required rollouts for adaptation to new task conditions. Applications in illustrative toy scenarios, for a 10-DOF planar arm, and a humanoid robot point reaching task validate the approach
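
    The two-stage idea described in this abstract (a coarse search restricted to a low-dimensional manifold of policy parameters, followed by a fine search in the unrestricted space) can be sketched on a toy problem. All names, the quadratic `cost`, the straight-line `on_manifold` embedding, and the greedy random-perturbation fine stage are assumptions chosen for illustration; the paper's actual manifold and optimizers are not reproduced here.

```python
import random

def cost(theta, target=(1.0, 2.0, 0.5)):
    """Hypothetical rollout cost: squared distance to an unknown optimum."""
    return sum((t - g) ** 2 for t, g in zip(theta, target))

def on_manifold(z):
    """Hypothetical 1-D manifold of policy parameters embedded in 3-D."""
    return (z, 2.0 * z, 0.0)

def hybrid_optimize(coarse_points=21, fine_steps=200, sigma=0.05, seed=0):
    rng = random.Random(seed)
    # Coarse stage: scan the single manifold coordinate z over [0, 2].
    zs = [2.0 * i / (coarse_points - 1) for i in range(coarse_points)]
    coarse = min((on_manifold(z) for z in zs), key=cost)
    # Fine stage: greedy random perturbations in the full 3-D parameter space.
    best = coarse
    for _ in range(fine_steps):
        cand = tuple(t + rng.gauss(0.0, sigma) for t in best)
        if cost(cand) < cost(best):  # keep a candidate only if it improves
            best = cand
    return coarse, best

if __name__ == "__main__":
    coarse, best = hybrid_optimize()
    print("coarse cost:", cost(coarse), "fine cost:", cost(best))
```

    In this toy setting the manifold cannot reach the optimum (its third coordinate is fixed at zero), so the coarse stage supplies a good starting point cheaply and the fine stage removes the residual error that the manifold cannot express.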

    Multi-modal Skill Memories for Online Learning of Interactive Robot Movement Generation

    Queißer J. Multi-modal Skill Memories for Online Learning of Interactive Robot Movement Generation. Bielefeld: Universität Bielefeld; 2018. Modern robotic applications pose complex requirements with respect to the adaptation of actions regarding the variability in a given task. Reinforcement learning can optimize for changing conditions, but relearning from scratch is hardly feasible due to the high number of required rollouts. This work proposes a parameterized skill that generalizes to new actions for changing task parameters. The actions are encoded by a meta-learner that provides parameters for task-specific dynamic motion primitives. Experimental evaluation shows that using parameterized skills to initialize the optimization process leads to more effective incremental task learning. A proposed hybrid optimization method combines a fast coarse optimization on a manifold of policy parameters with a fine-grained parameter search in the unrestricted space of actions. It is shown that the developed algorithm reduces the number of required rollouts for adaptation to new task conditions. Further, this work presents a transfer learning approach for adaptation of learned skills to new situations. Applications in illustrative toy scenarios, for a 10-DOF planar arm, a humanoid robot point reaching task and parameterized drumming on a pneumatic robot validate the approach. However, parameterized skills that are applied on complex robotic systems pose further challenges: the dynamics of the robot and the interaction with the environment introduce model inaccuracies. In particular, high-level skill acquisition on highly compliant robotic systems such as pneumatically driven or soft actuators is hardly feasible. Since learning of the complete dynamics model is not feasible due to the high complexity, this thesis examines two alternative approaches: First, an improvement of the low-level control based on an equilibrium model of the robot. 
Utilization of an equilibrium model reduces the learning complexity, and this thesis evaluates its applicability for control of pneumatic and industrial light-weight robots. Second, an extension of parameterized skills to generalize for forward signals of action primitives, which results in enhanced control quality of complex robotic systems. This thesis argues for a shift from the complexity of learning the full dynamics of the robot to a lower-dimensional task-related learning problem. Due to the generalization in relation to the task variability, online learning for complex robots as well as complex scenarios becomes feasible. An experimental evaluation investigates the generalization capabilities of the proposed online learning system for robot motion generation. Evaluation is performed through simulation of a compliant 2-DOF arm, and scalability to a complex robotic system is demonstrated for a pneumatically driven humanoid robot with 8-DOF