
    Hidden Parameter Recurrent State Space Models For Changing Dynamics Scenarios

    Recurrent state-space models (RSSMs) are highly expressive models for learning patterns in time series data and for system identification. However, these models assume that the dynamics are fixed and unchanging, which is rarely the case in real-world scenarios. Many control applications involve tasks with similar but not identical dynamics, which can be modeled with a latent variable. We introduce Hidden Parameter Recurrent State Space Models (HiP-RSSMs), a framework that parametrizes a family of related dynamical systems with a low-dimensional set of latent factors. We present a simple and effective way of learning and performing inference over this Gaussian graphical model that avoids approximations such as variational inference. We show that HiP-RSSMs outperform RSSMs and competing multi-task models on several challenging robotic benchmarks, both on real-world systems and in simulation. Published at the International Conference on Learning Representations (ICLR) 2022.

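    In a heavily simplified, linear form, the core idea reads as follows: a single shared dynamics model is modulated by a low-dimensional latent task parameter, over which a Gaussian belief is maintained and updated in closed form, with no variational approximation. The sketch below is an illustrative assumption in that spirit, not the paper's recurrent state-space model; the class name, the shapes and the linear-in-theta dynamics are invented for the example.

    import numpy as np

    class LatentTaskLinearDynamics:
        """Toy family of linear systems: x' = A x + B u + Phi(x, u) @ theta + noise."""

        def __init__(self, A, B, Phi, noise_var, theta_dim):
            self.A, self.B, self.Phi = A, B, Phi
            self.noise_var = noise_var              # isotropic transition-noise variance
            self.mu = np.zeros(theta_dim)           # Gaussian belief over the task latent theta
            self.cov = np.eye(theta_dim)

        def update_task_belief(self, x, u, x_next):
            """Closed-form (Bayesian linear regression) update of the belief over theta."""
            F = self.Phi(x, u)                          # (state_dim, theta_dim) feature matrix
            resid = x_next - self.A @ x - self.B @ u    # part of the transition explained by theta
            prior_prec = np.linalg.inv(self.cov)
            post_prec = prior_prec + F.T @ F / self.noise_var
            self.cov = np.linalg.inv(post_prec)
            self.mu = self.cov @ (prior_prec @ self.mu + F.T @ resid / self.noise_var)

        def predict(self, x, u):
            """Predict the next state under the current mean estimate of the task latent."""
            return self.A @ x + self.B @ u + self.Phi(x, u) @ self.mu

    # Hypothetical usage on a 2-D toy system; Phi is an arbitrary feature map chosen for the example.
    A, B = np.eye(2), np.eye(2)
    Phi = lambda x, u: np.diag(x)
    model = LatentTaskLinearDynamics(A, B, Phi, noise_var=0.01, theta_dim=2)
    model.update_task_belief(x=np.ones(2), u=np.zeros(2), x_next=np.array([1.2, 0.9]))
    print(model.mu, model.predict(np.ones(2), np.zeros(2)))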

    Latent State-Space Models for Control

    Learning to control robots without human supervision and prolonged engineering effort has been a long-term dream at the intersection of machine learning and robotics. If successful, it would enable many novel applications, from soft robotics over human-robot interaction to quick adaptation to unseen tasks or robotic setups. A key driving force behind this dream is the inherent limitations of classical control algorithms, which restrict applicability to low-dimensional and engineered state-spaces and prohibit the use of high-dimensional sensors such as cameras or touchpads. As an alternative to classical control methods, reinforcement learning presumes no prior knowledge of a robot's dynamics and, paired with deep learning, opens the door to using high-dimensional sensory information of any kind. Yet, reinforcement learning has only achieved limited impact on real-time robot control, due to its high demand for real-world interactions (among other reasons). Model-based approaches promise to be much more data-efficient, but present the challenge of engineering accurate simulators. As building a simulator comes with many of the same challenges as designing a controller, using engineered simulators is not a satisfactory solution for the generic goal of learning to control; most of the engineering work would still have to be done to build the simulator. Instead, learning such a model, in particular a latent state-space model (LSSM), promises to relieve us of engineering a simulator while still reaping the benefits of having one. A learned latent space can compactly represent high-dimensional sensor information and store all relevant information for prediction and control.

    In this thesis, we show how to perform system identification of complex and nonlinear systems based on high-dimensional observations purely from raw sensory data. Despite their complexity, such systems can often be approximated well by a set of linear dynamical systems if broken into appropriate subsequences. This mechanism not only helps us find good approximations of the dynamics, but also gives us deeper insight into the underlying system. Combining Bayesian inference, Variational Autoencoders and Concrete relaxations, we show how to learn a richer and more meaningful state-space, for example by encoding joint constraints or collisions with walls in a maze, from partial and high-dimensional observations. In a setting with time-varying dynamics, we show how our inference method for continuous switching variables can infer changing but unobserved physical properties that govern the dynamics of a system, such as masses or link lengths in robotic simulations. This inference happens online in our learned filter, without retraining or fine-tuning of model parameters. Quantitatively, we find that such representations translate into a gain in accuracy of the learned dynamics, showcased on various simulated tasks, and that they promise to be helpful for policy optimization.

    Building on this work, we show how the LSSM can be used to learn a probabilistic model of real-world robot dynamics, such as that of a self-built drone and a 7-degrees-of-freedom robot arm. No prior knowledge of the flight dynamics or kinematics is assumed. On top of this, we propose a novel model-based reinforcement learning method in which both a parameterized policy and a value function are optimized entirely by propagating stochastic analytic gradients through generated latent trajectories. Our learned thrust-attitude controller can fly a drone to a randomly placed marker in an enclosed environment, and steer a joint-velocity-controlled robot arm to random end-effector positions in Cartesian space. This can be achieved with less than an hour of interactions on the real system. The control policy is learned entirely in the learned simulator and can be applied without modification or fine-tuning to the real system.

    Last, we propose a novel exploration criterion for the development of autonomous agents: Empowerment Gain. Unlike other exploration criteria, this approach ties together an agent's entire perception-control loop and its current capabilities to act. Looking ahead, this method will help us learn models of the world that are actually relevant to realizing an agent's influence in it. As a key insight, our learned models do not actually have to be perfect simulators of the entire world and all of its processes; rather, they need to convey the information necessary to enable an agent to interact with the world around it. We show how this criterion compares to, and in some ways incorporates, other intrinsic motivations such as novelty seeking, surprise minimization and learning progress. While our method still ensures exploration of the entire space, it prefers regions with greater potential for realizing an agent's influence in the world.

    In conclusion, we give answers to three major questions: (1) how do we learn an LSSM from raw sensory data, (2) how do we use it for control, and (3) what parts of the world do we need to explore and model in the first place. While the last part remains at a theoretical and conceptual stage, we demonstrate the first two on two different real-world robotic platforms. We focused on proposing general-purpose methods that are as broadly applicable as they can be, but are still successful in a real-world setting.
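
    As an illustration of the model-based policy-learning idea above, the sketch below rolls out a learned stochastic latent dynamics model with the reparameterization trick and optimizes a policy by backpropagating analytic gradients through the imagined latent trajectory. It is a hypothetical sketch, not the thesis code: the module names, architectures and dimensions are assumptions, and in practice the dynamics and reward models would themselves be learned from real interaction data.

    import torch
    import torch.nn as nn

    latent_dim, action_dim, horizon = 8, 2, 15

    # Stand-ins for a learned latent dynamics model, reward model and policy.
    dynamics = nn.Sequential(nn.Linear(latent_dim + action_dim, 64), nn.Tanh(),
                             nn.Linear(64, 2 * latent_dim))   # outputs mean and log-std of the next latent
    reward_model = nn.Sequential(nn.Linear(latent_dim, 64), nn.Tanh(), nn.Linear(64, 1))
    policy = nn.Sequential(nn.Linear(latent_dim, 64), nn.Tanh(), nn.Linear(64, action_dim))
    optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

    def imagined_return(z0):
        """Differentiable return of an imagined rollout in latent space, starting from z0."""
        z, total = z0, 0.0
        for _ in range(horizon):
            a = torch.tanh(policy(z))                                       # action from the current policy
            mean, log_std = dynamics(torch.cat([z, a], dim=-1)).chunk(2, dim=-1)
            z = torch.distributions.Normal(mean, log_std.exp()).rsample()   # reparameterized latent sample
            total = total + reward_model(z).sum()
        return total

    z0 = torch.zeros(1, latent_dim)
    loss = -imagined_return(z0)     # maximize the imagined return
    optimizer.zero_grad()
    loss.backward()                 # stochastic analytic gradients flow through the whole trajectory
    optimizer.step()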

    Learning Options via Compression

    Identifying statistical regularities in solutions to some tasks in multi-task reinforcement learning can accelerate the learning of new tasks. Skill learning offers one way of identifying these regularities by decomposing pre-collected experiences into a sequence of skills. A popular approach to skill learning is maximizing the likelihood of the pre-collected experience with latent variable models, where the latent variables represent the skills. However, there are often many solutions that maximize the likelihood equally well, including degenerate solutions. To address this underspecification, we propose a new objective that combines the maximum likelihood objective with a penalty on the description length of the skills. This penalty incentivizes the skills to maximally extract common structures from the experiences. Empirically, our objective learns skills that solve downstream tasks in fewer samples compared to skills learned from only maximizing likelihood. Further, while most prior works in the offline multi-task setting focus on tasks with low-dimensional observations, our objective can scale to challenging tasks with high-dimensional image observations. Published at NeurIPS 2022.
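
    The objective itself is simple to state: keep the usual likelihood term and add a weighted penalty on how many nats it takes to describe the skill sequence under a prior. The sketch below is a hypothetical, stripped-down rendering of that trade-off, not the paper's implementation; the function name, arguments and the weight beta are assumptions.

    def skill_learning_loss(traj_log_likelihood, skill_log_prior, beta=0.1):
        """Maximum likelihood plus a description-length penalty on the skills.

        traj_log_likelihood: log p(trajectory | skills), summed over time steps
        skill_log_prior:     log p(skills) under a skill prior / code model
        beta:                weight on the compression penalty
        """
        nll = -traj_log_likelihood               # maximum-likelihood term
        description_length = -skill_log_prior    # code length of the skill sequence, in nats
        return nll + beta * description_length

    # e.g. with log-probabilities computed elsewhere by the trajectory decoder and the skill prior
    loss = skill_learning_loss(traj_log_likelihood=-120.0, skill_log_prior=-35.0)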

    Reinforcement Learning in Presence of Discrete Markovian Context Evolution

    We consider a context-dependent Reinforcement Learning (RL) setting, which is characterized by: a) an unknown finite number of not directly observable contexts; b) abrupt (discontinuous) context changes occurring during an episode; and c) Markovian context evolution. We argue that this challenging case is often met in applications, and we tackle it using a Bayesian model-based approach and variational inference. We adapt a sticky Hierarchical Dirichlet Process (HDP) prior for model learning, which is arguably best-suited for infinite Markov chain modeling. We then derive a context distillation procedure, which identifies and removes spurious contexts in an unsupervised fashion. We argue that the combination of these two components allows inferring the number of contexts from data, thus dealing with the context cardinality assumption. We then find the representation of the optimal policy, enabling efficient policy learning with off-the-shelf RL algorithms. Finally, we demonstrate empirically (using the gym environments cart-pole swing-up, drone, and intersection) that our approach succeeds where state-of-the-art methods of other frameworks fail, and we elaborate on the reasons for such failures.
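
    The "sticky" part of the prior can be illustrated in a finite, simplified form: each row of the context transition matrix gets a Dirichlet prior with extra concentration on its self-transition, encoding the assumption that contexts persist for a while and then switch abruptly. The sketch below shows only this simplified finite version with made-up names; the paper itself uses a sticky hierarchical Dirichlet process, which does not fix the number of contexts in advance.

    import numpy as np

    def sample_sticky_transition_matrix(num_contexts, alpha=1.0, kappa=10.0, seed=None):
        """Sample a context transition matrix whose rows favor self-transitions."""
        rng = np.random.default_rng(seed)
        P = np.empty((num_contexts, num_contexts))
        for k in range(num_contexts):
            concentration = np.full(num_contexts, alpha)
            concentration[k] += kappa          # "stickiness": extra mass on staying in context k
            P[k] = rng.dirichlet(concentration)
        return P

    P = sample_sticky_transition_matrix(num_contexts=4, seed=0)
    print(P.round(2))    # rows sum to 1; diagonal entries dominate because of kappa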