101 research outputs found

    Using Mean Embeddings for State Estimation and Reinforcement Learning

    To act in complex, high-dimensional environments, autonomous systems require versatile state estimation techniques and compact state representations. State estimation is crucial when the system only has access to stochastic measurements or partial observations. Furthermore, in combination with models of the system, such techniques make it possible to predict the future, which enables the system to assess the outcome of possible decisions. Compact state representations alleviate the curse of dimensionality by distilling the important information from high-dimensional observations. Due to noisy sensory information and imperfect models of the system, estimates of the state never reflect the true state perfectly but are always subject to errors. The natural way to incorporate the uncertainty about the state estimate is to represent it as a probability distribution. This results in the so-called belief state. High-dimensional observations, for example images, often contain much less information than their dimensionality suggests. But even if all the information is necessary to describe the state of the system (for example, the state of a swarm with the positions of all agents), a less complex description might be a sufficient representation. In such situations, finding the generative distribution that explains the state would give a much more compact yet informative representation. Traditionally, parametric distributions have been used as state representations, most prevalently the Gaussian distribution. However, in many cases a unimodal distribution might not be sufficient to represent the belief state. Using multi-modal probability distributions instead requires more advanced approaches such as mixture models or particle-based Monte Carlo methods. Learning mixture models is, however, not straightforward and often results in locally optimal solutions. 
Similarly, maintaining a good population of particles during inference is a complicated and cumbersome process. A third approach is kernel density estimation, which is located at the intersection of mixture models and particle-based approaches. Still, performing inference with any of these approaches requires heuristics that lead to poor performance and limited scalability to higher-dimensional spaces. A recent technique that alleviates this problem is the embedding of probability distributions into reproducing kernel Hilbert spaces (RKHS). Conditional distributions can be embedded as operators, and based on these a framework for inference has been presented that allows the sum rule, the product rule and Bayes’ rule to be applied entirely in Hilbert space. Using sample-based estimators and the kernel trick of the representer theorem, these operations can be represented as vector-matrix manipulations. The contributions of this thesis are based on or inspired by the embeddings of distributions into reproducing kernel Hilbert spaces. In the first part of this thesis, I propose additions to the framework for nonparametric inference that allow the inference operators to scale more gracefully with the number of samples in the training set. The first contribution is an alternative approach to the conditional embedding operator, formulated as a least-squares problem, which makes it possible to use only a subset of the data as representation while using the full data set to learn the conditional operator. I call this operator the subspace conditional embedding operator. Inspired by the least-squares derivations of the Kalman filter, I furthermore propose an alternative operator for Bayesian updates in Hilbert space, the kernel Kalman rule. This alternative approach is numerically more robust than the kernel Bayes rule presented in the framework for nonparametric inference and scales better with the number of samples. 
Based on the kernel Kalman rule, I derive the kernel Kalman filter and the kernel forward-backward smoother to perform state estimation, prediction and smoothing based on Hilbert space embeddings of the belief state. This representation is able to capture multi-modal distributions, and inference reduces, thanks to the kernel trick, to simple matrix manipulations. In the second part of this thesis, I propose a representation for large sets of homogeneous observations. Specifically, I consider the problem of learning a controller for object assembly and object manipulation with a robotic swarm. I assume a swarm of homogeneous robots that are controlled by a common input signal, e.g., the gradient of a light source or a magnetic field. Learning policies for swarms is a challenging problem since the state space grows with the number of agents and quickly becomes very high-dimensional. Furthermore, the exact number of agents and the order of the agents in the observation are not important for solving the task. To approach this issue, I propose the swarm kernel, which uses a Hilbert space embedding to represent the swarm. Instead of the exact positions of the agents in the swarm, the embedding estimates the generative distribution behind the swarm configuration. The specific agent positions are regarded as samples of this distribution. Since the swarm kernel compares the embeddings of distributions, it can compare swarm configurations with varying numbers of individuals and is invariant to the permutation of the agents. I present a hierarchical approach for solving the object manipulation task where I assume a high-level object assembly policy as given. To learn the low-level object pushing policy, I use the swarm kernel with an actor-critic policy search method. The policies which I learn in simulation can be directly transferred to a real robotic system. 
In the last part of this thesis, I investigate how the idea of kernel mean embeddings can be applied to deep reinforcement learning. As in the previous part, I consider a variable number of homogeneous observations, such as robot swarms where the number of agents can change. Another example is the representation of 3D structures as point clouds. The number of points in such clouds can vary strongly and the order of the points in a vectorized representation is arbitrary. The common architectures for neural networks have a fixed structure that requires the dimensionality of inputs and outputs to be known in advance. A variable number of inputs can only be processed by applying tricks. To approach this problem, I propose the deep M-embeddings, which are inspired by the kernel mean embeddings. The deep M-embeddings provide a network structure to compute a fixed-length representation from a variable number of inputs. Additionally, the deep M-embeddings exploit the homogeneous nature of the inputs to reduce the number of parameters in the network and thus make learning easier. Similar to the swarm kernel, the policies learned with the deep M-embeddings can be transferred to different swarm sizes and different numbers of objects in the environment without further learning.
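
The abstract above describes comparing swarm configurations through Hilbert space embeddings of the distributions behind them. As a rough illustration of that general idea only, the following sketch computes the inner product and squared distance between empirical kernel mean embeddings of two point sets. It is not the thesis's swarm kernel: the Gaussian kernel, the `bandwidth` parameter, the function names and the example coordinates are all assumptions made for this sketch.

```python
import math


def rbf(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel between two points given as equal-length tuples."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def embedding_inner_product(xs, ys, bandwidth=1.0):
    """Inner product of the empirical mean embeddings of two sample sets.

    This is the average of k(x, y) over all pairs, so it does not depend on
    the order of the samples and is defined for sets of different sizes.
    """
    total = sum(rbf(x, y, bandwidth) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def mmd_squared(xs, ys, bandwidth=1.0):
    """Squared RKHS distance between the two empirical mean embeddings."""
    return (embedding_inner_product(xs, xs, bandwidth)
            - 2.0 * embedding_inner_product(xs, ys, bandwidth)
            + embedding_inner_product(ys, ys, bandwidth))


# Hypothetical 2D agent positions (illustrative values only).
swarm_a = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
swarm_a_shuffled = [(0.0, 1.0), (0.0, 0.0), (1.0, 0.0)]  # same agents, reordered
swarm_b = [(5.0, 5.0), (6.0, 5.0), (5.0, 6.0), (6.0, 6.0)]  # different size and place

print(mmd_squared(swarm_a, swarm_a_shuffled))  # numerically zero
print(mmd_squared(swarm_a, swarm_b))           # clearly positive
```

Because every quantity is an average over sample pairs, reordering the agents leaves the result unchanged, and sets with three and four agents can be compared directly. These are the two properties the abstract attributes to the swarm kernel.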

    Online Machine Learning for Inference from Multivariate Time-series

    Inference and data analysis over networks have become significant areas of research due to the increasing prevalence of interconnected systems and the growing volume of data they produce. Many of these systems generate data in the form of multivariate time series, which are collections of time series that are observed simultaneously across multiple variables. For example, EEG measurements of the brain produce multivariate time series that record the electrical activity of different brain regions over time. Cyber-physical systems generate multivariate time series that capture the behaviour of physical systems in response to cybernetic inputs. Similarly, financial time series reflect the dynamics of multiple financial instruments or market indices over time. Through the analysis of these time series, one can uncover important details about the behaviour of the system, detect patterns, and make predictions. Therefore, designing effective methods for data analysis and inference over networks of multivariate time series is a crucial area of research with numerous applications across various fields. In this Ph.D. thesis, our focus is on identifying the directed relationships between time series and leveraging this information to design algorithms for data prediction as well as missing data imputation. The thesis is organized as a compendium of papers and consists of seven chapters and appendices. The first chapter is dedicated to the motivation and a literature survey, whereas the second chapter presents the fundamental concepts that readers need in order to follow the material in the dissertation with ease. In the third chapter, we present three online nonlinear topology identification algorithms, namely NL-TISO, RFNL-TISO, and RFNL-TIRSO. In this chapter, we assume the data is generated from a sparse nonlinear vector autoregressive (VAR) model, and propose online data-driven solutions for identifying the nonlinear VAR topology. 
We also provide convergence guarantees in terms of dynamic regret for the proposed algorithm RFNL-TIRSO. Chapters four and five of the dissertation delve into the issue of missing data and explore how the learned topology can be leveraged to address this challenge. Chapter five is distinct from other chapters in its exclusive focus on edge flow data and introduces an online imputation strategy based on a simplicial complex framework that leverages the known network structure in addition to the learned topology. Chapter six of the dissertation takes a different approach, assuming that the data is generated from nonlinear structural equation models. In this chapter, we propose an online topology identification algorithm using a time-structured approach, incorporating information from both the data and the model evolution. The algorithm is shown to have convergence guarantees achieved by bounding the dynamic regret. Finally, chapter seven of the dissertation provides concluding remarks and outlines potential future research directions.
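
For orientation, one common additive way to write a nonlinear VAR model of order P over N time series is sketched below. The abstract does not give the model, so the notation (y_n[t], f, u_n[t], N, P) is an assumption for illustration and may differ from the thesis's own formulation.

```latex
y_n[t] \;=\; \sum_{n'=1}^{N} \sum_{p=1}^{P} f_{n,n'}^{(p)}\bigl(y_{n'}[t-p]\bigr) \;+\; u_n[t],
\qquad n = 1, \dots, N
```

Under this reading, a directed edge from series n' to series n is present when at least one of the functions f_{n,n'}^{(p)} is not identically zero. Sparsity of the model then means that most such pairs have no edge, and topology identification amounts to estimating which edges exist.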

    Learning Dynamic Systems for Intention Recognition in Human-Robot-Cooperation

    This thesis is concerned with intention recognition for a humanoid robot and investigates how the challenges of uncertain and incomplete observations, a high degree of detail of the used models, and real-time inference may be addressed by modeling the human rationale as hybrid, dynamic Bayesian networks and performing inference with these models. The key focus lies on the automatic identification of the employed nonlinear stochastic dependencies and the situation-specific inference