Hierarchical Decomposition of Nonlinear Dynamics and Control for System Identification and Policy Distillation
The control of nonlinear dynamical systems remains a major challenge for
autonomous agents. Current trends in reinforcement learning (RL) focus on
complex representations of dynamics and policies, which have yielded impressive
results in solving a variety of hard control tasks. However, this new
sophistication and extremely over-parameterized models have come with the cost
of an overall reduction in our ability to interpret the resulting policies. In
this paper, we take inspiration from the control community and apply the
principles of hybrid switching systems in order to break down complex dynamics
into simpler components. We exploit the rich representational power of
probabilistic graphical models and derive an expectation-maximization (EM)
algorithm for learning a sequence model to capture the temporal structure of
the data and automatically decompose nonlinear dynamics into stochastic
switching linear dynamical systems. Moreover, we show how this framework of
switching models enables extracting hierarchies of Markovian and
auto-regressive locally linear controllers from nonlinear experts in an
imitation learning scenario. Comment: 2nd Annual Conference on Learning for Dynamics and Control
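To make the idea of a stochastic switching linear dynamical system concrete, here is a minimal sampling sketch. It is not the paper's EM algorithm or its exact model: the function name `simulate_slds`, the parameter names `A`, `b`, `P`, and the choice of a plain Markov chain over modes with isotropic Gaussian noise are all assumptions made for illustration.

```python
import numpy as np

def simulate_slds(A, b, P, x0, z0, T, noise_std=0.0, seed=0):
    """Sample a trajectory from a switching linear dynamical system.

    A: (K, D, D) per-mode transition matrices (illustrative values)
    b: (K, D) per-mode offsets
    P: (K, K) row-stochastic Markov transition matrix over modes
    """
    rng = np.random.default_rng(seed)
    K, D = b.shape
    z = np.empty(T, dtype=int)
    x = np.empty((T, D))
    z[0], x[0] = z0, x0
    for t in range(1, T):
        # The discrete mode evolves as a Markov chain.
        z[t] = rng.choice(K, p=P[z[t - 1]])
        # The continuous state follows the affine dynamics of the active mode.
        x[t] = A[z[t]] @ x[t - 1] + b[z[t]] + noise_std * rng.standard_normal(D)
    return z, x

# Two hypothetical modes: a contraction and a rotation.
A = np.stack([0.5 * np.eye(2), np.array([[0.0, -1.0], [1.0, 0.0]])])
b = np.zeros((2, 2))
P = np.array([[0.9, 0.1], [0.2, 0.8]])
z, x = simulate_slds(A, b, P, x0=np.array([1.0, 1.0]), z0=0, T=50, noise_std=0.01)
```

Learning would run in the opposite direction: given only the continuous trajectory `x`, an EM-style procedure would estimate the per-mode parameters and infer the hidden mode sequence `z`.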
Model-Based Reinforcement Learning for Stochastic Hybrid Systems
Optimal control of general nonlinear systems is a central challenge in
automation. Enabled by powerful function approximators, data-driven approaches
to control have recently successfully tackled challenging robotic applications.
However, such methods often obscure the structure of dynamics and control
behind black-box over-parameterized representations, thus limiting our ability
to understand closed-loop behavior. This paper adopts a hybrid-system view of
nonlinear modeling and control that lends an explicit hierarchical structure to
the problem and breaks down complex dynamics into simpler localized units. We
consider a sequence modeling paradigm that captures the temporal structure of
the data and derive an expectation-maximization (EM) algorithm that
automatically decomposes nonlinear dynamics into stochastic piecewise affine
dynamical systems with nonlinear boundaries. Furthermore, we show that these
time-series models naturally admit a closed-loop extension that we use to
extract local polynomial feedback controllers from nonlinear experts via
behavioral cloning. Finally, we introduce a novel hybrid relative entropy
policy search (Hb-REPS) technique that incorporates the hierarchical nature of
hybrid systems and optimizes a set of time-invariant local feedback controllers
derived from a local polynomial approximation of a global state-value function.
Mesoscopic modeling of hidden spiking neurons
Can we use spiking neural networks (SNN) as generative models of multi-neuronal recordings, while taking into account that most neurons are unobserved? Modeling the unobserved neurons with large pools of hidden spiking neurons leads to severely underconstrained problems that are hard to tackle with maximum likelihood estimation. In this work, we use coarse-graining and mean-field approximations to derive a bottom-up, neuronally-grounded latent variable model (neuLVM), where the activity of the unobserved neurons is reduced to a low-dimensional mesoscopic description. In contrast to previous latent variable models, neuLVM can be explicitly mapped to a recurrent, multi-population SNN, giving it a transparent biological interpretation. We show, on synthetic spike trains, that a few observed neurons are sufficient for neuLVM to perform efficient model inversion of large SNNs, in the sense that it can recover connectivity parameters, infer single-trial latent population activity, reproduce ongoing metastable dynamics, and generalize when subjected to perturbations mimicking optogenetic stimulation.
Deep Learning and Statistical Models for Time-Critical Pedestrian Behaviour Prediction
The time it takes for a classifier to make an accurate prediction can be
crucial in many behaviour recognition problems. For example, an autonomous
vehicle should detect hazardous pedestrian behaviour early enough for it to
take appropriate measures. In this context, we compare the switching linear
dynamical system (SLDS) and a three-layered bi-directional long short-term
memory (LSTM) neural network, which are applied to infer pedestrian behaviour
from motion tracks. We show that, though the neural network model achieves an
accuracy of 80%, it requires long sequences to achieve this (100 samples or
more). The SLDS has a lower accuracy of 74%, but it achieves this result with
short sequences (10 samples). To our knowledge, such a comparison on sequence
length has not been considered in the literature before. The results provide a
key intuition about the suitability of the models for time-critical problems.
Switching Linear Dynamical Systems for Noise Robust Speech Recognition
Real-world applications such as hands-free speech recognition of isolated digits may have to deal with potentially very noisy environments. Existing state-of-the-art solutions to this problem use feature-based HMMs, with a preprocessing stage to clean the noisy signal. However, the effect that raw signal noise has on the induced HMM features is poorly understood, and limits the performance of the HMM system. An alternative to feature-based HMMs is to model the raw signal, which has the potential advantage that including an explicit noise model is straightforward. Here we jointly model the dynamics of both the raw speech signal and the noise, using a Switching Linear Dynamical System (SLDS). The new model was tested on isolated digit utterances corrupted by Gaussian noise. In contrast to the SAR-HMM, which provides a model of uncorrupted raw speech, the SLDS is comparatively noise robust and also significantly outperforms a state-of-the-art feature-based HMM. The computational complexity of exact inference in the SLDS scales exponentially with the length of the time series. To counter this we use Expectation Correction, which provides a stable and accurate linear-time approximation for this important class of models, aiding their further application in acoustic modelling.
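The exponential scaling mentioned above can be seen from the form of the exact filtered posterior. The notation here is an assumption for illustration, not taken from the abstract: $s_t$ is the discrete switch state with $S$ possible values, $h_t$ the continuous hidden state, and $v_t$ the observation, with linear-Gaussian dynamics and emissions within each switch state.

```latex
\begin{align}
p(h_t \mid v_{1:t})
  &= \sum_{s_{1:t}} p(s_{1:t} \mid v_{1:t}) \, p(h_t \mid s_{1:t}, v_{1:t})
\end{align}
```

Conditioned on a full switch path $s_{1:t}$, the model is an ordinary linear-Gaussian system, so each term $p(h_t \mid s_{1:t}, v_{1:t})$ is a single Gaussian. The sum runs over $S^t$ switch paths, so the exact posterior is a Gaussian mixture whose number of components grows exponentially in $t$. Approximate schemes keep inference tractable by limiting the number of components carried forward at each step.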