28,626 research outputs found
Predictive Coding for Dynamic Visual Processing: Development of Functional Hierarchy in a Multiple Spatio-Temporal Scales RNN Model
The current paper proposes a novel predictive coding type neural network
model, the predictive multiple spatio-temporal scales recurrent neural network
(P-MSTRNN). The P-MSTRNN learns to predict visually perceived human whole-body
cyclic movement patterns by exploiting multiscale spatio-temporal constraints
imposed on network dynamics by using differently sized receptive fields as well
as different time constant values for each layer. After learning, the network
becomes able to proactively imitate target movement patterns by inferring or
recognizing corresponding intentions by means of the regression of prediction
error. Results show that the network can develop a functional hierarchy by
developing a different type of dynamic structure at each layer. The paper
examines how model performance during pattern generation as well as predictive
imitation varies depending on the stage of learning. The number of limit cycle
attractors corresponding to target movement patterns increases as learning
proceeds. And, transient dynamics developing early in the learning process
successfully perform pattern generation and predictive imitation tasks. The
paper concludes that exploitation of transient dynamics facilitates successful
task performance during early learning periods.Comment: Accepted in Neural Computation (MIT press
A Fast Integrated Planning and Control Framework for Autonomous Driving via Imitation Learning
For safe and efficient planning and control in autonomous driving, we need a
driving policy which can achieve desirable driving quality in long-term horizon
with guaranteed safety and feasibility. Optimization-based approaches, such as
Model Predictive Control (MPC), can provide such optimal policies, but their
computational complexity is generally unacceptable for real-time
implementation. To address this problem, we propose a fast integrated planning
and control framework that combines learning- and optimization-based approaches
in a two-layer hierarchical structure. The first layer, defined as the "policy
layer", is established by a neural network which learns the long-term optimal
driving policy generated by MPC. The second layer, called the "execution
layer", is a short-term optimization-based controller that tracks the reference
trajecotries given by the "policy layer" with guaranteed short-term safety and
feasibility. Moreover, with efficient and highly-representative features, a
small-size neural network is sufficient in the "policy layer" to handle many
complicated driving scenarios. This renders online imitation learning with
Dataset Aggregation (DAgger) so that the performance of the "policy layer" can
be improved rapidly and continuously online. Several exampled driving scenarios
are demonstrated to verify the effectiveness and efficiency of the proposed
framework
Exploiting Structure for Scalable and Robust Deep Learning
Deep learning has seen great success training deep neural networks for complex prediction problems, such as large-scale image recognition, short-term time-series forecasting, and learning behavioral models for games with simple dynamics. However, neural networks have a number of weaknesses: 1) they are not sample-efficient and 2) they are often not robust against (adversarial) input perturbations. Hence, it is challenging to train neural networks for problems with exponential complexity, such as multi-agent games, complex long-term spatiotemporal dynamics, or noisy high-resolution image data.
This thesis contributes methods to improve the sample efficiency, expressive power, and robustness of neural networks, by exploiting various forms of low-dimensional structure, such as spatiotemporal hierarchy and multi-agent coordination. We show the effectiveness of this approach in multiple learning paradigms: in both the supervised learning (e.g., imitation learning) and reinforcement learning settings.
First, we introduce hierarchical neural networks that model both short-term actions and long-term goals from data, and can learn human-level behavioral models for spatiotemporal multi-agent games, such as basketball, using imitation learning.
Second, in reinforcement learning, we show that behavioral policies with a hierarchical latent structure can efficiently learn forms of multi-agent coordination, which enables a form of structured exploration for faster learning.
Third, we showcase tensor-train recurrent neural networks that can model high-order mutliplicative structure in dynamical systems (e.g., Lorenz dynamics). We show that this model class gives state-of-the-art long-term forecasting performance with very long time horizons for both simulation and real-world traffic and climate data.
Finally, we demonstrate two methods for neural network robustness: 1) stability training, a form of stochastic data augmentation to make neural networks more robust, and 2) neural fingerprinting, a method that detects adversarial examples by validating the network’s behavior in the neighborhood of any given input.
In sum, this thesis takes a step to enable machine learning for the next scale of problem complexity, such as rich spatiotemporal multi-agent games and large-scale robust predictions.</p
Causal Confusion in Imitation Learning
Behavioral cloning reduces policy learning to supervised learning by training
a discriminative model to predict expert actions given observations. Such
discriminative models are non-causal: the training procedure is unaware of the
causal structure of the interaction between the expert and the environment. We
point out that ignoring causality is particularly damaging because of the
distributional shift in imitation learning. In particular, it leads to a
counter-intuitive "causal misidentification" phenomenon: access to more
information can yield worse performance. We investigate how this problem
arises, and propose a solution to combat it through targeted
interventions---either environment interaction or expert queries---to determine
the correct causal model. We show that causal misidentification occurs in
several benchmark control domains as well as realistic driving settings, and
validate our solution against DAgger and other baselines and ablations.Comment: Published at NeurIPS 2019 9 pages, plus references and appendice
- …