Object-Oriented Dynamics Learning through Multi-Level Abstraction
Object-based approaches for learning action-conditioned dynamics have
demonstrated promise for generalization and interpretability. However, existing
approaches suffer from structural limitations and optimization difficulties for
common environments with multiple dynamic objects. In this paper, we present a
novel self-supervised learning framework, called Multi-level Abstraction
Object-oriented Predictor (MAOP), which employs a three-level learning
architecture that enables efficient object-based dynamics learning from raw
visual observations. We also design a spatial-temporal relational reasoning
mechanism for MAOP to support instance-level dynamics learning and handle
partial observability. Our results show that MAOP significantly outperforms
previous methods in terms of sample efficiency and generalization over novel
environments for learning environment models. We also demonstrate that learned
dynamics models enable efficient planning in unseen environments, comparable to
true environment models. In addition, MAOP learns semantically and visually
interpretable disentangled representations.
Comment: Accepted to the Thirty-Fourth AAAI Conference on Artificial
Intelligence (AAAI), 202
Modeling Temporal Dynamics and Spatial Configurations of Actions Using Two-Stream Recurrent Neural Networks
Recently, skeleton-based action recognition has gained popularity due to
cost-effective depth sensors coupled with real-time skeleton estimation
algorithms. Traditional approaches based on handcrafted features are limited in
their ability to represent the complexity of motion patterns. Recent methods that use Recurrent
Neural Networks (RNN) to handle raw skeletons only focus on the contextual
dependency in the temporal domain and neglect the spatial configurations of
articulated skeletons. In this paper, we propose a novel two-stream RNN
architecture to model both temporal dynamics and spatial configurations for
skeleton-based action recognition. We explore two different structures for the
temporal stream: stacked RNN and hierarchical RNN. Hierarchical RNN is designed
according to human body kinematics. We also propose two effective methods to
model the spatial structure by converting the spatial graph into a sequence of
joints. To improve the generalization of our model, we further apply 3D
transformation-based data augmentation, including rotation and scaling, to the
3D coordinates of skeletons during training. Experiments on 3D action
recognition benchmark datasets show that our
method brings a considerable improvement for a variety of actions, i.e.,
generic actions, interaction activities and gestures.
Comment: Accepted to IEEE International Conference on Computer Vision and
Pattern Recognition (CVPR) 201
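
The rotation and scaling augmentation mentioned in the abstract above can be
illustrated with a minimal sketch. The abstract does not state the rotation
axis, the angle range, or the scale range, so the y-axis rotation, the
`max_angle` and `scale_range` defaults, and the function names `rotate_y` and
`augment_skeleton` are all assumptions made for illustration, not the authors'
implementation.

```python
import math
import random


def rotate_y(joint, theta):
    """Rotate one (x, y, z) joint about the y-axis by theta radians."""
    x, y, z = joint
    c, s = math.cos(theta), math.sin(theta)
    return (c * x + s * z, y, -s * x + c * z)


def augment_skeleton(frames, max_angle=math.pi / 6, scale_range=(0.9, 1.1), rng=random):
    """Apply one random rotation and one random uniform scale to a whole sequence.

    frames: list of frames, each a list of (x, y, z) joint coordinates.
    The same angle and scale are used for every frame so that the motion
    stays temporally consistent (an assumption, not taken from the paper).
    """
    theta = rng.uniform(-max_angle, max_angle)
    scale = rng.uniform(*scale_range)
    return [
        [tuple(scale * v for v in rotate_y(joint, theta)) for joint in frame]
        for frame in frames
    ]


if __name__ == "__main__":
    # A toy two-joint, one-frame "skeleton" (hypothetical data).
    sequence = [[(0.0, 0.0, 0.0), (1.0, 2.0, 2.0)]]
    print(augment_skeleton(sequence, rng=random.Random(0)))
```

Sampling a new angle and scale each time a sequence is drawn during training
would give the network a differently oriented and sized copy of the same
action on every pass.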
A Comparative Study of Reservoir Computing for Temporal Signal Processing
Reservoir computing (RC) is a novel approach to time series prediction using
recurrent neural networks. In RC, an input signal perturbs the intrinsic
dynamics of a medium called a reservoir. A readout layer is then trained to
reconstruct a target output from the reservoir's state. The multitude of RC
architectures and evaluation metrics poses a challenge to both practitioners
and theorists who study the task-solving performance and computational power of
RC. In addition, in contrast to traditional computation models, the reservoir
is a dynamical system in which computation and memory are inseparable, and
therefore hard to analyze. Here, we compare echo state networks (ESN), a
popular RC architecture, with tapped-delay lines (DL) and nonlinear
autoregressive exogenous (NARX) networks, which we use to model systems with
limited computation and limited memory respectively. We compare the performance
of the three systems while computing three common benchmark time series:
Hénon map, NARMA10, and NARMA20. We find that the role of the reservoir in
the reservoir computing paradigm goes beyond providing a memory of the past
inputs. The DL and the NARX network have higher memorization capability, but
fall short of the generalization power of the ESN.
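
Two of the benchmark series named in the abstract above can be generated with
a short sketch. The abstract gives no equations or parameters, so the code
uses the commonly cited forms: the Hénon map with a = 1.4 and b = 0.3, and
NARMA10 driven by inputs drawn uniformly from [0, 0.5]. These parameter
values, the initial conditions, and the function names `henon_map` and
`narma10` are assumptions and may differ from the paper's setup.

```python
import random


def henon_map(n, a=1.4, b=0.3, x0=0.0, x1=0.0):
    """Return n values of the series x[t+1] = 1 - a*x[t]**2 + b*x[t-1]."""
    xs = [x0, x1]
    while len(xs) < n:
        xs.append(1.0 - a * xs[-1] ** 2 + b * xs[-2])
    return xs[:n]


def narma10(n, rng=random):
    """Return (u, y) for the 10th-order NARMA task.

    y[t+1] = 0.3*y[t] + 0.05*y[t]*sum(y[t-9..t]) + 1.5*u[t-9]*u[t] + 0.1
    with u[t] drawn uniformly from [0, 0.5] and y[0..9] initialised to 0.
    """
    u = [rng.uniform(0.0, 0.5) for _ in range(n)]
    y = [0.0] * n
    for t in range(9, n - 1):
        window = sum(y[t - 9 : t + 1])  # the ten most recent outputs
        y[t + 1] = 0.3 * y[t] + 0.05 * y[t] * window + 1.5 * u[t - 9] * u[t] + 0.1
    return u, y


if __name__ == "__main__":
    print(henon_map(5))
    inputs, targets = narma10(20, rng=random.Random(0))
    print(targets[10:15])
```

In a comparison like the one described, a model would receive `u` (or past
values of the Hénon series) as input and be scored on how well it reproduces
`y` (or the next Hénon value).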
- …