Crossmodal Attentive Skill Learner
This paper presents the Crossmodal Attentive Skill Learner (CASL), integrated
with the recently-introduced Asynchronous Advantage Option-Critic (A2OC)
architecture [Harb et al., 2017] to enable hierarchical reinforcement learning
across multiple sensory inputs. We provide concrete examples where the approach
not only improves performance in a single task, but accelerates transfer to new
tasks. We demonstrate that the attention mechanism anticipates and identifies useful
latent features, while filtering irrelevant sensor modalities during execution.
We modify the Arcade Learning Environment [Bellemare et al., 2013] to support
audio queries, and conduct evaluations of crossmodal learning in the Atari 2600
game Amidar. Finally, building on the recent work of Babaeizadeh et al. [2017],
we open-source a fast hybrid CPU-GPU implementation of CASL.
Comment: International Conference on Autonomous Agents and Multiagent Systems
(AAMAS) 2018, NIPS 2017 Deep Reinforcement Learning Symposium
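As a rough illustration of how an attention mechanism can down-weight an irrelevant sensor modality, the sketch below applies a softmax over per-modality relevance scores and forms a weighted sum of the modality features. It is not the CASL architecture: the function names, the feature vectors, and the scores are hypothetical, and in a real agent the scores would be produced by a learned network rather than supplied by hand.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_modalities(features, scores):
    """Attention-weighted sum of per-modality feature vectors.

    features: dict mapping modality name -> list of floats (equal lengths)
    scores:   dict mapping modality name -> unnormalized relevance score
              (hypothetical; assumed to come from a learned scorer)
    Returns (weights by modality, fused feature vector).
    """
    names = sorted(features)
    weights = softmax([scores[name] for name in names])
    dim = len(features[names[0]])
    fused = [0.0] * dim
    for weight, name in zip(weights, names):
        for i, value in enumerate(features[name]):
            fused[i] += weight * value
    return dict(zip(names, weights)), fused


# Toy example: the audio score is much higher, so audio dominates the fusion.
weights, fused = fuse_modalities(
    {"video": [1.0, 0.0], "audio": [0.0, 1.0]},
    {"video": 0.0, "audio": 4.0},
)
```

Because the weights sum to one, a modality with a very low score contributes almost nothing to the fused vector, which is the filtering behavior the abstract describes in general terms.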
Factorized Inference in Deep Markov Models for Incomplete Multimodal Time Series
Integrating deep learning with latent state space models has the potential to
yield temporal models that are powerful, yet tractable and interpretable.
Unfortunately, current models are not designed to handle missing data or
multiple data modalities, which are both prevalent in real-world data. In this
work, we introduce a factorized inference method for Multimodal Deep Markov
Models (MDMMs), allowing us to filter and smooth in the presence of missing
data, while also performing uncertainty-aware multimodal fusion. We derive this
method by factorizing the posterior p(z|x) for non-linear state space models,
and develop a variational backward-forward algorithm for inference. Because our
method handles incompleteness over both time and modalities, it is capable of
interpolation, extrapolation, conditional generation, label prediction, and
weakly supervised learning of multimodal time series. We demonstrate these
capabilities on both synthetic and real-world multimodal data under high levels
of data deletion. Our method performs well even with more than 50% missing
data, and outperforms existing deep approaches to inference in latent time
series.
Comment: 8 pages, 4 figures, accepted to AAAI 2020, code available at:
https://github.com/ztangent/multimodal-dm
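One common way to realize uncertainty-aware fusion with missing modalities is a product of Gaussian experts, in which each observed modality contributes a Gaussian estimate of the latent state and absent modalities are simply skipped. The sketch below shows this for a scalar latent variable. It assumes Gaussian experts and a Gaussian prior; the abstract does not spell out the exact form of the factorization, and the function name and arguments are hypothetical.

```python
def fuse_gaussians(experts, prior=(0.0, 1.0)):
    """Precision-weighted product of 1-D Gaussian experts.

    experts: list of (mean, variance) tuples, or None for a missing modality
    prior:   (mean, variance) of the prior over the latent state (assumed)
    Returns the (mean, variance) of the fused Gaussian.
    """
    prior_mean, prior_var = prior
    precision = 1.0 / prior_var
    weighted_mean = prior_mean / prior_var
    for expert in experts:
        if expert is None:
            continue  # missing modality: contributes no information
        mean, var = expert
        precision += 1.0 / var
        weighted_mean += mean / var
    fused_var = 1.0 / precision
    return weighted_mean * fused_var, fused_var


# With every modality missing, the estimate falls back to the prior.
only_prior = fuse_gaussians([None, None])
# One observed modality pulls the mean toward it and shrinks the variance.
one_observed = fuse_gaussians([(2.0, 1.0), None])
```

A modality with large variance has low precision and therefore little influence on the fused mean, so less reliable inputs are discounted automatically.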
Human Motion Trajectory Prediction: A Survey
With growing numbers of intelligent autonomous systems in human environments,
the ability of such systems to perceive, understand and anticipate human
behavior becomes increasingly important. Specifically, predicting future
positions of dynamic agents and planning considering such predictions are key
tasks for self-driving vehicles, service robots and advanced surveillance
systems. This paper provides a survey of human motion trajectory prediction. We
review, analyze and structure a large selection of work from different
communities and propose a taxonomy that categorizes existing methods based on
the motion modeling approach and level of contextual information used. We
provide an overview of the existing datasets and performance metrics. We
discuss limitations of the state of the art and outline directions for further
research.
Comment: Submitted to the International Journal of Robotics Research (IJRR),
37 pages
Memory Fusion Network for Multi-view Sequential Learning
Multi-view sequential learning is a fundamental problem in machine learning
dealing with multi-view sequences. In a multi-view sequence, there exist two
forms of interactions between different views: view-specific interactions and
cross-view interactions. In this paper, we present a new neural architecture
for multi-view sequential learning called the Memory Fusion Network (MFN) that
explicitly accounts for both interactions and
continuously models them through time. The first component of the MFN is called
the System of LSTMs, where view-specific interactions are learned in isolation
through assigning an LSTM function to each view. The cross-view interactions
are then identified using a special attention mechanism called the Delta-memory
Attention Network (DMAN) and summarized through time with a Multi-view Gated
Memory. Through extensive experimentation, MFN is compared to various proposed
approaches for multi-view sequential learning on multiple publicly available
benchmark datasets. MFN outperforms all the existing multi-view approaches.
Furthermore, MFN outperforms all current state-of-the-art models, setting new
state-of-the-art results for these multi-view datasets.
Comment: AAAI 2018 Oral Presentation