19,200 research outputs found
Memory-Augmented Temporal Dynamic Learning for Action Recognition
Human actions captured in video sequences contain two crucial factors for
action recognition, i.e., visual appearance and motion dynamics. To model these
two aspects, Convolutional and Recurrent Neural Networks (CNNs and RNNs) are
adopted in most existing successful methods for recognizing actions. However,
CNN based methods are limited in modeling long-term motion dynamics. RNNs are
able to learn temporal motion dynamics but lack effective ways to tackle
unsteady dynamics in long-duration motion. In this work, we propose a
memory-augmented temporal dynamic learning network, which learns to write the
most evident information into an external memory module and ignore irrelevant
ones. In particular, we present a differential memory controller to make a
discrete decision on whether the external memory module should be updated with
current feature. The discrete memory controller takes in the memory history,
context embedding and current feature as inputs and controls information flow
into the external memory module. Additionally, we train this discrete memory
controller using straight-through estimator. We evaluate this end-to-end system
on benchmark datasets (UCF101 and HMDB51) of human action recognition. The
experimental results show consistent improvements on both datasets over prior
works and our baselines.Comment: The Thirty-Third AAAI Conference on Artificial Intelligence (AAAI-19
What Can I Do Around Here? Deep Functional Scene Understanding for Cognitive Robots
For robots that have the capability to interact with the physical environment
through their end effectors, understanding the surrounding scenes is not merely
a task of image classification or object recognition. To perform actual tasks,
it is critical for the robot to have a functional understanding of the visual
scene. Here, we address the problem of localizing and recognition of functional
areas from an arbitrary indoor scene, formulated as a two-stage deep learning
based detection pipeline. A new scene functionality testing-bed, which is
complied from two publicly available indoor scene datasets, is used for
evaluation. Our method is evaluated quantitatively on the new dataset,
demonstrating the ability to perform efficient recognition of functional areas
from arbitrary indoor scenes. We also demonstrate that our detection model can
be generalized onto novel indoor scenes by cross validating it with the images
from two different datasets
- …