Skeleton-Based Human Action Recognition with Global Context-Aware Attention LSTM Networks
Human action recognition in 3D skeleton sequences has attracted a lot of
research attention. Recently, Long Short-Term Memory (LSTM) networks have shown
promising performance in this task due to their strengths in modeling the
dependencies and dynamics in sequential data. Since not all skeletal joints
are informative for action recognition, and irrelevant joints often introduce
noise that degrades performance, we need to pay more attention to the
informative ones. However, the original LSTM network does not have explicit
attention ability. In this paper, we propose a new class of LSTM network,
Global Context-Aware Attention LSTM (GCA-LSTM), for skeleton-based action
recognition. This network is capable of selectively focusing on the informative
joints in each frame of each skeleton sequence by using a global context memory
cell. To further improve the attention capability of our network, we also
introduce a recurrent attention mechanism, with which the attention performance
of the network can be enhanced progressively. Moreover, we propose a stepwise
training scheme in order to train our network effectively. Our approach
achieves state-of-the-art performance on five challenging benchmark datasets
for skeleton-based action recognition.
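The joint-selection idea described above can be sketched as soft attention over joints driven by a global context vector. The following is a minimal, hypothetical illustration in NumPy — the function, the bilinear scoring form, and all shapes are assumptions for exposition, not the authors' implementation:

```python
import numpy as np

def softmax(x):
    # numerically stable softmax
    e = np.exp(x - x.max())
    return e / e.sum()

def attend_joints(joint_feats, global_context, W):
    """Score each joint against a global context vector and return
    an attention-weighted summary of the frame.

    joint_feats:    (J, D) per-joint features for one frame
    global_context: (D,)   memory summarizing the whole sequence
    W:              (D, D) learned bilinear scoring matrix (hypothetical)
    """
    scores = joint_feats @ W @ global_context   # (J,) informativeness scores
    weights = softmax(scores)                   # emphasize informative joints
    return weights, weights @ joint_feats       # (D,) weighted frame feature

rng = np.random.default_rng(0)
J, D = 25, 8                                    # e.g. 25 skeleton joints
feats = rng.normal(size=(J, D))
ctx = rng.normal(size=D)
W = rng.normal(size=(D, D))
w, frame = attend_joints(feats, ctx, W)
```

In the recurrent attention mechanism the paper describes, the global context memory would itself be refined from such attended frame features over several iterations, so each pass can sharpen the weights produced by the previous one.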
Mechanism for neurotransmitter-receptor matching.
Synaptic communication requires the expression of functional postsynaptic receptors that match the presynaptically released neurotransmitter. The ability of neurons to switch the transmitter they release is increasingly well documented, and these switches require changes in the postsynaptic receptor population. Although the activity-dependent molecular mechanism of neurotransmitter switching is increasingly well understood, the basis of specification of postsynaptic neurotransmitter receptors matching the newly expressed transmitter is unknown. Using a functional assay, we show that sustained application of glutamate to embryonic vertebrate skeletal muscle cells cultured before innervation is necessary and sufficient to up-regulate ionotropic glutamate receptors from a pool of different receptors expressed at low levels. Up-regulation of these ionotropic receptors is independent of signaling by metabotropic glutamate receptors. Both imaging of glutamate-induced calcium elevations and Western blots reveal ionotropic glutamate receptor expression prior to immunocytochemical detection. Sustained application of glutamate to skeletal myotomes in vivo is necessary and sufficient for up-regulation of membrane expression of the GluN1 NMDA receptor subunit. Pharmacological antagonists and morpholinos implicate p38 and Jun kinases and MEF2C in the signal cascade leading to ionotropic glutamate receptor expression. The results suggest a mechanism by which neuronal release of transmitter up-regulates postsynaptic expression of appropriate transmitter receptors following neurotransmitter switching and may contribute to the proper expression of receptors at the time of initial innervation.
NTU RGB+D 120: A Large-Scale Benchmark for 3D Human Activity Understanding
Research on depth-based human activity analysis achieved outstanding
performance and demonstrated the effectiveness of 3D representation for action
recognition. The existing depth-based and RGB+D-based action recognition
benchmarks have a number of limitations, including the lack of large-scale
training samples, a realistic number of distinct class categories, diversity
in camera views, varied environmental conditions, and variety of human subjects.
In this work, we introduce a large-scale dataset for RGB+D human action
recognition, which is collected from 106 distinct subjects and contains more
than 114 thousand video samples and 8 million frames. This dataset contains 120
different action classes including daily, mutual, and health-related
activities. We evaluate the performance of a series of existing 3D activity
analysis methods on this dataset, and show the advantage of applying deep
learning methods for 3D-based human action recognition. Furthermore, we
investigate a novel one-shot 3D activity recognition problem on our dataset
and propose a simple yet effective Action-Part Semantic Relevance-aware (APSR)
framework for this task, which yields promising results for recognizing the
novel action classes. We believe the introduction of this
large-scale dataset will enable the community to apply, adapt, and develop
various data-hungry learning techniques for depth-based and RGB+D-based human
activity understanding. [The dataset is available at:
http://rose1.ntu.edu.sg/Datasets/actionRecognition.asp]
Comment: IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)
ContextVP: Fully Context-Aware Video Prediction
Video prediction models based on convolutional networks, recurrent networks,
and their combinations often result in blurry predictions. We identify an
important contributing factor for imprecise predictions that has not been
studied adequately in the literature: blind spots, i.e., lack of access to all
relevant past information for accurately predicting the future. To address this
issue, we introduce a fully context-aware architecture that captures the entire
available past context for each pixel using Parallel Multi-Dimensional LSTM
units and aggregates it using blending units. Our model outperforms a strong
baseline network of 20 recurrent convolutional layers and yields
state-of-the-art performance for next step prediction on three challenging
real-world video datasets: Human 3.6M, Caltech Pedestrian, and UCF-101.
Moreover, it does so with fewer parameters than several recently proposed
models, and does not rely on deep convolutional networks, multi-scale
architectures, separation of background and foreground modeling, motion flow
learning, or adversarial training. These results highlight that full awareness
of past context is of crucial importance for video prediction.
Comment: 19 pages. ECCV 2018 oral presentation. Project webpage is at
https://wonmin-byeon.github.io/publication/2018-ecc
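The blending units mentioned in the abstract aggregate the context captured by several directional LSTM sweeps into a single feature map. A minimal sketch of one plausible form — a learned convex combination over sweep directions — is shown below; the function name, the softmax-over-logits weighting, and the tensor shapes are assumptions for illustration, not the paper's exact formulation:

```python
import numpy as np

def blend_contexts(contexts, logits):
    """Combine per-direction context maps with learned blending weights.

    contexts: (K, H, W, C) features from K directional MD-LSTM sweeps
    logits:   (K,)         learned blending logits (softmax-normalized here)
    """
    e = np.exp(logits - logits.max())
    w = e / e.sum()                            # convex blending weights
    return np.tensordot(w, contexts, axes=1)   # (H, W, C) blended context

rng = np.random.default_rng(1)
ctxs = rng.normal(size=(4, 2, 3, 5))           # e.g. 4 sweep directions
blended = blend_contexts(ctxs, np.zeros(4))
# with equal logits, the blend reduces to a plain average of the directions
```

Because every pixel receives context from all sweep directions before blending, no region of the past frames is a blind spot for the prediction at that pixel.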
Heterogeneous Domain Generalization via Domain Mixup
One of the main drawbacks of deep Convolutional Neural Networks (DCNN) is
that they lack generalization capability. In this work, we focus on
heterogeneous domain generalization, which aims to improve generalization
across different tasks: how to learn a DCNN model from multiple domains of
data such that the trained feature extractor generalizes to recognizing
novel categories in a novel target domain. To
solve this problem, we propose a novel heterogeneous domain generalization
method by mixing up samples across multiple source domains with two different
sampling strategies. Our experimental results on the Visual Decathlon
benchmark demonstrate the effectiveness of the proposed method. The code is
released at \url{https://github.com/wyf0912/MIXALL}
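Mixing up samples across source domains follows the standard mixup recipe: inputs and labels from two domains are combined convexly with a Beta-distributed coefficient. The sketch below illustrates that core operation under stated assumptions (the function name and `alpha` default are hypothetical; the paper's two sampling strategies would differ in how the pair `(x_a, x_b)` is drawn across domains):

```python
import numpy as np

def domain_mixup(x_a, y_a, x_b, y_b, alpha=0.2, rng=None):
    """Mix a sample from source domain A with one from source domain B.

    Returns the convex combination of inputs and (one-hot) labels,
    as in mixup, with lambda drawn from Beta(alpha, alpha).
    """
    if rng is None:
        rng = np.random.default_rng()
    lam = rng.beta(alpha, alpha)
    return lam * x_a + (1 - lam) * x_b, lam * y_a + (1 - lam) * y_b

# usage: mix an all-ones "image" from domain A with an all-zeros one from B
x_a, x_b = np.ones((4, 4)), np.zeros((4, 4))
y_a, y_b = np.array([1.0, 0.0]), np.array([0.0, 1.0])
x, y = domain_mixup(x_a, y_a, x_b, y_b, rng=np.random.default_rng(0))
```

Training the shared feature extractor on such cross-domain mixtures discourages it from latching onto domain-specific cues, which is what lets it transfer to novel categories in an unseen target domain.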