Semantic Graph Convolutional Networks for 3D Human Pose Regression
In this paper, we study the problem of learning Graph Convolutional Networks
(GCNs) for regression. Current architectures of GCNs are limited to the small
receptive field of convolution filters and shared transformation matrix for
each node. To address these limitations, we propose Semantic Graph
Convolutional Networks (SemGCN), a novel neural network architecture that
operates on regression tasks with graph-structured data. SemGCN learns to
capture semantic information such as local and global node relationships, which
is not explicitly represented in the graph. These semantic relationships can be
learned through end-to-end training from the ground truth without additional
supervision or hand-crafted rules. We further investigate applying SemGCN to 3D
human pose regression. Our formulation is intuitive and sufficient since both
2D and 3D human poses can be represented as a structured graph encoding the
relationships between joints in the skeleton of a human body. We carry out
comprehensive studies to validate our method. The results prove that SemGCN
outperforms the state of the art while using 90% fewer parameters.
Comment: In CVPR 2019 (13 pages including supplementary material). The code
can be found at https://github.com/garyzhao/SemGC
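The two limitations named in this abstract (a small receptive field and a transformation matrix shared by every node) are easiest to see in a plain graph convolution. The sketch below is a generic, vanilla GCN-style layer on a toy skeleton, not the SemGCN layer itself; the function names, the row normalization, and the three-joint chain are illustrative assumptions.

```python
def normalized_adjacency(num_nodes, edges):
    """Build a row-normalized adjacency matrix with self-loops.

    Assumption: simple row normalization, chosen only for illustration.
    """
    adj = [[0.0] * num_nodes for _ in range(num_nodes)]
    for i in range(num_nodes):
        adj[i][i] = 1.0  # self-loop keeps each joint's own feature
    for a, b in edges:
        adj[a][b] = 1.0
        adj[b][a] = 1.0  # skeleton edges are undirected
    return [[v / sum(row) for v in row] for row in adj]


def graph_conv(features, adj, weight):
    """One vanilla graph convolution: transform, then average over neighbours.

    features: N x C_in, weight: C_in x C_out (the same matrix for every node),
    adj: N x N. Each output row only sees 1-hop neighbours.
    """
    c_in, c_out = len(weight), len(weight[0])
    transformed = [
        [sum(x[k] * weight[k][j] for k in range(c_in)) for j in range(c_out)]
        for x in features
    ]
    n = len(features)
    return [
        [sum(adj[i][m] * transformed[m][j] for m in range(n)) for j in range(c_out)]
        for i in range(n)
    ]


# Toy skeleton: a chain of three joints with 2D input coordinates.
adj = normalized_adjacency(3, [(0, 1), (1, 2)])
joints_2d = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
smoothed = graph_conv(joints_2d, adj, identity)
```

In this toy chain, joint 0 never receives information from joint 2 in a single layer, which is the receptive-field limit the abstract refers to.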
Richly Activated Graph Convolutional Network for Action Recognition with Incomplete Skeletons
Current methods for skeleton-based human action recognition usually work with
completely observed skeletons. However, in real scenarios, captured skeletons
are often incomplete and noisy, which degrades the performance
of traditional models. To enhance the robustness of action recognition models
to incomplete skeletons, we propose a multi-stream graph convolutional network
(GCN) for exploring sufficient discriminative features distributed over all
skeleton joints. Here, each stream of the network is only responsible for
learning features from currently unactivated joints, which are distinguished by
the class activation maps (CAM) obtained by preceding streams, so that the
proposed method activates noticeably more joints than traditional
methods. Thus, the proposed method is termed richly activated GCN (RA-GCN),
where the richly discovered features will improve the robustness of the model.
Compared to the state-of-the-art methods, the RA-GCN achieves comparable
performance on the NTU RGB+D dataset. Moreover, on a synthetic occlusion
dataset, the RA-GCN significantly alleviates the performance deterioration.
Comment: Accepted by ICIP 2019, 5 pages, 3 figures, 3 tables
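The idea of feeding each later stream only the joints that earlier streams have not yet activated can be sketched as a masking step. This is a minimal illustration under stated assumptions, not the RA-GCN implementation: the function name, the per-joint CAM scores in [0, 1], and the fixed threshold of 0.5 are all hypothetical.

```python
def mask_activated_joints(joints, cam_scores, threshold=0.5):
    """Zero out joints that preceding streams already activated.

    joints: list of per-joint feature vectors.
    cam_scores: one activation score per joint (assumed to lie in [0, 1]).
    threshold: hypothetical cut-off separating activated from unactivated joints.
    """
    mask = [0.0 if score >= threshold else 1.0 for score in cam_scores]
    masked = [[value * m for value in joint] for joint, m in zip(joints, mask)]
    return masked, mask


# Three joints; the first and third were strongly activated by an earlier stream.
joints = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
next_stream_input, mask = mask_activated_joints(joints, [0.9, 0.1, 0.6])
```

Here only the second joint is passed on unchanged, so the next stream has to find discriminative features there.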
What-and-Where to Match: Deep Spatially Multiplicative Integration Networks for Person Re-identification
Matching pedestrians across disjoint camera views, known as person
re-identification (re-id), is a challenging problem that is of importance to
visual recognition and surveillance. Most existing methods exploit local
regions within spatial manipulation to perform matching in local
correspondence. However, they essentially extract fixed representations
from pre-divided regions for each image and then perform matching based on the
extracted representations. Models in this pipeline cannot capture the finer
local patterns that are crucial to distinguishing positive pairs from negative
ones, and they underperform as a result. In this paper, we
propose a novel deep multiplicative integration gating function, which answers
the question of what-and-where to match for effective person re-id. To
address what to match, our deep network emphasizes common local patterns
by learning joint representations in a multiplicative way. The network
comprises two Convolutional Neural Networks (CNNs) to extract convolutional
activations, and generates relevant descriptors for pedestrian matching. This
leads to flexible representations for pair-wise images. To address
where to match, we combat the spatial misalignment by performing
spatially recurrent pooling via a four-directional recurrent neural network to
impose spatial dependency over all positions with respect to the entire image.
The proposed network is designed to be end-to-end trainable to characterize
local pairwise feature interactions in a spatially aligned manner. To
demonstrate the superiority of our method, extensive experiments are conducted
over three benchmark data sets: VIPeR, CUHK03 and Market-1501.
Comment: Published at Pattern Recognition, Elsevier
Attention Mechanisms in Computer Vision: A Survey
Humans can naturally and effectively find salient regions in complex scenes.
Motivated by this observation, attention mechanisms were introduced into
computer vision with the aim of imitating this aspect of the human visual
system. Such an attention mechanism can be regarded as a dynamic weight
adjustment process based on features of the input image. Attention mechanisms
have achieved great success in many visual tasks, including image
classification, object detection, semantic segmentation, video understanding,
image generation, 3D vision, multi-modal tasks and self-supervised learning. In
this survey, we provide a comprehensive review of various attention mechanisms
in computer vision and categorize them according to approach, such as channel
attention, spatial attention, temporal attention and branch attention; a
related repository https://github.com/MenghaoGuo/Awesome-Vision-Attentions is
dedicated to collecting related work. We also suggest future directions for
attention mechanism research.
Comment: 27 pages, 9 figures
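The description of attention as a dynamic weight adjustment based on the input features can be made concrete with a stripped-down channel-attention gate. This is a sketch under stated assumptions, not any specific method from the survey: the function name is hypothetical, and the gate has no learned parameters (real channel-attention modules typically insert learned layers between the pooling and the sigmoid).

```python
import math


def channel_attention(feature_map):
    """Reweight each channel by a gate computed from its own content.

    feature_map: list of channels, each an H x W list of floats.
    Steps: global average pool per channel, squash with a sigmoid,
    then scale the channel by the resulting weight.
    """
    weights = []
    for channel in feature_map:
        values = [v for row in channel for v in row]
        mean = sum(values) / len(values)            # global average pooling
        weights.append(1.0 / (1.0 + math.exp(-mean)))  # sigmoid gate
    reweighted = [
        [[v * w for v in row] for row in channel]
        for channel, w in zip(feature_map, weights)
    ]
    return reweighted, weights


# Two 2x2 channels: one silent, one with a strong response.
features = [[[0.0, 0.0], [0.0, 0.0]], [[2.0, 2.0], [2.0, 2.0]]]
reweighted, weights = channel_attention(features)
```

The weights depend on the input itself, so a different image would produce different channel weights; that input dependence is what distinguishes attention from a fixed filter.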
Topology-aware MLP for Skeleton-based Action Recognition
Graph convolution networks (GCNs) have achieved remarkable performance in
skeleton-based action recognition. However, previous GCN-based methods
have relied excessively on elaborate human body priors and constructed complex
feature aggregation mechanisms, which limits the generalizability of networks.
To solve these problems, we propose a novel Spatial Topology Gating Unit
(STGU), which is an MLP-based variant without extra priors, to capture the
co-occurrence topology features that encode the spatial dependency across all
joints. In STGU, to model the sample-specific and completely independent
point-wise topology attention, a new gate-based feature interaction mechanism
is introduced to activate the features point-to-point by the attention map
generated from the input. Based on the STGU, in this work, we propose the first
topology-aware MLP-based model, Ta-MLP, for skeleton-based action recognition.
In comparison with previous methods on three large-scale datasets,
Ta-MLP achieves competitive performance. In addition, Ta-MLP reduces the
parameters by up to 62.5% with favorable results. Compared with previous
state-of-the-art (SOTA) approaches, Ta-MLP pushes the frontier of real-time
action recognition. The code will be available at
https://github.com/BUPTSJZhang/Ta-MLP.
Comment: 10 pages, 9 figures
Two-person Graph Convolutional Network for Skeleton-based Human Interaction Recognition
Graph convolutional networks (GCNs) have been the predominant methods in
skeleton-based human action recognition, including human-human interaction
recognition. However, when dealing with interaction sequences, current
GCN-based methods simply split the two-person skeleton into two discrete graphs
and perform graph convolution separately as done for single-person action
classification. Such operations ignore rich interactive information and hinder
effective spatial inter-body relationship modeling. To overcome the above
shortcoming, we introduce a novel unified two-person graph to represent
inter-body and intra-body correlations between joints. Experiments show
accuracy improvements in recognizing both interactions and individual actions
when utilizing the proposed two-person graph topology. In addition, we design
several graph labeling strategies to supervise the model to learn discriminant
spatial-temporal interactive features. Finally, we propose a two-person graph
convolutional network (2P-GCN). Our model achieves state-of-the-art results on
four benchmarks of three interaction datasets: SBU, interaction subsets of
NTU-RGB+D and NTU-RGB+D 120.
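A unified two-person graph can be pictured as a single adjacency matrix that holds both skeletons plus edges between them. The sketch below is illustrative only: the abstract does not say how inter-body edges are chosen, so the `inter_edges` argument, the function name, and the three-joint toy skeleton are assumptions rather than the 2P-GCN design.

```python
def two_person_adjacency(num_joints, bones, inter_edges):
    """Build one adjacency matrix covering two skeletons.

    Person 1 uses node indices 0..num_joints-1, person 2 uses
    num_joints..2*num_joints-1.
    bones: intra-body edges (a, b), shared by both people.
    inter_edges: hypothetical inter-body edges (joint of person 1, joint of person 2).
    """
    n = 2 * num_joints
    adj = [[0] * n for _ in range(n)]
    for offset in (0, num_joints):       # copy the skeleton for each person
        for a, b in bones:
            adj[offset + a][offset + b] = 1
            adj[offset + b][offset + a] = 1
    for a, b in inter_edges:             # link the two bodies
        adj[a][num_joints + b] = 1
        adj[num_joints + b][a] = 1
    return adj


# Toy example: three joints per person, with the two end joints linked
# (for instance, two hands during a handshake).
adj = two_person_adjacency(3, [(0, 1), (1, 2)], [(2, 2)])
```

With an empty `inter_edges` list the matrix is block-diagonal, which corresponds to the split-graph treatment that the abstract criticizes.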