A Hierarchical Spatio-Temporal Graph Convolutional Neural Network for Anomaly Detection in Videos
Deep learning models have been widely used for anomaly detection in
surveillance videos. Typical models are equipped with the capability to
reconstruct normal videos and evaluate the reconstruction errors on anomalous
videos to indicate the extent of abnormalities. However, existing approaches
suffer from two disadvantages. First, they encode the movements of each
identity independently, ignoring the interactions among identities that may
also indicate anomalies. Second, they rely on inflexible models whose
structure is fixed across different scenes, which prevents any understanding
of the scene. In this paper, we propose a Hierarchical Spatio-Temporal Graph
Convolutional Neural Network (HSTGCNN) to address these problems. The
HSTGCNN is composed of multiple branches that
correspond to different levels of graph representations. High-level graph
representations encode the trajectories of people and the interactions among
multiple identities while low-level graph representations encode the local body
postures of each person. Furthermore, we propose a weighted combination of
the multiple branches, each of which performs best in a different kind of
scene. This improves over single-level graph representations and provides an
understanding of scenes that serves anomaly detection. High-level graph
representations are
assigned higher weights to encode moving speed and directions of people in
low-resolution videos while low-level graph representations are assigned higher
weights to encode human skeletons in high-resolution videos. Experimental
results show that the proposed HSTGCNN significantly outperforms current
state-of-the-art models on four benchmark datasets (UCSD Pedestrian,
ShanghaiTech, CUHK Avenue and IITB-Corridor) while using far fewer learnable
parameters.
Comment: Accepted to IEEE Transactions on Circuits and Systems for Video
Technology (T-CSVT).
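The abstract's scene-dependent weighting of branches can be illustrated with a minimal sketch. The function names, the 0.8/0.2 weights, and the 64-pixel resolution threshold are all hypothetical assumptions for illustration, not the paper's actual parameters:

```python
import numpy as np

def combine_branch_scores(high_scores, low_scores, person_height_px):
    """Weighted fusion of per-frame anomaly scores from two graph branches.

    high_scores:      scores from the high-level branch (trajectories and
                      interactions among identities)
    low_scores:       scores from the low-level branch (local body postures)
    person_height_px: approximate pixel height of people in the scene, used
                      as a crude proxy for resolution (assumed heuristic)
    """
    # Assumption: favour the high-level branch (speed/direction) in
    # low-resolution scenes, and the low-level skeleton branch otherwise.
    w_high = 0.8 if person_height_px < 64 else 0.2
    w_low = 1.0 - w_high
    return w_high * np.asarray(high_scores) + w_low * np.asarray(low_scores)
```

In the paper the weights would be learned per scene rather than hard-coded; the sketch only shows the shape of the fusion.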
A Study of Dance Movement Capture and Posture Recognition Method Based on Vision Sensors
With the development of technology, posture recognition methods have been applied in more and more fields, yet there is relatively little research on posture recognition in dance. This paper therefore studied the capture and recognition of dance movements to assess the usability of the proposed method for dance posture recognition. First, the Kinect V2 vision sensor was used to capture dance movements and obtain human skeletal joint data. Then, a three-dimensional convolutional neural network (3D CNN) model was designed that fuses joint coordinate features with joint velocity features as general features for recognizing different dance postures. Experiments on NTU60 and a self-built dance dataset showed that the 3D CNN performed best with a dropout rate of 0.4, a ReLU activation function, and the fused features. Compared to other posture recognition methods, the recognition rates of the 3D CNN on the CS and CV benchmarks of NTU60 were 88.8% and 95.3%, respectively, while the average recognition rate on the dance dataset reached 98.72%, higher than that of the other methods. The experimental results demonstrate the effectiveness of the proposed method for dance posture recognition, providing a new approach for posture recognition research and contributing to the preservation of folk dances. Doi: 10.28991/HIJ-2023-04-02-03
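The fusion of joint coordinates with joint velocities described in the abstract can be sketched as a simple preprocessing step. This is an illustrative assumption about how such a fusion might look (finite-difference velocity, channel-wise concatenation), not the paper's exact pipeline:

```python
import numpy as np

def fuse_joint_features(joints):
    """Fuse joint coordinates with per-frame joint velocities.

    joints: array of shape (T, J, 3) -- T frames, J skeletal joints
            (e.g. the 25 joints reported by Kinect V2), xyz coordinates.
    Returns an array of shape (T, J, 6): coordinates concatenated with
    finite-difference velocities, a combined feature of the kind the
    abstract describes feeding into the 3D CNN.
    """
    joints = np.asarray(joints, dtype=float)
    velocity = np.zeros_like(joints)
    velocity[1:] = joints[1:] - joints[:-1]  # frame-to-frame displacement
    return np.concatenate([joints, velocity], axis=-1)
```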
Context-Dependent Diffusion Network for Visual Relationship Detection
Visual relationship detection can bridge the gap between computer vision and
natural language for scene understanding of images. Different from pure object
recognition tasks, the relation triplets of subject-predicate-object span an
extremely diverse space, such as \textit{person-behind-person} and
\textit{car-behind-building}, while suffering from the problem of combinatorial
explosion. In this paper, we propose a context-dependent diffusion network
(CDDN) framework to deal with visual relationship detection. To capture the
interactions of different object instances, two types of graphs, word semantic
graph and visual scene graph, are constructed to encode global context
interdependency. The semantic graph is built through language priors to model
semantic correlations across objects, whilst the visual scene graph defines the
connections of scene objects so as to utilize the surrounding scene
information. For the graph-structured data, we design a diffusion network to
adaptively aggregate information from contexts, which can effectively learn
latent representations of visual relationships and well cater to visual
relationship detection in view of its isomorphic invariance to graphs.
Experiments on two widely-used datasets demonstrate that our proposed method
is more effective and achieves state-of-the-art performance.
Comment: 8 pages, 3 figures, 2018 ACM Multimedia Conference (MM'18).
Understanding the Robustness of Skeleton-based Action Recognition under Adversarial Attack
Action recognition has been heavily employed in many applications such as
autonomous vehicles and surveillance, where its robustness is a primary
concern. In this paper, we examine the robustness of state-of-the-art action
recognizers against adversarial attack, which has been rarely investigated so
far. To this end, we propose a new method to attack action recognizers that
rely on 3D skeletal motion. Our method involves an innovative perceptual loss
that ensures the imperceptibility of the attack. Empirical studies demonstrate
that our method is effective in both white-box and black-box scenarios. Its
generalizability is evidenced on a variety of action recognizers and datasets.
Its versatility is shown in different attacking strategies. Its deceitfulness
is proven in extensive perceptual studies. Our method shows that adversarial
attack on 3D skeletal motions, one type of time-series data, is significantly
different from traditional adversarial attack problems. Its success raises
serious concerns about the robustness of action recognizers and provides
insights into potential improvements.
Comment: Accepted in CVPR 2021. arXiv admin note: substantial text overlap
with arXiv:1911.0710
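A perceptual loss for skeletal motion might, for instance, penalise not only how far the perturbed joints move but also how jerky the deviation is, since abrupt accelerations are what viewers notice. The specific form below (position MSE plus a second-difference acceleration term, with an assumed weight `beta`) is a hypothetical sketch, not the paper's loss:

```python
import numpy as np

def perceptual_penalty(orig, pert, beta=0.3):
    """Assumed perceptual regulariser for skeletal-motion perturbations.

    orig, pert: (T, J, 3) skeletal motion sequences (T >= 3 frames).
    Combines joint-position deviation with a change-in-acceleration term
    (second finite differences) so that perturbations stay smooth and
    remain imperceptible to a human observer.
    """
    orig = np.asarray(orig, dtype=float)
    pert = np.asarray(pert, dtype=float)
    pos = np.mean((pert - orig) ** 2)
    acc_o = orig[2:] - 2 * orig[1:-1] + orig[:-2]   # original acceleration
    acc_p = pert[2:] - 2 * pert[1:-1] + pert[:-2]   # perturbed acceleration
    acc = np.mean((acc_p - acc_o) ** 2)
    return pos + beta * acc
```

An attacker would minimise this penalty alongside a misclassification objective, trading attack strength against imperceptibility.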