
    A Hierarchical Spatio-Temporal Graph Convolutional Neural Network for Anomaly Detection in Videos

    Deep learning models have been widely used for anomaly detection in surveillance videos. Typical models learn to reconstruct normal videos and use the reconstruction error on anomalous videos to indicate the extent of abnormality. However, existing approaches suffer from two disadvantages. First, they encode the movements of each identity independently, without considering the interactions among identities, which may also indicate anomalies. Second, they use inflexible models whose structures are fixed across different scenes; this rigidity prevents any understanding of the scene. In this paper, we propose a Hierarchical Spatio-Temporal Graph Convolutional Neural Network (HSTGCNN) to address these problems. The HSTGCNN is composed of multiple branches that correspond to different levels of graph representation: high-level graph representations encode the trajectories of people and the interactions among multiple identities, while low-level graph representations encode the local body posture of each person. Furthermore, we combine the branches with scene-dependent weights, since different branches are better suited to different scenes. This yields an improvement over single-level graph representations and provides a degree of scene understanding that serves anomaly detection. High-level graph representations are assigned higher weights to encode the moving speed and direction of people in low-resolution videos, while low-level graph representations are assigned higher weights to encode human skeletons in high-resolution videos. Experimental results show that the proposed HSTGCNN significantly outperforms current state-of-the-art models on four benchmark datasets (UCSD Pedestrian, ShanghaiTech, CUHK Avenue and IITB-Corridor) while using far fewer learnable parameters. Comment: Accepted to IEEE Transactions on Circuits and Systems for Video Technology (T-CSVT)
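The scene-dependent weighting of branches described above can be sketched as follows. This is a minimal illustration, not the paper's actual fusion rule: the weight values, the resolution threshold, and the function name are all assumptions made for the example.

```python
import numpy as np

def combine_branch_scores(high_level, low_level, resolution_height, thresh=480):
    """Weighted fusion of per-frame anomaly scores from two graph branches.

    Idea (illustrative, not the paper's exact rule): in low-resolution
    scenes the high-level (trajectory/interaction) branch gets more weight;
    in high-resolution scenes the low-level (skeleton) branch does.
    """
    w_high = 0.7 if resolution_height < thresh else 0.3
    w_low = 1.0 - w_high
    return w_high * np.asarray(high_level) + w_low * np.asarray(low_level)

# Low-resolution scene: the trajectory branch dominates the combined score.
scores = combine_branch_scores([0.9, 0.1], [0.2, 0.8], resolution_height=240)
```

In the paper the combination weights are part of the model rather than a hand-set threshold; this sketch only shows the shape of the computation.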

    A Study of Dance Movement Capture and Posture Recognition Method Based on Vision Sensors

    With the development of technology, posture recognition methods have been applied in more and more fields, yet there is relatively little research on posture recognition in dance. This paper therefore studied the capture and recognition of dance movements to assess the usability of the proposed method for dance posture recognition. First, the Kinect V2 vision sensor was used to capture dance movements and obtain human skeletal joint data. Then, a three-dimensional convolutional neural network (3D CNN) model was designed that fuses joint coordinate features with joint velocity features as general features for recognizing different dance postures. Experiments on NTU60 and a self-built dance dataset found that the 3D CNN performed best with a dropout rate of 0.4, a ReLU activation function, and the fused features. Compared with other posture recognition methods, the recognition rates of the 3D CNN on the CS and CV splits of NTU60 were 88.8% and 95.3%, respectively, while the average recognition rate on the dance dataset reached 98.72%, higher than the alternatives. The experimental results demonstrate the effectiveness of the proposed method for dance posture recognition, offering a new approach for posture recognition research and contributing to the preservation of folk dances. DOI: 10.28991/HIJ-2023-04-02-03
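The coordinate-plus-velocity fusion mentioned above can be sketched as a simple preprocessing step: velocities are frame-to-frame finite differences of the joint coordinates, concatenated onto the coordinates per joint. The exact feature layout in the paper is not specified here, so the shapes and the zero-padding at the first frame are assumptions.

```python
import numpy as np

def fuse_joint_features(joints):
    """joints: (T, J, 3) array of 3-D joint coordinates over T frames.

    Returns (T, J, 6): coordinates concatenated with frame-to-frame
    velocities (first differences, zero-padded at t=0). A generic sketch
    of coordinate+velocity fusion, not the paper's exact pipeline.
    """
    velocity = np.zeros_like(joints)
    velocity[1:] = joints[1:] - joints[:-1]
    return np.concatenate([joints, velocity], axis=-1)

joints = np.random.rand(16, 25, 3)      # 16 frames, 25 Kinect V2 joints
features = fuse_joint_features(joints)  # shape (16, 25, 6)
```

The fused tensor can then be fed to a 3D CNN, which convolves jointly over the temporal and joint dimensions.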

    Context-Dependent Diffusion Network for Visual Relationship Detection

    Visual relationship detection can bridge the gap between computer vision and natural language for scene understanding of images. Unlike pure object recognition tasks, the relation triplets of subject-predicate-object span an extremely diverse space, such as person-behind-person and car-behind-building, and suffer from combinatorial explosion. In this paper, we propose a context-dependent diffusion network (CDDN) framework for visual relationship detection. To capture the interactions of different object instances, two types of graphs, a word semantic graph and a visual scene graph, are constructed to encode global context interdependency. The semantic graph is built from language priors to model semantic correlations across objects, whilst the visual scene graph defines the connections among scene objects so as to exploit the surrounding scene information. For the graph-structured data, we design a diffusion network that adaptively aggregates information from context; it can effectively learn latent representations of visual relationships and, owing to its isomorphic invariance on graphs, is well suited to visual relationship detection. Experiments on two widely used datasets demonstrate that our proposed method is more effective and achieves state-of-the-art performance. Comment: 8 pages, 3 figures, 2018 ACM Multimedia Conference (MM'18)
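The idea of diffusing context over a graph can be illustrated with one common, simple form of graph diffusion: each node's features are repeatedly mixed with the degree-normalized average of its neighbours' features. This is a generic sketch of context aggregation on graph-structured data, not the CDDN layer itself; the mixing coefficient and step count are assumptions.

```python
import numpy as np

def diffuse(adjacency, features, steps=2, alpha=0.5):
    """Simple graph diffusion: H <- (1 - alpha) * H + alpha * D^-1 A H.

    adjacency: (N, N) unweighted adjacency matrix.
    features:  (N, F) node feature matrix.
    A sketch of neighbourhood context aggregation, not the paper's network.
    """
    A = np.asarray(adjacency, dtype=float)
    deg = A.sum(axis=1, keepdims=True)
    A_norm = np.divide(A, deg, out=np.zeros_like(A), where=deg > 0)
    H = np.asarray(features, dtype=float)
    for _ in range(steps):
        H = (1 - alpha) * H + alpha * A_norm @ H
    return H

# 3-node path graph: the middle node aggregates context from both neighbours.
A = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
X = [[1.0], [0.0], [0.0]]
H = diffuse(A, X, steps=1)
```

Because the update depends only on the graph's connectivity pattern, relabelling the nodes permutes the output rows identically, which is the isomorphic invariance the abstract refers to.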

    Understanding the Robustness of Skeleton-based Action Recognition under Adversarial Attack

    Action recognition has been heavily employed in many applications such as autonomous vehicles and surveillance, where its robustness is a primary concern. In this paper, we examine the robustness of state-of-the-art action recognizers against adversarial attack, which has rarely been investigated so far. To this end, we propose a new method to attack action recognizers that rely on 3D skeletal motion. Our method involves an innovative perceptual loss that ensures the imperceptibility of the attack. Empirical studies demonstrate that our method is effective in both white-box and black-box scenarios. Its generalizability is evidenced on a variety of action recognizers and datasets, its versatility is shown across different attacking strategies, and its deceitfulness is proven in extensive perceptual studies. Our method shows that adversarial attack on 3D skeletal motions, one type of time-series data, is significantly different from traditional adversarial attack problems. Its success raises serious concerns about the robustness of action recognizers and provides insights into potential improvements. Comment: Accepted in CVPR 2021. arXiv admin note: substantial text overlap with arXiv:1911.0710
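One way to see why imperceptibility needs a dedicated loss for skeletal motion is that temporal jitter is what human observers notice. A minimal stand-in for this idea, not the paper's actual perceptual loss, is a penalty on frame-to-frame acceleration: smooth perturbations score low, jittery ones score high. The trajectory shapes and penalty form below are assumptions for illustration.

```python
import numpy as np

def smoothness_loss(motion):
    """Penalize frame-to-frame acceleration of joint positions.

    motion: (T, D) trajectory of flattened joint coordinates.
    A simple stand-in for a perceptual constraint: perturbations that
    jitter over time are visible, so an attack keeps this penalty small.
    """
    accel = motion[2:] - 2 * motion[1:-1] + motion[:-2]  # second difference
    return float((accel ** 2).mean())

T, D = 8, 6
clean = np.linspace(0, 1, T)[:, None] * np.ones((T, D))           # smooth motion
jitter = clean + np.random.default_rng(0).normal(0, 0.1, (T, D))  # noisy motion
smooth_pen = smoothness_loss(clean)   # near zero: linear motion, no acceleration
jitter_pen = smoothness_loss(jitter)  # much larger: random jitter is perceptible
```

In an attack, a term like this would be minimized alongside the classification objective so that the adversarial motion stays temporally smooth.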