Relation-Based Associative Joint Location for Human Pose Estimation in Videos
Video-based human pose estimation (HPE) is a vital yet challenging task.
While deep learning methods have made significant progress on HPE, most
approaches detect each joint independently, discarding the structural
information of the pose. In this paper, unlike prior methods, we propose a
Relation-based Pose Semantics Transfer Network (RPSTN) to locate joints
associatively. Specifically, we design a lightweight joint relation extractor
(JRE) to model the pose structural features and associatively generate heatmaps
for joints by heuristically modeling the relation between any two joints
instead of building each joint heatmap independently. In this way, the
proposed JRE module captures the spatial configuration of human poses through
pairwise joint relationships. Moreover, considering the temporal
semantic continuity of videos, the pose semantic information in the current
frame is beneficial for guiding the location of joints in the next frame.
Therefore, we use the idea of knowledge reuse to propagate the pose semantic
information between consecutive frames. In this way, the proposed RPSTN
captures the temporal dynamics of poses. On the one hand, the JRE module can
infer invisible joints spatially, from their relationship to the visible
joints. On the other hand, in the temporal domain, the proposed model can
transfer pose semantic features from non-occluded frames to occluded frames to
locate occluded joints. Therefore, our method is robust
to occlusion and achieves state-of-the-art results on two challenging
datasets, demonstrating its effectiveness for video-based human pose
estimation. We will release the code and models publicly.
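
The pairwise relation modeling described above lends itself to a compact illustration. Below is a minimal sketch, assuming per-joint heatmap channels and a dot-product similarity as the joint relation; the shapes and the residual fusion are illustrative assumptions, not the authors' JRE implementation:

```python
import torch
import torch.nn as nn


class PairwiseJointRelation(nn.Module):
    """Refines each joint heatmap using its relation to every other joint."""

    def __init__(self, num_joints: int):
        super().__init__()
        # 1x1 convolution to fuse the relation-weighted context back in.
        self.fuse = nn.Conv2d(num_joints, num_joints, kernel_size=1)

    def forward(self, heatmaps: torch.Tensor) -> torch.Tensor:
        # heatmaps: (B, J, H, W), one channel per joint.
        b, j, h, w = heatmaps.shape
        flat = heatmaps.view(b, j, h * w)                   # (B, J, HW)
        # Relation between any two joints as normalized feature similarity.
        relation = torch.softmax(
            flat @ flat.transpose(1, 2) / (h * w) ** 0.5, dim=-1)  # (B, J, J)
        # Each joint aggregates evidence from all joints it relates to, so an
        # occluded joint can borrow support from the visible ones.
        context = (relation @ flat).view(b, j, h, w)
        return heatmaps + self.fuse(context)                # residual refinement


hm = torch.randn(2, 17, 64, 48)              # e.g. 17 COCO joints
print(PairwiseJointRelation(17)(hm).shape)   # torch.Size([2, 17, 64, 48])
```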
Spatial-Temporal Decoupling Contrastive Learning for Skeleton-based Human Action Recognition
Skeleton-based action recognition is a central task in human-computer
interaction. However, most previous methods suffer from two issues: (i)
semantic ambiguity arising from spatial-temporal information mixture; and (ii)
overlooking the explicit exploitation of the latent data distributions (i.e.,
the intra-class variations and inter-class relations), thereby leading to
sub-optimal solutions for the skeleton encoders. To mitigate this, we propose a
spatial-temporal decoupling contrastive learning (STD-CL) framework to obtain
discriminative and semantically distinct representations from the sequences,
which can be incorporated into various previous skeleton encoders and can be
removed at test time. Specifically, we decouple the global features into
spatial-specific and temporal-specific features to reduce the spatial-temporal
coupling of features. Furthermore, to explicitly exploit the latent data
distributions, we apply contrastive learning to the attentive features,
modeling cross-sequence semantic relations by pulling together features from
positive pairs and pushing apart those from negative pairs. Extensive
experiments show that STD-CL with four different skeleton encoders (HCN, 2S-AGCN,
CTR-GCN, and Hyperformer) achieves solid improvements on NTU60, NTU120, and
NW-UCLA benchmarks. The code will be released soon.
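
As a rough illustration of the decoupling-plus-contrast recipe, the following sketch assumes an encoder output of shape (batch, channels, time, joints), pools out one axis to obtain each stream, and applies an InfoNCE-style loss per stream; the pooling choices and loss form are assumptions, not the paper's exact design:

```python
import torch
import torch.nn.functional as F


def decouple(f: torch.Tensor):
    # f: (B, C, T, V) -- channels, time steps, joints (vertices).
    spatial = f.mean(dim=2)    # pool over time   -> (B, C, V): spatial-specific
    temporal = f.mean(dim=3)   # pool over joints -> (B, C, T): temporal-specific
    return spatial.flatten(1), temporal.flatten(1)


def info_nce(z1: torch.Tensor, z2: torch.Tensor, tau: float = 0.1):
    """Pull two views of the same sequence together, push other pairs apart."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / tau            # (B, B) cross-sequence similarity
    labels = torch.arange(z1.size(0))     # positives sit on the diagonal
    return F.cross_entropy(logits, labels)


# Encoder outputs for two augmented views of a batch (random stand-ins here).
f1, f2 = torch.randn(8, 64, 20, 25), torch.randn(8, 64, 20, 25)
s1, t1 = decouple(f1)
s2, t2 = decouple(f2)
loss = info_nce(s1, s2) + info_nce(t1, t2)  # contrast each stream separately
print(loss.item())
```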
Topology-aware MLP for Skeleton-based Action Recognition
Graph convolution networks (GCNs) have achieved remarkable performance in
skeleton-based action recognition. However, previous GCN-based methods rely
excessively on elaborate human body priors and construct complex feature
aggregation mechanisms, which limits the generalizability of the networks.
To solve these problems, we propose a novel Spatial Topology Gating Unit
(STGU), which is an MLP-based variant without extra priors, to capture the
co-occurrence topology features that encode the spatial dependency across all
joints. In STGU, to model sample-specific and fully independent point-wise
topology attention, a new gate-based feature interaction mechanism is
introduced that activates the features point-to-point with an attention map
generated from the input. Based on the STGU, we propose the first
topology-aware MLP-based model, Ta-MLP, for skeleton-based action recognition.
In comparison with previous methods on three large-scale datasets,
Ta-MLP achieves competitive performance. In addition, Ta-MLP reduces the
parameter count by up to 62.5% while maintaining favorable results. Compared with previous
state-of-the-art (SOTA) approaches, Ta-MLP pushes the frontier of real-time
action recognition. The code will be available at
https://github.com/BUPTSJZhang/Ta-MLP.
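
The gate-based interaction can be sketched along the lines of gMLP's spatial gating unit, which the description resembles. In the hypothetical module below, half the channels generate a per-joint gate from the input itself; the shapes and the channel split are illustrative assumptions, not the Ta-MLP implementation:

```python
import torch
import torch.nn as nn


class SpatialTopologyGate(nn.Module):
    """Gate-based point-wise interaction across joints, without body priors."""

    def __init__(self, channels: int, num_joints: int):
        super().__init__()
        self.norm = nn.LayerNorm(channels // 2)
        # A projection over the joint axis; combined with the multiplicative
        # gate below, the resulting attention is generated from the input
        # itself and is therefore sample-specific.
        self.joint_proj = nn.Linear(num_joints, num_joints)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, V, C) with V joints and C channels.
        content, gate = x.chunk(2, dim=-1)           # split channels in half
        gate = self.norm(gate).transpose(1, 2)       # (B, C/2, V)
        gate = self.joint_proj(gate).transpose(1, 2)
        # Point-to-point activation: each joint feature is modulated by an
        # attention value derived from all joints of this very sample.
        return content * gate                        # (B, V, C/2)


x = torch.randn(4, 25, 128)                   # 25 joints, 128 channels
print(SpatialTopologyGate(128, 25)(x).shape)  # torch.Size([4, 25, 64])
```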
BiHRNet: A Binary High-Resolution Network for Human Pose Estimation
Human Pose Estimation (HPE) plays a crucial role in computer vision
applications. However, it is difficult to deploy state-of-the-art models on
resource-limited devices due to their high computational cost. In this work, a
binary human pose estimator named BiHRNet (Binary HRNet) is proposed, whose
weights and activations are expressed as ±1. BiHRNet
retains the keypoint extraction ability of HRNet, while using fewer computing
resources by adopting binary neural networks (BNNs). In order to reduce the
accuracy drop caused by network binarization, two categories of techniques are
proposed in this work. To optimize the training of the binary pose estimator,
we propose a new loss function combining KL divergence loss with
AWing loss, which lets the binary network learn a more comprehensive output
distribution from its real-valued counterpart and thereby reduces the
information loss caused by binarization. To design more binarization-friendly
structures, we
propose a new information reconstruction bottleneck called IR Bottleneck to
retain more information in the initial stage of the network. In addition, we
propose a multi-scale basic block called MS-Block for information retention.
Our network has a lower computational cost with only a small precision drop.
Experimental results demonstrate that BiHRNet achieves a PCKh of 87.9 on the
MPII dataset, which outperforms all binary pose estimation networks. On the
challenging COCO dataset, the proposed method enables the binary neural
network to achieve 70.8 mAP, which is better than most tested lightweight
full-precision networks.
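
A minimal sketch of such a combined objective is given below, assuming the heatmaps are softened into per-joint distributions for the KL term and supervised with the published Adaptive Wing formula; the temperature, weighting, and hyper-parameters are illustrative choices, not the paper's reported settings:

```python
import torch
import torch.nn.functional as F


def awing(pred, target, alpha=2.1, omega=14.0, eps=1.0, theta=0.5):
    """Adaptive Wing loss (Wang et al., 2019) for heatmap regression."""
    d = (target - pred).abs()
    p = alpha - target                      # exponent adapts to the label value
    A = omega / (1 + (theta / eps) ** p) * p * (theta / eps) ** (p - 1) / eps
    C = theta * A - omega * torch.log1p((theta / eps) ** p)
    return torch.where(d < theta,
                       omega * torch.log1p((d / eps) ** p),   # small errors
                       A * d - C).mean()                      # large errors


def kd_kl(student_hm, teacher_hm, tau=4.0):
    """KL divergence between temperature-softened per-joint heatmap distributions."""
    s = F.log_softmax(student_hm.flatten(2) / tau, dim=-1)
    t = F.softmax(teacher_hm.flatten(2) / tau, dim=-1)
    return F.kl_div(s, t, reduction="batchmean") * tau * tau


student = torch.rand(2, 16, 64, 64)   # binary network output (MPII: 16 joints)
teacher = torch.rand(2, 16, 64, 64)   # real-valued counterpart output
gt = torch.rand(2, 16, 64, 64)        # ground-truth heatmaps
loss = awing(student, gt) + 0.1 * kd_kl(student, teacher)  # assumed weighting
```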
Learning Human Kinematics by Modeling Temporal Correlations between Joints for Video-based Human Pose Estimation
Estimating human poses from videos is critical in human-computer interaction.
By precisely estimating human poses, the robot can provide an appropriate
response to the human. Most existing approaches use optical flow, RNNs, or
CNNs to extract temporal features from videos. Despite their positive results,
most of these methods simply aggregate features along the temporal dimension,
ignoring the temporal correlations between joints. In
contrast to previous methods, we propose a plug-and-play kinematics modeling
module (KMM) based on the domain-cross attention mechanism to model the
temporal correlation between joints across different frames explicitly.
Specifically, the proposed KMM models the temporal correlation between any two
joints by calculating their temporal similarity. In this way, KMM can learn the
motion cues of each joint. Using the motion cues (temporal domain) and
historical positions of joints (spatial domain), KMM can infer the initial
positions of joints in the current frame in advance. In addition, we present a
kinematics modeling network (KIMNet) based on the KMM for obtaining the final
positions of joints by combining pose features and initial positions of joints.
By explicitly modeling temporal correlations between joints, KIMNet can infer
currently occluded joints from all joints at the previous moment.
Furthermore, the KMM is implemented with an attention mechanism, which allows
it to maintain the high resolution of features. Therefore, it can transfer rich
historical pose information to the current frame, which provides effective pose
information for locating occluded joints. Our approach achieves
state-of-the-art results on two standard video-based pose estimation
benchmarks. Moreover, the proposed KIMNet shows robustness to occlusion,
demonstrating the effectiveness of the proposed method.
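
The cross-frame attention idea can be illustrated with a single-head sketch, assuming per-joint feature vectors for two consecutive frames; the pooling to per-joint vectors and the residual initialization are assumptions for illustration, not the exact KMM:

```python
import torch
import torch.nn as nn


class CrossFrameJointAttention(nn.Module):
    """Single-head cross-attention from current-frame to previous-frame joints."""

    def __init__(self, dim: int):
        super().__init__()
        self.q = nn.Linear(dim, dim)   # queries: joints in the current frame
        self.k = nn.Linear(dim, dim)   # keys: joints in the previous frame
        self.v = nn.Linear(dim, dim)   # values: historical pose semantics

    def forward(self, curr: torch.Tensor, prev: torch.Tensor) -> torch.Tensor:
        # curr, prev: (B, J, C) per-joint feature vectors of two frames.
        q, k, v = self.q(curr), self.k(prev), self.v(prev)
        # Temporal similarity between any joint now and any joint before;
        # this acts as the motion cue linking a joint to its past positions.
        attn = torch.softmax(q @ k.transpose(1, 2) / q.size(-1) ** 0.5, dim=-1)
        # Propagate historical pose information forward: an occluded joint at
        # time t can draw on all (visible) joints at time t-1.
        return curr + attn @ v


curr, prev = torch.randn(2, 17, 256), torch.randn(2, 17, 256)
print(CrossFrameJointAttention(256)(curr, prev).shape)  # torch.Size([2, 17, 256])
```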
Physics-constrained Attack against Convolution-based Human Motion Prediction
Human motion prediction has achieved impressive performance with the help of
convolution-based neural networks. However, there is currently no work
evaluating the potential risk to human motion prediction from adversarial
attacks, which in this setting face two obstacles: preserving naturalness and
adapting to the scale of the data. To solve these problems, we propose a new
adversarial attack method that generates the worst-case
perturbation by maximizing the human motion predictor's prediction error with
physical constraints. Specifically, we introduce a novel adaptable scheme that
adapts the attack to the scale of the target pose, and two physical
constraints to enhance the naturalness of the adversarial example. The
evaluation experiments on three datasets show that the prediction errors of all
target models are significantly enlarged, which means that current convolution-based
human motion prediction models are vulnerable to the proposed attack. Based on
the experimental results, we provide insights on how to enhance the adversarial
robustness of the human motion predictor and how to improve the adversarial
attack against human motion prediction.
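
As a rough illustration, the sketch below runs iterated gradient ascent on the predictor's error, scales the perturbation budget to the target pose, and applies a simple temporal smoothing as a stand-in physical constraint; the L-infinity budget, the scale heuristic, and the smoothing step are all assumptions standing in for the paper's adaptable scheme and physical constraints:

```python
import torch
import torch.nn.functional as F


def attack(model, x, y, steps=10, rel_eps=0.01):
    """Maximize the predictor's error under a pose-scaled L-infinity budget."""
    # x: (B, T, J, 3) observed motion; y: ground-truth future motion.
    eps = rel_eps * float(x.abs().amax())    # adapt the budget to the pose scale
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        loss = F.mse_loss(model(x + delta), y)
        loss.backward()                      # ascend on the prediction error
        with torch.no_grad():
            delta += (eps / steps) * delta.grad.sign()
            delta.clamp_(-eps, eps)          # scale (budget) constraint
            # Stand-in naturalness constraint: smooth the perturbation along
            # time so it does not inject physically implausible jitter.
            delta[:, 1:-1] = (delta[:, :-2] + delta[:, 1:-1] + delta[:, 2:]) / 3
        delta.grad.zero_()
    return (x + delta).detach()


# Toy predictor standing in for a convolution-based motion prediction model.
model = torch.nn.Sequential(
    torch.nn.Flatten(),
    torch.nn.Linear(10 * 22 * 3, 10 * 22 * 3),
    torch.nn.Unflatten(1, (10, 22, 3)),
)
x, y = torch.randn(4, 10, 22, 3), torch.randn(4, 10, 22, 3)
x_adv = attack(model, x, y)                  # adversarial observed motion
```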