48 research outputs found

    Relation-Based Associative Joint Location for Human Pose Estimation in Videos

    Video-based human pose estimation (HPE) is a vital yet challenging task. While deep learning methods have made significant progress in HPE, most approaches detect each joint independently, which damages the structural information of the pose. In this paper, unlike prior methods, we propose a Relation-based Pose Semantics Transfer Network (RPSTN) to locate joints associatively. Specifically, we design a lightweight joint relation extractor (JRE) to model the pose structural features and associatively generate heatmaps for joints by modeling the relation between any two joints heuristically instead of building each joint heatmap independently. In effect, the proposed JRE module models the spatial configuration of human poses through the relationship between any two joints. Moreover, considering the temporal semantic continuity of videos, the pose semantic information in the current frame is beneficial for guiding the location of joints in the next frame. Therefore, we use the idea of knowledge reuse to propagate the pose semantic information between consecutive frames. In this way, the proposed RPSTN captures the temporal dynamics of poses. On the one hand, in space, the JRE module can infer invisible joints according to the relationship between the invisible joints and other visible joints. On the other hand, in time, the proposed model can transfer the pose semantic features from a non-occluded frame to an occluded frame to locate occluded joints. Therefore, our method is robust to occlusion and achieves state-of-the-art results on two challenging datasets, which demonstrates its effectiveness for video-based human pose estimation. We will release the code and models publicly.

    Spatial-Temporal Decoupling Contrastive Learning for Skeleton-based Human Action Recognition

    Skeleton-based action recognition is a central task in human-computer interaction. However, most previous methods suffer from two issues: (i) semantic ambiguity arising from the mixing of spatial and temporal information; and (ii) failure to explicitly exploit the latent data distributions (i.e., the intra-class variations and inter-class relations), which leads to suboptimal skeleton encoders. To mitigate these issues, we propose a spatial-temporal decoupling contrastive learning (STD-CL) framework to obtain discriminative and semantically distinct representations from the sequences. The framework can be incorporated into various previous skeleton encoders and can be removed at test time. Specifically, we decouple the global features into spatial-specific and temporal-specific features to reduce the spatial-temporal coupling of features. Furthermore, to explicitly exploit the latent data distributions, we apply contrastive learning to the attentive features, which models the cross-sequence semantic relations by pulling together the features from the positive pairs and pushing away the negative pairs. Extensive experiments show that STD-CL with four different skeleton encoders (HCN, 2S-AGCN, CTR-GCN, and Hyperformer) achieves solid improvements on the NTU60, NTU120, and NW-UCLA benchmarks. The code will be released soon.
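The "pulling together" and "pushing away" behaviour described in this abstract can be illustrated with a generic contrastive objective. The sketch below uses an InfoNCE-style loss on cosine similarities, which is an assumption made for illustration: the abstract does not state which contrastive loss STD-CL uses, and the function names and the `temperature` value are hypothetical.

```python
import math

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def normalize(v):
    """Scale a vector to unit length so that dot() gives cosine similarity."""
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]

def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style loss for one anchor (illustrative, not the paper's loss).

    The loss is small when the anchor is close to its positive and far
    from every negative, and large otherwise.
    """
    a = normalize(anchor)
    pos = dot(a, normalize(positive)) / temperature
    negs = [dot(a, normalize(n)) / temperature for n in negatives]
    logits = [pos] + negs
    # Numerically stable log-sum-exp over the positive and negative logits.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum - pos

# Hypothetical 2-D features: the first call pairs the anchor with a similar
# positive, the second pairs it with a dissimilar one.
close = info_nce([1.0, 0.0], [0.9, 0.1], [[0.0, 1.0], [-1.0, 0.0]])
far = info_nce([1.0, 0.0], [0.0, 1.0], [[0.9, 0.1], [-1.0, 0.0]])
```

Minimizing such a loss increases the anchor's similarity to its positive relative to the negatives, which is the general mechanism the abstract refers to.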

    Topology-aware MLP for Skeleton-based Action Recognition

    Graph convolution networks (GCNs) have achieved remarkable performance in skeleton-based action recognition. However, previous GCN-based methods have relied excessively on elaborate human body priors and constructed complex feature aggregation mechanisms, which limits the generalizability of the networks. To solve these problems, we propose a novel Spatial Topology Gating Unit (STGU), an MLP-based variant without extra priors, to capture the co-occurrence topology features that encode the spatial dependency across all joints. In STGU, to model the sample-specific and completely independent point-wise topology attention, a new gate-based feature interaction mechanism is introduced to activate the features point-to-point by the attention map generated from the input. Based on the STGU, we propose the first topology-aware MLP-based model, Ta-MLP, for skeleton-based action recognition. In comparison with previous methods on three large-scale datasets, Ta-MLP achieves competitive performance. In addition, Ta-MLP reduces the parameters by up to 62.5% with favorable results. Compared with previous state-of-the-art (SOTA) approaches, Ta-MLP pushes the frontier of real-time action recognition. The code will be available at https://github.com/BUPTSJZhang/Ta-MLP. Comment: 10 pages, 9 figures

    BiHRNet: A Binary high-resolution network for Human Pose Estimation

    Human Pose Estimation (HPE) plays a crucial role in computer vision applications. However, it is difficult to deploy state-of-the-art models on resource-limited devices due to the high computational costs of the networks. In this work, a binary human pose estimator named BiHRNet (Binary HRNet) is proposed, whose weights and activations are expressed as $\pm 1$. BiHRNet retains the keypoint extraction ability of HRNet while using fewer computing resources by adopting a binary neural network (BNN). To reduce the accuracy drop caused by network binarization, two categories of techniques are proposed in this work. To optimize the training process of the binary pose estimator, we propose a new loss function combining KL divergence loss with AWing loss, which lets the binary network obtain a more comprehensive output distribution from its real-valued counterpart and so reduces the information loss caused by binarization. To design more binarization-friendly structures, we propose a new information reconstruction bottleneck called IR Bottleneck to retain more information in the initial stage of the network. In addition, we propose a multi-scale basic block called MS-Block for information retention. Our work has a lower computational cost with only a small drop in precision. Experimental results demonstrate that BiHRNet achieves a PCKh of 87.9 on the MPII dataset, which outperforms all binary pose estimation networks. On the challenging COCO dataset, the proposed method enables the binary neural network to achieve 70.8 mAP, which is better than most tested lightweight full-precision networks. Comment: 12 pages, 6 figures
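Two ingredients named in this abstract, values constrained to $\pm 1$ and a KL divergence term that compares the binary network's output distribution with that of its real-valued counterpart, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the BiHRNet implementation: the sign rule for zero, the use of softmax over logits, and all function names are hypothetical, and the AWing loss term is not reproduced here.

```python
import math

def binarize(values):
    """Map each real value to +1.0 or -1.0 (zero maps to +1.0 by assumption)."""
    return [1.0 if v >= 0 else -1.0 for v in values]

def softmax(logits):
    """Convert logits to a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

# Hypothetical real-valued weights and their binary counterparts.
weights = [0.3, -1.2, 0.0, 2.5]
binary_weights = binarize(weights)

# Hypothetical output logits of a real-valued network and a binary network.
real_valued_out = softmax([2.0, 0.5, -1.0])
binary_out = softmax([1.0, 1.0, -1.0])
distill_term = kl_divergence(real_valued_out, binary_out)
```

The KL term is zero when the two output distributions match and grows as they diverge, which is why it can serve as a signal for transferring the real-valued network's output distribution to the binary one.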

    Learning Human Kinematics by Modeling Temporal Correlations between Joints for Video-based Human Pose Estimation

    Estimating human poses from videos is critical in human-computer interaction. By precisely estimating human poses, a robot can provide an appropriate response to the human. Most existing approaches use optical flow, RNNs, or CNNs to extract temporal features from videos. Despite the positive results of these attempts, most of them simply integrate features along the temporal dimension, ignoring temporal correlations between joints. In contrast to previous methods, we propose a plug-and-play kinematics modeling module (KMM) based on the domain-cross attention mechanism to explicitly model the temporal correlation between joints across different frames. Specifically, the proposed KMM models the temporal correlation between any two joints by calculating their temporal similarity. In this way, KMM can learn the motion cues of each joint. Using the motion cues (temporal domain) and historical positions of joints (spatial domain), KMM can infer the initial positions of joints in the current frame in advance. In addition, we present a kinematics modeling network (KIMNet) based on the KMM that obtains the final positions of joints by combining pose features and initial positions of joints. By explicitly modeling temporal correlations between joints, KIMNet can infer the joints occluded in the current frame from all joints at the previous moment. Furthermore, the KMM is implemented through an attention mechanism, which allows it to maintain the high resolution of features. Therefore, it can transfer rich historical pose information to the current frame, which provides effective pose information for locating occluded joints. Our approach achieves state-of-the-art results on two standard video-based pose estimation benchmarks. Moreover, the proposed KIMNet shows some robustness to occlusion, demonstrating the effectiveness of the proposed method.
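The idea of scoring the temporal similarity between any two joints and then using historical positions to propose initial positions can be illustrated with plain scaled dot-product attention. The sketch below is an assumption-laden toy, not the KMM itself: the abstract does not specify the similarity function or how positions are combined, so the scaled dot product, the weighted average of previous positions, and all names are hypothetical.

```python
import math

def softmax(row):
    """Normalize a list of scores into weights that sum to 1."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def temporal_attention(cur_feats, prev_feats):
    """Similarity of every current-frame joint to every previous-frame joint.

    Returns a J x J matrix whose row i holds the attention weights of
    current joint i over all joints in the previous frame.
    """
    scale = math.sqrt(len(cur_feats[0]))
    return [
        softmax([sum(a * b for a, b in zip(q, k)) / scale for k in prev_feats])
        for q in cur_feats
    ]

def initial_positions(weights, prev_positions):
    """Propose (x, y) per joint as an attention-weighted mean of old positions."""
    return [
        tuple(sum(w * p[axis] for w, p in zip(row, prev_positions)) for axis in range(2))
        for row in weights
    ]

# Hypothetical features for two joints in consecutive frames.
feats = [[5.0, 0.0], [0.0, 5.0]]
attn = temporal_attention(feats, feats)
proposal = initial_positions(attn, [(10.0, 20.0), (30.0, 40.0)])
```

Because every current joint attends to all previous joints, a joint with an unreliable feature in the current frame still receives a position proposal driven by the previous frame, which is the intuition behind the occlusion robustness claimed in the abstract.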

    Learning from Easy to Complex: Adaptive Multi-curricula Learning for Neural Dialogue Generation

    Current state-of-the-art neural dialogue systems are mainly data-driven and are trained on human-generated responses. However, due to the subjectivity and open-ended nature of human conversations, the complexity of training dialogues varies greatly. The noise and uneven complexity of query-response pairs impede the learning efficiency and effectiveness of neural dialogue generation models. Moreover, there is so far no unified measure of dialogue complexity, and dialogue complexity embodies multiple attributes, such as specificity, repetitiveness, and relevance. Inspired by how humans learn to converse, where children learn from easy dialogues to complex ones and dynamically adjust their learning progress, in this paper we first analyze five dialogue attributes to measure dialogue complexity from multiple perspectives on three publicly available corpora. Then, we propose an adaptive multi-curricula learning framework to schedule a committee of the organized curricula. The framework is built on the reinforcement learning paradigm and automatically chooses different curricula during the evolving learning process according to the learning status of the neural dialogue generation model. Extensive experiments conducted on five state-of-the-art models demonstrate its learning efficiency and effectiveness with respect to 13 automatic evaluation metrics and human judgments. Comment: Accepted to AAAI 202

    Smart grid power load type forecasting: research on optimization methods of deep learning models

    Introduction: In the field of power systems, power load type prediction is a crucial task. Different types of loads, such as domestic, industrial, and commercial loads, have different energy consumption patterns. Therefore, accurate prediction of load types can help the power system better plan power supply strategies to improve energy utilization and stability. However, this task faces multiple challenges, including the complex topology of the power system, the diversity of time series data, and the correlation between data. With the rapid development of deep learning methods, researchers are beginning to leverage these powerful techniques to address this challenge. This study aims to explore how to optimize deep learning models to improve the accuracy of load type prediction and provide support for efficient energy management and optimization of smart grids. Methods: In this study, we propose a deep learning method that combines graph convolutional networks (GCN) and sequence-to-sequence (Seq2Seq) models and introduces an attention mechanism. The methodology involves multiple steps: first, we use the GCN encoder to process the topological structure information of the power system and encode node features into a graph data representation. Next, the Seq2Seq decoder takes the historical time series data as the input sequence and generates a prediction sequence of the load type. We then introduce an attention mechanism, which allows the model to dynamically adjust its attention to input data and better capture the relationship between time series data and graph data. Results: We conducted extensive experimental validation on four different datasets, including the National Grid Electricity Load Dataset, the Canadian Electricity Load Dataset, the United States Electricity Load Dataset, and the International Electricity Load Dataset. Experimental results show that our method achieves significant improvements in load type prediction tasks. It exhibits higher accuracy and robustness compared to traditional methods and single deep learning models. Our approach demonstrates advantages in improving load type prediction accuracy, providing strong support for the future development of the power system. Discussion: The results of our study highlight the potential of deep learning techniques, specifically the combination of GCN and Seq2Seq models with attention mechanisms, in addressing the challenges of load type prediction in power systems. By improving prediction accuracy and robustness, our approach can contribute to more efficient energy management and the optimization of smart grids.

    Emulating power spectra for pre- and post-reconstructed galaxy samples

    The small-scale linear information in galaxy samples that is typically lost during non-linear growth can be restored to a certain level by density field reconstruction, which has been demonstrated to improve the precision of baryon acoustic oscillation (BAO) measurements. As proposed in the literature, a joint analysis of the power spectrum before and after the reconstruction enables an efficient extraction of information carried by high-order statistics. However, the statistics of the post-reconstruction density field are difficult to model. In this work, we circumvent this issue by developing an accurate emulator for the pre-reconstructed, post-reconstructed, and cross power spectra ($P_{\rm pre}$, $P_{\rm post}$, $P_{\rm cross}$) up to $k=0.5~h~{\rm Mpc^{-1}}$ based on the Dark Quest N-body simulations. The accuracy of the emulator is at the percent level: the error of the emulated monopole and quadrupole of the power spectra is less than $1\%$ and $5\%$ of the ground truth, respectively. A fit to example power spectra using the emulator shows that the constraints on cosmological parameters are substantially improved using $P_{\rm pre}$+$P_{\rm post}$+$P_{\rm cross}$ with $k_{\rm max}=0.25~h~{\rm Mpc^{-1}}$, compared to those derived from $P_{\rm pre}$ alone: the constraints on ($\Omega_m$, $H_0$, $\sigma_8$) are tightened by $\sim 41\%-55\%$, and the uncertainties of the derived BAO and RSD parameters ($\alpha_{\perp}$, $\alpha_{||}$, $f\sigma_8$) shrink by $\sim 28\%-54\%$. This highlights the complementarity among $P_{\rm pre}$, $P_{\rm post}$ and $P_{\rm cross}$, which demonstrates the efficiency and practicability of a joint $P_{\rm pre}$, $P_{\rm post}$ and $P_{\rm cross}$ analysis for cosmological implications. Comment: 15 pages, 8 figures, 2 tables