Relation-Based Associative Joint Location for Human Pose Estimation in Videos
Video-based human pose estimation (HPE) is a vital yet challenging task.
While deep learning methods have made significant progress on HPE, most
approaches detect each joint independently, discarding the structural
information of the pose. In this paper, unlike prior methods, we propose a
Relation-based Pose Semantics Transfer Network (RPSTN) to locate joints
associatively. Specifically, we design a lightweight joint relation extractor
(JRE) to model the pose structural features and associatively generate heatmaps
for joints by modeling the relation between any two joints heuristically
instead of building each joint heatmap independently. In effect, the proposed
JRE module models the spatial configuration of human poses through the
relationship between any two joints. Moreover, considering the temporal
semantic continuity of videos, the pose semantic information in the current
frame is beneficial for guiding the location of joints in the next frame.
Therefore, we use the idea of knowledge reuse to propagate the pose semantic
information between consecutive frames. In this way, the proposed RPSTN
captures temporal dynamics of poses. On the one hand, the JRE module can infer
invisible joints according to the relationship between the invisible joints and
other visible joints in space. On the other hand, in the temporal dimension,
the proposed model can transfer pose semantic features from non-occluded
frames to occluded frames to locate occluded joints. Therefore, our method is
robust to occlusion and achieves state-of-the-art results on two challenging
datasets, which demonstrates its effectiveness for video-based human pose
estimation. We will release the code and models publicly.
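As a loose illustration of the pairwise-relation idea (not the authors' actual JRE implementation), the following numpy sketch refines each joint's heatmap with relation weights computed between every pair of joints; all function and variable names here are hypothetical:

```python
import numpy as np

def joint_relation_refine(heatmaps):
    """Refine each joint's heatmap using its relation to every other joint.

    heatmaps: (J, H, W) array of per-joint score maps.
    Returns a (J, H, W) array in which each joint's map is a relation-weighted
    mixture of all joint maps, so a weakly detected (e.g. occluded) joint can
    borrow evidence from strongly detected ones.
    """
    J, H, W = heatmaps.shape
    flat = heatmaps.reshape(J, -1)                      # (J, H*W)
    # Pairwise relation: cosine similarity between joint score maps.
    norm = np.linalg.norm(flat, axis=1, keepdims=True) + 1e-8
    rel = (flat / norm) @ (flat / norm).T               # (J, J)
    # Softmax over the "other joint" axis turns relations into mixture weights.
    w = np.exp(rel - rel.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)
    return (w @ flat).reshape(J, H, W)
```

Because each refined map mixes evidence from all joints, this captures the intuition of inferring invisible joints from their relationship to visible ones.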
Spatial-Temporal Decoupling Contrastive Learning for Skeleton-based Human Action Recognition
Skeleton-based action recognition is a central task in human-computer
interaction. However, most previous methods suffer from two issues: (i)
semantic ambiguity arising from spatial-temporal information mixture; and (ii)
overlooking the explicit exploitation of the latent data distributions (i.e.,
the intra-class variations and inter-class relations), thereby leading to
sub-optimal solutions for the skeleton encoders. To mitigate this, we propose a
spatial-temporal decoupling contrastive learning (STD-CL) framework to obtain
discriminative and semantically distinct representations from the sequences,
which can be incorporated into various previous skeleton encoders and can be
removed when testing. Specifically, we decouple the global features into
spatial-specific and temporal-specific features to reduce the spatial-temporal
coupling of features. Furthermore, to explicitly exploit the latent data
distributions, we apply contrastive learning to the attentive features, which
models the cross-sequence semantic relations by pulling together the features
from the positive pairs and pushing away the negative pairs. Extensive
experiments show that STD-CL, combined with four different skeleton encoders
(HCN, 2S-AGCN, CTR-GCN, and Hyperformer), achieves solid improvements on the
NTU60, NTU120, and NW-UCLA benchmarks. The code will be released soon.
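To make the decoupling-plus-contrast recipe concrete, here is a minimal numpy sketch (an interpretation of the idea, not the paper's code): global features are pooled into spatial-specific and temporal-specific parts, and an InfoNCE-style loss pulls positive pairs together and pushes negatives apart. The names and pooling choices are hypothetical:

```python
import numpy as np

def decouple(features):
    """Split spatial-temporal features into two specific views.

    features: (T, V, C) — T frames, V joints, C channels.
    Pooling over time keeps spatial structure; pooling over joints
    keeps temporal structure, reducing spatial-temporal coupling.
    """
    spatial = features.mean(axis=0)   # (V, C) spatial-specific
    temporal = features.mean(axis=1)  # (T, C) temporal-specific
    return spatial, temporal

def info_nce(anchor, positive, negatives, tau=0.1):
    """InfoNCE-style contrastive loss over flat feature vectors."""
    def sim(a, b):
        return (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8)
    logits = np.array([sim(anchor, positive)]
                      + [sim(anchor, n) for n in negatives]) / tau
    logits -= logits.max()            # numerical stability
    p = np.exp(logits) / np.exp(logits).sum()
    return -np.log(p[0] + 1e-12)      # positive pair sits at index 0
```

In the framework described above, such a loss head would be trained alongside the skeleton encoder and simply dropped at test time.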
Topology-aware MLP for Skeleton-based Action Recognition
Graph convolution networks (GCNs) have achieved remarkable performance in
skeleton-based action recognition. However, existing GCN-based methods
have relied excessively on elaborate human body priors and constructed complex
feature aggregation mechanisms, which limits the generalizability of networks.
To solve these problems, we propose a novel Spatial Topology Gating Unit
(STGU), which is an MLP-based variant without extra priors, to capture the
co-occurrence topology features that encode the spatial dependency across all
joints. In STGU, to model the sample-specific and completely independent
point-wise topology attention, a new gate-based feature interaction mechanism
is introduced to activate the features point-to-point by the attention map
generated from the input. Based on the STGU, in this work, we propose the first
topology-aware MLP-based model, Ta-MLP, for skeleton-based action recognition.
In comparison with previous methods on three large-scale datasets,
Ta-MLP achieves competitive performance. In addition, Ta-MLP reduces the
parameters by up to 62.5% with favorable results. Compared with previous
state-of-the-art (SOTA) approaches, Ta-MLP pushes the frontier of real-time
action recognition. The code will be available at
https://github.com/BUPTSJZhang/Ta-MLP.
Comment: 10 pages, 9 figures
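The gate-based feature interaction can be sketched roughly as follows (a simplified reading of the STGU idea, not the released implementation): a sigmoid attention map generated from the input itself activates a value branch point-to-point, with no body-prior adjacency involved. Weight names are hypothetical:

```python
import numpy as np

def stgu(x, w_gate, w_val):
    """Spatial gating sketch: a sample-specific attention map, generated
    from the input, activates features element-wise (point-to-point).

    x: (V, C) joint features; w_gate, w_val: (C, C') projections.
    """
    gate = 1.0 / (1.0 + np.exp(-(x @ w_gate)))  # attention map from the input
    val = x @ w_val                              # value branch
    return gate * val                            # point-to-point activation
```

Since the gate is computed from each input sample, the resulting topology attention is sample-specific rather than fixed by a predefined skeleton graph.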
BiHRNet: A Binary High-Resolution Network for Human Pose Estimation
Human Pose Estimation (HPE) plays a crucial role in computer vision
applications. However, it is difficult to deploy state-of-the-art models on
resource-limited devices due to the high computational costs of the networks.
In this work, a binary human pose estimator named BiHRNet (Binary HRNet) is
proposed, whose weights and activations are expressed as ±1. BiHRNet
retains the keypoint extraction ability of HRNet, while using fewer computing
resources by adopting binary neural network (BNN) techniques. In order to
reduce the accuracy drop caused by network binarization, two categories of
techniques are proposed in this work. To optimize the training of the binary pose
estimator, we propose a new loss function combining KL divergence loss with
AWing loss, which makes the binary network obtain more comprehensive output
distribution from its real-valued counterpart to reduce information loss caused
by binarization. For designing more binarization-friendly structures, we
propose a new information reconstruction bottleneck called IR Bottleneck to
retain more information in the initial stage of the network. In addition, we
also propose a multi-scale basic block called MS-Block for information
retention. Our network incurs less computational cost with only a small precision drop.
Experimental results demonstrate that BiHRNet achieves a PCKh of 87.9 on the
MPII dataset, which outperforms all binary pose estimation networks. On the
challenging COCO dataset, the proposed method enables the binary neural
network to achieve 70.8 mAP, which is better than most tested lightweight
full-precision networks.
Comment: 12 pages, 6 figures
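As a hedged sketch of the two training ingredients described above (scaled sign binarization, and distilling the real-valued network's heatmap distribution into the binary one via KL divergence), not the exact BiHRNet code:

```python
import numpy as np

def binarize(w):
    """Scaled sign binarization: weights become +/-1 times the mean
    absolute value, which limits the quantization error."""
    alpha = np.abs(w).mean()
    return alpha * np.sign(w)

def kl_heatmap_loss(student, teacher, eps=1e-8):
    """KL divergence between normalized heatmaps: pushes the binary
    student toward the real-valued teacher's output distribution,
    reducing the information loss caused by binarization."""
    p = teacher / (teacher.sum() + eps)
    q = student / (student.sum() + eps)
    return float((p * np.log((p + eps) / (q + eps))).sum())
```

In practice this KL term would be combined with a keypoint regression loss (AWing in the paper) and the sign operation would use a straight-through gradient estimator during training.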
Learning Human Kinematics by Modeling Temporal Correlations between Joints for Video-based Human Pose Estimation
Estimating human poses from videos is critical in human-computer interaction.
By precisely estimating human poses, the robot can provide an appropriate
response to the human. Most existing approaches use optical flow, RNNs, or
CNNs to extract temporal features from videos. Despite the positive results of
these attempts, most of them only straightforwardly integrate features along
the temporal dimension, ignoring temporal correlations between joints. In
contrast to previous methods, we propose a plug-and-play kinematics modeling
module (KMM) based on the domain-cross attention mechanism to model the
temporal correlation between joints across different frames explicitly.
Specifically, the proposed KMM models the temporal correlation between any two
joints by calculating their temporal similarity. In this way, KMM can learn the
motion cues of each joint. Using the motion cues (temporal domain) and
historical positions of joints (spatial domain), KMM can infer the initial
positions of joints in the current frame in advance. In addition, we present a
kinematics modeling network (KIMNet) based on the KMM for obtaining the final
positions of joints by combining pose features and initial positions of joints.
By explicitly modeling temporal correlations between joints, KIMNet can infer
currently occluded joints from all of the joints at the previous moment.
Furthermore, the KMM is achieved through an attention mechanism, which allows
it to maintain the high resolution of features. Therefore, it can transfer rich
historical pose information to the current frame, which provides effective pose
information for locating occluded joints. Our approach achieves
state-of-the-art results on two standard video-based pose estimation
benchmarks. Moreover, the proposed KIMNet shows robustness to occlusion,
demonstrating the effectiveness of the proposed method.
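A minimal numpy sketch of the cross-frame attention idea (an illustration, not the actual KMM): temporal similarity between joints in consecutive frames is turned into attention weights that propagate the previous joint positions into initial estimates for the current frame. Names and the scaled-dot-product choice are assumptions:

```python
import numpy as np

def kinematics_attention(feat_prev, feat_curr, pos_prev):
    """Propagate historical joint positions via temporal similarity.

    feat_prev, feat_curr: (J, C) joint features at frames t-1 and t.
    pos_prev: (J, 2) joint positions at frame t-1.
    Returns (J, 2) initial position estimates for frame t.
    """
    # Temporal similarity between every current joint and every previous joint.
    sim = feat_curr @ feat_prev.T / np.sqrt(feat_prev.shape[1])  # (J, J)
    w = np.exp(sim - sim.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)                            # attention
    return w @ pos_prev   # each current joint as a mix of previous positions
```

Because each initial estimate is a convex combination of previous positions, a currently occluded joint still receives a plausible location from the joints visible at the previous moment.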
Learning from Easy to Complex: Adaptive Multi-curricula Learning for Neural Dialogue Generation
Current state-of-the-art neural dialogue systems are mainly data-driven and
are trained on human-generated responses. However, due to the subjectivity and
open-ended nature of human conversations, the complexity of training dialogues
varies greatly. The noise and uneven complexity of query-response pairs impede
the learning efficiency and effects of the neural dialogue generation models.
Moreover, there is so far no unified measurement of dialogue complexity, which
embodies multiple attributes---specificity, repetitiveness, relevance, etc.
Inspired by human
behaviors of learning to converse, where children learn from easy dialogues to
complex ones and dynamically adjust their learning progress, in this paper, we
first analyze five dialogue attributes to measure the dialogue complexity in
multiple perspectives on three publicly available corpora. Then, we propose an
adaptive multi-curricula learning framework to schedule a committee of the
organized curricula. The framework is established upon the reinforcement
learning paradigm, which automatically chooses different curricula at the
evolving learning process according to the learning status of the neural
dialogue generation model. Extensive experiments conducted on five
state-of-the-art models demonstrate its learning efficiency and effectiveness
with respect to 13 automatic evaluation metrics and human judgments.
Comment: Accepted to AAAI 2020
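The curriculum-scheduling idea can be approximated with a simple bandit-style chooser, shown below as an epsilon-greedy sketch: each arm is one attribute-sorted curriculum, and the reward is the change in the dialogue model's validation score after training on a batch from that curriculum. The paper uses a reinforcement-learning policy; this stand-in and its names are hypothetical:

```python
import numpy as np

class CurriculumScheduler:
    """Epsilon-greedy scheduler over a committee of curricula."""

    def __init__(self, n_curricula, eps=0.1, seed=0):
        self.q = np.zeros(n_curricula)   # running value of each curriculum
        self.n = np.zeros(n_curricula)   # times each curriculum was chosen
        self.eps = eps
        self.rng = np.random.default_rng(seed)

    def choose(self):
        """Pick a curriculum: explore with prob. eps, else exploit."""
        if self.rng.random() < self.eps:
            return int(self.rng.integers(len(self.q)))
        return int(np.argmax(self.q))

    def update(self, arm, reward):
        """Incremental mean update from the observed training reward."""
        self.n[arm] += 1
        self.q[arm] += (reward - self.q[arm]) / self.n[arm]
```

As the dialogue model's learning status evolves, the rewards shift, so the scheduler naturally moves between easy and complex curricula over training.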
Smart grid power load type forecasting: research on optimization methods of deep learning models
Introduction: In the field of power systems, power load type prediction is a crucial task. Different types of loads, such as domestic, industrial, and commercial, have different energy consumption patterns. Therefore, accurate prediction of load types can help the power system better plan power supply strategies to improve energy utilization and stability. However, this task faces multiple challenges, including the complex topology of the power system, the diversity of time series data, and the correlations between data. With the rapid development of deep learning methods, researchers are beginning to leverage these powerful techniques to address this challenge. This study aims to explore how to optimize deep learning models to improve the accuracy of load type prediction and provide support for efficient energy management and optimization of smart grids.
Methods: In this study, we propose a deep learning method that combines graph convolutional networks (GCN) and sequence-to-sequence (Seq2Seq) models and introduces an attention mechanism. The methodology involves multiple steps: first, we use the GCN encoder to process the topological structure information of the power system and encode node features into a graph data representation. Next, the Seq2Seq decoder takes the historical time series data as the input sequence and generates a prediction sequence of the load type. We then introduce an attention mechanism, which allows the model to dynamically adjust its attention to the input data and better capture the relationship between the time series data and the graph data.
Results: We conducted extensive experimental validation on four different datasets: the National Grid Electricity Load Dataset, the Canadian Electricity Load Dataset, the United States Electricity Load Dataset, and the International Electricity Load Dataset. Experimental results show that our method achieves significant improvements in load type prediction tasks. It exhibits higher accuracy and robustness compared to traditional methods and single deep learning models. Our approach demonstrates advantages in improving load type prediction accuracy, providing strong support for the future development of the power system.
Discussion: The results of our study highlight the potential of deep learning techniques, specifically the combination of GCN and Seq2Seq models with attention mechanisms, in addressing the challenges of load type prediction in power systems. By improving prediction accuracy and robustness, our approach can contribute to more efficient energy management and the optimization of smart grids.
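To give a rough feel for the two building blocks (a graph-convolution step over the grid topology, and decoder attention over encoder states), here is a minimal numpy sketch; it is illustrative only, with hypothetical names, and not the authors' model:

```python
import numpy as np

def gcn_layer(adj, x, w):
    """One graph-convolution step: row-normalized adjacency (with
    self-loops) aggregates neighbor features, then a linear map + ReLU."""
    a_hat = adj + np.eye(adj.shape[0])        # add self-loops
    a_norm = a_hat / a_hat.sum(axis=1)[:, None]
    return np.maximum(a_norm @ x @ w, 0.0)

def attention_context(dec_state, enc_states):
    """Decoder-side attention: dynamically weight encoder states by
    similarity to the current decoder state."""
    scores = enc_states @ dec_state
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return w @ enc_states                     # attention-weighted context
```

In the described pipeline, the GCN output would encode the grid's node features, and the Seq2Seq decoder would consume such attention contexts at each prediction step.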
Emulating power spectra for pre- and post-reconstructed galaxy samples
The small-scale linear information in galaxy samples typically lost during
non-linear growth can be restored to a certain level by the density field
reconstruction, which has been demonstrated to improve the precision of
baryon acoustic oscillation (BAO) measurements. As proposed in the literature,
a joint analysis of the power spectrum before and after the reconstruction
enables an efficient extraction of information carried by high-order
statistics. However, the statistics of the post-reconstruction density field
are difficult to model. In this work, we circumvent this issue by developing an
accurate emulator for the pre-reconstructed, post-reconstructed, and cross
power spectra based on the \textsc{Dark Quest} N-body simulations. The
accuracy of the emulator is at the percent level for both the emulated
monopole and quadrupole of the power spectra relative to the ground truth. A
fit to an example power spectrum using the emulator shows that the constraints
on cosmological parameters are largely improved by jointly using the
pre-reconstructed, post-reconstructed, and cross power spectra, compared to
those derived from the pre-reconstructed spectrum alone: the constraints on
the cosmological parameters are tightened, and the uncertainties of the
derived BAO and RSD parameters shrink correspondingly. This highlights the
complementarity among the three spectra, which demonstrates the efficiency
and practicability of such a joint analysis for cosmological implications.
Comment: 15 pages, 8 figures, 2 tables
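An emulator of this kind can be caricatured as kernel regression over cosmological parameters: train on spectra measured from simulations at sampled parameter points, then predict the spectrum at a new point. The sketch below (hypothetical names, no relation to the actual Dark Quest-based architecture) shows only the interpolation principle:

```python
import numpy as np

def build_emulator(params, spectra, length=1.0):
    """Minimal Gaussian-kernel regression emulator.

    params: (N, D) training cosmological parameter vectors.
    spectra: (N, n_k) power spectra measured at those parameters.
    Returns a predictor mapping a new parameter vector to a spectrum.
    """
    def kernel(a, b):
        d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
        return np.exp(-0.5 * d2 / length**2)

    # Small jitter keeps the kernel matrix invertible.
    K = kernel(params, params) + 1e-6 * np.eye(len(params))
    weights = np.linalg.solve(K, spectra)              # (N, n_k)

    def predict(p_new):
        k = kernel(np.atleast_2d(np.asarray(p_new, float)), params)
        return (k @ weights)[0]

    return predict
```

A real emulator would use a carefully designed simulation suite, per-multipole emulation, and tuned kernels or neural networks; this only illustrates why precomputed simulations make the hard-to-model post-reconstruction statistics tractable.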