Combining the Silhouette and Skeleton Data for Gait Recognition
Gait recognition, a promising long-distance biometric technology, has aroused
intense interest in computer vision. Existing works on gait recognition can be
divided into appearance-based methods and model-based methods, which extract
features from silhouettes and skeleton data, respectively. However, since
appearance-based methods are strongly affected by clothing changes and carrying
conditions, and model-based methods are limited by the accuracy of pose
estimation approaches, gait recognition remains challenging in practical
applications. To integrate the advantages of these two approaches, a
two-branch neural network (NN) is proposed in this paper. Our method contains
two branches, namely a CNN-based branch taking silhouettes as input and a
GCN-based branch taking skeletons as input. In addition, two new modules are
proposed in the GCN-based branch for better gait representation. First, we
present a simple yet effective fully connected graph convolution operator to
integrate the multi-scale graph convolutions and alleviate the dependence on
natural human joint connections. Second, we deploy a multi-dimension attention
module named STC-Att to learn spatial, temporal and channel-wise attention
simultaneously. We evaluated the proposed two-branch neural network on the
CASIA-B dataset. The experimental results show that our method achieves
state-of-the-art performance in various conditions.
Comment: The paper is under consideration at Computer Vision and Image Understanding.
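The multi-dimension attention idea above can be sketched in a few lines. This is a minimal toy version assuming a generic (frames × joints × channels) feature tensor with simple mean-pooled sigmoid gates, not the paper's exact STC-Att design:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def stc_attention(x):
    """Toy spatial/temporal/channel attention over a gait feature tensor.

    x: array of shape (T, J, C) -- T frames, J joints, C channels.
    Each axis gets a gating vector from mean-pooled statistics; the three
    gates are applied multiplicatively, mimicking simultaneous spatial,
    temporal and channel-wise attention.
    """
    t_gate = sigmoid(x.mean(axis=(1, 2)))  # (T,) temporal attention
    s_gate = sigmoid(x.mean(axis=(0, 2)))  # (J,) spatial attention
    c_gate = sigmoid(x.mean(axis=(0, 1)))  # (C,) channel attention
    return x * t_gate[:, None, None] * s_gate[None, :, None] * c_gate[None, None, :]

feats = np.random.randn(30, 17, 64)  # 30 frames, 17 joints, 64 channels
out = stc_attention(feats)
print(out.shape)  # (30, 17, 64)
```

Because every gate lies in (0, 1), the module can only re-weight features, never amplify them, which keeps the sketch stable without normalization layers.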
Condition-Adaptive Graph Convolution Learning for Skeleton-Based Gait Recognition
Graph convolutional networks have been widely applied in skeleton-based gait
recognition. A key challenge in this task is to distinguish the individual
walking styles of different subjects across various views. Existing
state-of-the-art methods employ uniform convolutions to extract features from
diverse sequences and ignore the effects of viewpoint changes. To overcome
these limitations, we propose a condition-adaptive graph (CAG) convolution
network that can dynamically adapt to the specific attributes of each skeleton
sequence and the corresponding view angle. In contrast to using fixed weights
for all joints and sequences, we introduce a joint-specific filter learning
(JSFL) module in the CAG method, which produces sequence-adaptive filters at
the joint level. The adaptive filters capture fine-grained patterns that are
unique to each joint, enabling the extraction of diverse spatial-temporal
information about body parts. Additionally, we design a view-adaptive topology
learning (VATL) module that generates adaptive graph topologies. These graph
topologies are used to correlate the joints adaptively according to the
specific view conditions. Thus, CAG can simultaneously adjust to various
walking styles and viewpoints. Experiments on the two most widely used datasets
(i.e., CASIA-B and OU-MVLP) show that CAG surpasses all previous skeleton-based
methods. Moreover, the recognition performance can be enhanced by simply
combining CAG with appearance-based methods, demonstrating the ability of CAG
to provide useful complementary information. The source code will be available
at https://github.com/OliverHxh/CAG.
Comment: Accepted by TIP journal.
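The view-adaptive topology idea can be illustrated with a small sketch: condition per-joint features on a view embedding, then normalize pairwise similarities into an adjacency matrix. The shapes and the elementwise modulation are illustrative assumptions, not the paper's exact VATL module:

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def adaptive_topology(joint_feats, view_embed):
    """Toy view-adaptive graph topology.

    joint_feats: (J, C) per-joint features for one skeleton sequence.
    view_embed:  (C,)  an embedding of the camera view (hypothetical input).
    Joints are modulated by the view embedding, and pairwise similarities
    are normalized row-wise into an adaptive adjacency matrix, so the
    graph connectivity changes with the viewing condition.
    """
    modulated = joint_feats * view_embed  # condition joints on the view
    sim = modulated @ modulated.T         # (J, J) pairwise similarity
    return softmax(sim, axis=-1)          # each row sums to 1

A = adaptive_topology(np.random.randn(17, 8), np.random.randn(8))
print(A.shape)  # (17, 17)
```

Unlike a fixed skeleton adjacency, this matrix is recomputed per sequence, which is the essential contrast with uniform convolutions drawn in the abstract.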
Reducing Training Demands for 3D Gait Recognition with Deep Koopman Operator Constraints
Deep learning research has made many biometric recognition solutions viable,
but they require vast training data to achieve real-world generalization. Unlike
other biometric traits, such as face and ear, gait samples cannot be easily
crawled from the web to form massive unconstrained datasets. As the human body
has been extensively studied for different digital applications, one can rely
on prior shape knowledge to overcome data scarcity. This work follows the
recent trend of fitting a 3D deformable body model into gait videos using deep
neural networks to obtain disentangled shape and pose representations for each
frame. To enforce temporal consistency in the network, we introduce a new
Linear Dynamical Systems (LDS) module and loss based on Koopman operator
theory, which provides an unsupervised motion regularization for the periodic
nature of gait, as well as a predictive capacity for extending gait sequences.
We compare LDS to the traditional adversarial training approach and use the USF
HumanID and CASIA-B datasets to show that LDS can obtain better accuracy with
less training data. Finally, we also show that our 3D modeling approach is much
better than other 3D gait approaches in overcoming viewpoint variation under
normal, bag-carrying and clothing-change conditions.
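The LDS constraint above can be sketched with ordinary least squares: fit a single matrix that advances the per-frame codes one step in time, and penalize the prediction error. This is a minimal illustration of the Koopman/LDS idea, not the paper's trained module:

```python
import numpy as np

def fit_koopman_operator(states):
    """Fit a linear dynamics matrix A with x_{t+1} ~ A x_t (least squares).

    states: (T, D) sequence of per-frame pose/shape codes. A good gait
    embedding should evolve approximately linearly in time, and the
    fitted A can roll the sequence forward as a predictor.
    """
    X, Y = states[:-1], states[1:]             # pairs (x_t, x_{t+1})
    M, *_ = np.linalg.lstsq(X, Y, rcond=None)  # solves X @ M ~ Y
    return M.T                                 # so that A @ x_t ~ x_{t+1}

def lds_loss(states, A):
    """Mean squared one-step prediction error -- the motion regularizer."""
    pred = states[:-1] @ A.T
    return float(np.mean((pred - states[1:]) ** 2))

# A truly linear system is recovered almost exactly (loss ~ 0).
A_true = np.array([[0.9, -0.2], [0.1, 0.95]])
x = [np.array([1.0, 0.5])]
for _ in range(20):
    x.append(A_true @ x[-1])
states = np.stack(x)
A_hat = fit_koopman_operator(states)
```

The same fitted operator also gives the predictive capacity the abstract mentions: iterating `A_hat` extends a gait sequence beyond its observed frames.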
Multimodal Human Pose Feature Fusion for Gait Recognition.
Gait recognition allows identifying people at a distance based on the way they walk (i.e., their gait) in a non-invasive approach. Most of the approaches published in recent decades are dominated by the use of silhouettes or other appearance-based modalities to describe the gait cycle. In an attempt to exclude the appearance data, many works address the use of the human pose as a modality to describe the walking movement. However, as the pose contains less information when used as a single modality, the performance achieved by such models is generally poorer. To overcome this limitation, we propose a multimodal setup that combines multiple pose representation models. To this end, we evaluate multiple fusion strategies to aggregate the features derived from each pose modality at every model stage. Moreover, we introduce a weighted sum with trainable weights that can adaptively learn the optimal balance among pose modalities. Our experimental results show that (a) our fusion strategies can effectively combine different pose modalities, improving their baseline performance; and (b) using only human pose, our approach outperforms most of the silhouette-based state-of-the-art approaches. Concretely, we obtain 92.8% mean Top-1 accuracy on CASIA-B.
Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech
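The trainable weighted sum described above can be sketched directly: one logit per modality, passed through a softmax so the learned balance stays positive and normalized. Modality names and dimensions here are illustrative assumptions:

```python
import numpy as np

def softmax(w):
    e = np.exp(w - np.max(w))
    return e / e.sum()

def weighted_fusion(features, logits):
    """Weighted sum of per-modality features with a learnable balance.

    features: list of M arrays, each (D,), one per pose modality.
    logits:   (M,) trainable parameters; softmax keeps the weights
              positive and summing to one, so the model can adaptively
              favor the most reliable modality during training.
    """
    w = softmax(np.asarray(logits, dtype=float))
    return sum(wi * f for wi, f in zip(w, features))

f1, f2, f3 = np.ones(4), 2 * np.ones(4), 3 * np.ones(4)
fused = weighted_fusion([f1, f2, f3], [0.0, 0.0, 0.0])  # equal weights
print(fused)  # [2. 2. 2.]
```

In a full model the logits would be optimized jointly with the backbone, one set per fusion stage.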
GAIT Technology for Human Recognition using CNN
Gait is a distinctive biometric characteristic that can be detected from a distance; as a result, it has several uses in social security, forensic identification, and crime prevention. Existing gait identification techniques either use a gait template, which makes it difficult to keep temporal information, or a gait sequence, which imposes unnecessary sequential constraints and loses flexibility in portraying a gait. Our technique treats gait as a deep set of frames; it is therefore immune to frame permutations and can seamlessly combine frames from many videos taken in various contexts, such as different viewing angles, different outfits, or different carrying conditions. According to experiments, our single-model strategy obtains an average rank-1 accuracy of 96.1% on the CASIA-B gait dataset and an accuracy of 87.9% on the OU-MVLP gait dataset under typical walking conditions. Our model also demonstrates a great degree of robustness under numerous challenging circumstances. When carrying bags and when wearing a coat while walking, it obtains accuracies on CASIA-B of 90.8% and 70.3%, respectively, greatly surpassing the best approach currently in use. Additionally, the suggested method achieves a satisfactory level of accuracy even when there are few frames available in the test samples; for instance, it achieves 85.0% on CASIA-B even with only 7 frames.
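The deep-set viewpoint reduces to one property: the aggregation over frames must be order-free. A minimal sketch, assuming max pooling as the set operation (one common choice, not necessarily this paper's):

```python
import numpy as np

def set_pool(frame_feats):
    """Order-free aggregation of per-frame features (a deep-set view).

    frame_feats: (T, D). Max pooling over the frame axis is invariant to
    any permutation of the frames, which is why frames from different
    videos and capture conditions can be mixed freely into one set.
    """
    return frame_feats.max(axis=0)

x = np.random.randn(12, 32)
perm = np.random.permutation(12)
# Shuffling the frames leaves the set-level feature unchanged.
assert np.allclose(set_pool(x), set_pool(x[perm]))
```

The same invariance also explains the robustness to short clips quoted above: dropping frames shrinks the set but does not break the representation.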
DyGait: Exploiting Dynamic Representations for High-performance Gait Recognition
Gait recognition is a biometric technology that recognizes the identity of
humans through their walking patterns. Compared with other biometric
technologies, gait is more difficult to disguise and can be applied at long
distances without the cooperation of subjects. Thus, it
has unique potential and wide application for crime prevention and social
security. At present, most gait recognition methods directly extract features
from the video frames to establish representations. However, these
architectures treat all features equally and do not pay enough attention to
dynamic features, i.e., representations of the dynamic parts of silhouettes
over time (e.g., legs). Since the dynamic parts of the human body are more
informative during walking than static regions (e.g., carried bags), in
this paper, we propose a novel and high-performance framework named DyGait.
This is the first framework on gait recognition that is designed to focus on
the extraction of dynamic features. Specifically, to take full advantage of the
dynamic information, we propose a Dynamic Augmentation Module (DAM), which can
automatically establish spatial-temporal feature representations of the dynamic
parts of the human body. The experimental results show that our DyGait network
outperforms other state-of-the-art gait recognition methods. It achieves an
average Rank-1 accuracy of 71.4% on the GREW dataset, 66.3% on the Gait3D
dataset, 98.4% on the CASIA-B dataset and 98.3% on the OU-MVLP dataset.
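The dynamic-part emphasis can be sketched with frame-to-frame differences, which are large for moving regions (legs, arms) and near zero for static ones. This is a toy stand-in for DAM under that assumption, not the paper's exact module:

```python
import numpy as np

def dynamic_features(sil_feats):
    """Toy dynamic-part emphasis on silhouette feature maps.

    sil_feats: (T, H, W) per-frame feature maps. Frame differences act as
    a motion-energy map: high where silhouette pixels change over time
    (legs), low for static regions (torso, carried bags). Adding the mean
    absolute difference back onto the features augments dynamic regions.
    """
    diff = np.abs(np.diff(sil_feats, axis=0))  # (T-1, H, W) motion energy
    dyn = diff.mean(axis=0)                    # (H, W) per-pixel dynamics
    return sil_feats + dyn[None]               # emphasize dynamic regions

clip = np.random.rand(16, 8, 8)
out = dynamic_features(clip)
print(out.shape)  # (16, 8, 8)
```

A learned module would replace the fixed difference operator with trainable spatial-temporal filters, but the inductive bias is the same.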
GPGait: Generalized Pose-based Gait Recognition
Recent works on pose-based gait recognition have demonstrated the potential
of using such simple information to achieve results comparable to
silhouette-based methods. However, the generalization ability of pose-based
methods on different datasets is undesirably inferior to that of
silhouette-based ones, which has received little attention but hinders the
application of these methods in real-world scenarios. To improve the
generalization ability of pose-based methods across datasets, we propose a
\textbf{G}eneralized \textbf{P}ose-based \textbf{Gait} recognition
(\textbf{GPGait}) framework. First, a Human-Oriented Transformation (HOT) and a
series of Human-Oriented Descriptors (HOD) are proposed to obtain a unified
pose representation with discriminative multi-features. Then, given the slight
variations in the unified representation after HOT and HOD, it becomes crucial
for the network to extract local-global relationships between the keypoints. To
this end, a Part-Aware Graph Convolutional Network (PAGCN) is proposed to
enable efficient graph partition and local-global spatial feature extraction.
Experiments on four public gait recognition datasets, CASIA-B, OUMVLP-Pose,
Gait3D and GREW, show that our model demonstrates better and more stable
cross-domain capabilities compared to existing skeleton-based methods,
achieving comparable recognition results to silhouette-based ones. Code is
available at https://github.com/BNU-IVC/FastPoseGait.
Comment: ICCV Camera Ready.
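A transformation in the spirit of HOT can be sketched as root-centering plus bone-length normalization, which removes the absolute position and scale that vary across datasets. The joint indices here are illustrative assumptions, not the paper's definition:

```python
import numpy as np

def human_oriented_transform(pose, root=0, ref=(0, 1)):
    """Toy pose normalization toward a unified representation.

    pose: (J, 2) 2D keypoints for one frame. Subtracting a root joint
    and dividing by a reference bone length makes the representation
    invariant to where the subject stands and how large the skeleton
    appears, which is what cross-dataset generalization needs.
    """
    centered = pose - pose[root]  # translate the root joint to the origin
    scale = np.linalg.norm(pose[ref[0]] - pose[ref[1]]) + 1e-8
    return centered / scale       # scale-normalized pose

p = np.random.rand(17, 2) * 100
q = 3.0 * p + 7.0  # the same pose, shifted and rescaled
# Both versions map to the same normalized skeleton.
assert np.allclose(human_oriented_transform(p), human_oriented_transform(q))
```

On top of such a unified representation, descriptors (angles, bone lengths, motion) can be stacked as extra channels, which is the role the HODs play in the abstract.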
GaitFormer: Revisiting Intrinsic Periodicity for Gait Recognition
Gait recognition aims to distinguish different walking patterns by analyzing
video-level human silhouettes, rather than relying on appearance information.
Previous research on gait recognition has primarily focused on extracting local
or global spatial-temporal representations, while overlooking the intrinsic
periodic features of gait sequences, which, when fully utilized, can
significantly enhance performance. In this work, we propose a plug-and-play
strategy, called Temporal Periodic Alignment (TPA), which leverages the
periodic nature and fine-grained temporal dependencies of gait patterns. The
TPA strategy comprises two key components. The first component is Adaptive
Fourier-transform Position Encoding (AFPE), which adaptively converts features
and discrete-time signals into embeddings that are sensitive to periodic
walking patterns. The second component is the Temporal Aggregation Module
(TAM), which separates embeddings into trend and seasonal components, and
extracts meaningful temporal correlations to identify primary components, while
filtering out random noise. We present a simple and effective baseline method
for gait recognition, based on the TPA strategy. Extensive experiments
conducted on three popular public datasets (CASIA-B, OU-MVLP, and GREW)
demonstrate that our proposed method achieves state-of-the-art performance on
multiple benchmarks.
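The trend/seasonal separation performed by TAM can be sketched with a centered moving average: the smooth trend is subtracted out, and the residual carries the periodic walking pattern. A minimal single-channel illustration, assuming a simple moving-average trend rather than the paper's learned decomposition:

```python
import numpy as np

def trend_seasonal_split(signal, window=5):
    """Split a temporal signal into trend and seasonal components.

    signal: (T,) e.g. one channel of a gait embedding over time.
    The trend is a centered moving average (edge-padded so the output
    keeps length T); the seasonal part is the residual, which carries
    the periodic gait cycle that AFPE/TAM are designed to exploit.
    """
    pad = window // 2
    padded = np.pad(signal, pad, mode="edge")
    kernel = np.ones(window) / window
    trend = np.convolve(padded, kernel, mode="valid")  # length T
    seasonal = signal - trend
    return trend, seasonal

t = np.arange(64, dtype=float)
sig = 0.1 * t + np.sin(2 * np.pi * t / 8)  # slow trend + periodic gait
trend, seasonal = trend_seasonal_split(sig)
print(trend.shape, seasonal.shape)  # (64,) (64,)
```

By construction the two parts sum back to the input, so the split loses nothing; downstream modules can then weight the periodic component more heavily.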
Context-Sensitive Temporal Feature Learning for Gait Recognition
Although gait recognition has drawn increasing research attention recently,
it remains challenging to learn discriminative temporal representations, since
silhouette differences are quite subtle in the spatial domain. Inspired by the
observation that humans can distinguish the gaits of different subjects by
adaptively focusing on temporal clips at different time scales, we propose a
context-sensitive temporal feature learning (CSTL) network for gait
recognition. CSTL produces temporal features in three scales, and adaptively
aggregates them according to the contextual information from local and global
perspectives. Specifically, CSTL contains an adaptive temporal aggregation
module that successively performs local relation modeling and global relation
modeling to fuse the multi-scale features. Besides, in order to remedy the
spatial feature corruption caused by temporal operations, CSTL incorporates a
salient spatial feature learning (SSFL) module to select groups of
discriminative spatial features. Particularly, we utilize transformers to
implement the global relation modeling and the SSFL module. To the best of our
knowledge, this is the first work that adopts transformer in gait recognition.
Extensive experiments conducted on three datasets demonstrate the
state-of-the-art performance. Concretely, we achieve rank-1 accuracies of
98.7%, 96.2% and 88.7% under normal-walking, bag-carrying and coat-wearing
conditions on CASIA-B, 97.5% on OU-MVLP and 50.6% on GREW.
Comment: Submitted to TPAMI.
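The three-scale temporal features can be sketched with average pooling over centered windows of different lengths, concatenated for a later aggregation step. Window sizes and pooling choice are illustrative assumptions, not CSTL's exact operators:

```python
import numpy as np

def multi_scale_temporal(feats, scales=(1, 3, 5)):
    """Temporal features at three scales, in the spirit of CSTL.

    feats: (T, D). For each scale s, every frame is replaced by the mean
    of a centered window of s frames (edge-padded), so short and long
    temporal clips are both represented. The scales are concatenated
    along the channel axis for a later adaptive aggregation module.
    """
    T, D = feats.shape
    outs = []
    for s in scales:
        pad = s // 2
        padded = np.pad(feats, ((pad, pad), (0, 0)), mode="edge")
        pooled = np.stack([padded[t:t + s].mean(axis=0) for t in range(T)])
        outs.append(pooled)
    return np.concatenate(outs, axis=-1)  # (T, D * len(scales))

x = np.random.randn(20, 16)
y = multi_scale_temporal(x)
print(y.shape)  # (20, 48)
```

Scale 1 passes the input through unchanged, so the concatenation strictly adds context rather than replacing the frame-level features.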