A Review of Techniques on Gait-Based Person Re-Identification
Copyright (c) 2023 Babak Rahi, Maozhen Li and Man Qi. Person re-identification at a distance across multiple non-overlapping cameras has been an active research area for years. In the past ten years, short-term person re-identification techniques have made great strides in accuracy using only appearance features in limited environments. However, massive intra-class variations and inter-class confusion limit their applicability in practice. Moreover, appearance consistency can only be assumed over a short time span from one camera to the next; since the holistic appearance will change drastically over days and weeks, such techniques become ineffective. Practical applications usually require a long-term solution in which the subject's appearance and clothing might have changed after a significant period has elapsed. Facing these problems, soft biometric features such as gait have stirred much interest in recent years. Nevertheless, even gait can vary with illness, ageing, emotional state, walking surfaces, shoe type, clothing type, objects carried by the subject, and even environmental clutter. Gait is therefore considered a temporal cue that can provide biometric motion information. The shape of the human body, on the other hand, can be viewed as a spatial signal that yields valuable information, so extracting discriminative features from both the spatial and temporal domains benefits this research. This article examines the main approaches used in gait analysis for re-identification over the past decade. We identify several relevant dimensions of the problem and provide a taxonomic analysis of current research. We conclude by reviewing the performance levels achievable with current technology and providing a perspective on the most challenging and promising research directions. This research received no external funding.
MetaGait: Learning to Learn an Omni Sample Adaptive Representation for Gait Recognition
Gait recognition, which aims at identifying individuals by their walking
patterns, has recently drawn increasing research attention. However, gait
recognition still suffers from the conflicts between the limited binary visual
clues of the silhouette and numerous covariates with diverse scales, which
brings challenges to the model's adaptiveness. In this paper, we address this
conflict by developing a novel MetaGait that learns to learn an omni sample
adaptive representation. Towards this goal, MetaGait injects meta-knowledge,
which could guide the model to perceive sample-specific properties, into the
calibration network of the attention mechanism to improve the adaptiveness from
the omni-scale, omni-dimension, and omni-process perspectives. Specifically, we
leverage the meta-knowledge across the entire process, where Meta Triple
Attention and Meta Temporal Pooling are presented respectively to adaptively
capture omni-scale dependency from spatial/channel/temporal dimensions
simultaneously and to adaptively aggregate temporal information through
integrating the merits of three complementary temporal aggregation methods.
Extensive experiments demonstrate the state-of-the-art performance of the
proposed MetaGait. On CASIA-B, we achieve rank-1 accuracy of 98.7%, 96.0%, and
89.3% under three conditions, respectively. On OU-MVLP, we achieve rank-1
accuracy of 92.4%.
Comment: Accepted by ECCV2022
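The idea of injecting meta-knowledge into an attention calibration network, so the gating adapts to each sample, can be illustrated with a toy sample-adaptive channel gate. This is only a hedged sketch of the general mechanism, not MetaGait's actual modules: the sample statistic, the `meta_w` parameters and the gating form are all simplified assumptions.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def meta_calibrated_attention(channels, meta_w):
    # a toy sample-adaptive channel gate: a tiny "meta" branch looks at the
    # sample's own statistics and produces the calibration weights, so the
    # gating adapts per sample rather than being fixed after training
    mean = sum(channels) / len(channels)
    # meta branch: maps the sample statistic to one gate per channel
    scales = [sigmoid(w * mean) for w in meta_w]
    return [c * s for c, s in zip(channels, scales)]

x = [2.0, -1.0, 0.5]
y = meta_calibrated_attention(x, meta_w=[1.0, 0.0, -1.0])
assert len(y) == len(x)
assert y[1] == x[1] * 0.5   # sigmoid(0) = 0.5 gate
```

Because the gates are a function of the input itself, two different silhouette samples pass through two different effective attention maps, which is the "learning to learn" flavour the abstract describes.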
GaitStrip: Gait Recognition via Effective Strip-based Feature Representations and Multi-Level Framework
Many gait recognition methods first partition the human gait into N-parts and
then combine them to establish part-based feature representations. Their gait
recognition performance is often affected by partitioning strategies, which are
empirically chosen in different datasets. However, we observe that strips as
the basic component of parts are agnostic against different partitioning
strategies. Motivated by this observation, we present a strip-based multi-level
gait recognition network, named GaitStrip, to extract comprehensive gait
information at different levels. To be specific, our high-level branch explores
the context of gait sequences and our low-level one focuses on detailed posture
changes. We introduce a novel StriP-Based feature extractor (SPB) to learn the
strip-based feature representations by directly taking each strip of the human
body as the basic unit. Moreover, we propose a novel multi-branch structure,
called Enhanced Convolution Module (ECM), to extract different representations
of gaits. ECM consists of the Spatial-Temporal feature extractor (ST), the
Frame-Level feature extractor (FL) and SPB, and has two obvious advantages:
First, each branch focuses on a specific representation, which can be used to
improve the robustness of the network. Specifically, ST aims to extract
spatial-temporal features of gait sequences, while FL is used to generate the
feature representation of each frame. Second, the parameters of the ECM can be
reduced in test by introducing a structural re-parameterization technique.
Extensive experimental results demonstrate that our GaitStrip achieves
state-of-the-art performance in both normal walking and complex conditions.
Comment: Accepted to ACCV2022
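The test-time parameter reduction via structural re-parameterization rests on the linearity of convolution: parallel linear branches whose outputs are summed can be folded into a single operator with merged weights. A minimal sketch with plain matrix-vector products (the branch weights are hypothetical stand-ins, not the paper's actual ECM kernels):

```python
def matvec(W, x):
    # multiply matrix W (list of rows) by vector x
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def add_mats(W1, W2):
    # elementwise sum of two equally-shaped weight matrices
    return [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(W1, W2)]

# two parallel linear branches (toy stand-ins for ECM's parallel extractors)
W_a = [[1.0, 2.0], [0.0, 1.0]]
W_b = [[0.5, -1.0], [2.0, 0.0]]
x = [3.0, 4.0]

# training time: run both branches and sum their outputs
y_multi = [a + b for a, b in zip(matvec(W_a, x), matvec(W_b, x))]

# test time: merge the branches into a single weight matrix first
W_merged = add_mats(W_a, W_b)
y_single = matvec(W_merged, x)

assert y_multi == y_single  # identical output, one branch's worth of parameters
```

The same identity holds for convolutions (after padding kernels to a common size), which is why the multi-branch structure costs nothing extra at inference.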
Person recognition based on deep gait: a survey.
Gait recognition, also known as walking-pattern recognition, has attracted deep interest from the computer vision and biometrics community due to its potential to identify individuals from a distance. It has drawn increasing attention because of its potential applications and non-invasive nature. Since 2014, deep learning approaches have shown promising results in gait recognition by automatically extracting features. However, recognizing gait accurately is challenging due to covariate factors, the complexity and variability of environments, and human body representations. This paper provides a comprehensive overview of the advancements made in this field along with the challenges and limitations associated with deep learning methods. It first examines the various gait datasets used in the literature and analyzes the performance of state-of-the-art techniques. A taxonomy of deep learning methods is then presented to characterize and organize the research landscape in this field. Furthermore, the taxonomy highlights the basic limitations of deep learning methods in the context of gait recognition. The paper concludes by focusing on the present challenges and suggesting several research directions to improve the performance of gait recognition in the future.
View-invariant gait person re-identification with spatial and temporal attention
This thesis was submitted for the award of Doctor of Philosophy and was awarded by Brunel University London.
Person re-identification at a distance across multiple non-overlapping cameras has
been an active research area for years. In the past ten years, short-term person Re-Id
techniques have made great strides in terms of accuracy using only appearance features
in limited environments. However, massive intra-class variations and inter-class
confusion limit their ability to be used in practical applications. Moreover, appearance
consistency can only be assumed in a short time span from one camera to the other.
Since the holistic appearance will change drastically over days and weeks, the technique,
as mentioned above, will be ineffective. Practical applications usually require a
long-term solution in which the subject's appearance and clothing might have changed
after a significant period has elapsed. Facing these problems, soft biometric features
such as gait have been proposed in the past. Nevertheless, even gait can vary with
illness, ageing and changes in the emotional state, changes in walking surfaces, shoe
type, clothes type, objects carried by the subject and even clutter in the scene. Therefore,
gait is considered a temporal cue that can provide biometric motion information.
On the other hand, the shape of the human body could be viewed as a spatial signal
which can produce valuable information. So, extracting discriminative features from
both spatial and temporal domains would be very beneficial to this research. Therefore,
this thesis focuses on finding the most robust method to tackle the gait-based human re-identification problem and solve it for practical applications. In real-world
surveillance scenarios, the human gait cycle is primarily abnormal. These abnormalities
include, but are not limited to, changes in temporal and spatial characteristics such as
walking speed, broken gait phases and, most importantly, varied camera angles. Our
work performed an extensive literature study on spatial and temporal gait feature extraction
methods with a focus on deep learning. Next, we conducted a comparative
study and proposed a spatial-temporal approach for gait feature extraction using the
fusion of multiple modalities, including optical-flow, raw silhouettes and RGB images.
This approach was tested on two of the most challenging publicly available datasets for
gait recognition, TUM-GAID and CASIA-B, with excellent results presented in chapter 3.
Furthermore, a modern spatial-temporal attention mechanism was proposed and
tested on CASIA-B and OULP datasets which learns salient features independent of
the gait cycle and view variations. The spatial attention layer in the proposed method
extracts spatial feature maps using a two-layered architecture and fuses them via late
fusion, allowing it to attend discriminatively to the identity-related salient regions in
silhouette sequences. The temporal attention layer
consists of an LSTM that encodes the temporal motion for silhouette sequences. It
uses the encoded output vectors in the temporal attention architecture to focus on the
most critical timesteps in the gait cycle and discard the rest. Furthermore, we improved
the performance of our method by mapping the extracted spatial-temporal gait
features to a discriminative null space for use in our Siamese architecture for cross-matching.
We also conducted an element-removal experiment on each segment of our
spatial-temporal attentional network to gain insight into each component's contribution to the performance. Our method showed outstanding robustness against abnormal
gait cycles as well as viewpoint variations on both benchmark datasets.
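The temporal-attention step described above, scoring each timestep's encoded vector and pooling with softmax weights so that informative frames dominate, can be sketched as follows. The scoring vector and toy features are illustrative assumptions, not the thesis's learned LSTM parameters:

```python
import math

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def temporal_attention(features, score_w):
    # features: one vector per timestep (e.g., LSTM outputs for a silhouette sequence)
    # score_w:  a scoring vector (hypothetical, fixed here; learned in practice)
    scores = [sum(w * f for w, f in zip(score_w, feat)) for feat in features]
    alphas = softmax(scores)                      # attention over timesteps
    dim = len(features[0])
    pooled = [sum(a * feat[d] for a, feat in zip(alphas, features))
              for d in range(dim)]                # weighted sum over the cycle
    return alphas, pooled

feats = [[0.1, 0.0], [0.9, 0.5], [0.2, 0.1]]      # 3 timesteps, 2-dim encodings
alphas, pooled = temporal_attention(feats, score_w=[1.0, 1.0])
assert abs(sum(alphas) - 1.0) < 1e-9
assert alphas[1] == max(alphas)   # the most informative timestep dominates
```

Low-scoring timesteps receive near-zero weight, which is how the mechanism "discards" uninformative parts of an abnormal gait cycle without hard truncation.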
Gait recognition for person re-identification
Person re-identification across multiple cameras is an essential task in computer vision applications, particularly for tracking the same person across different scenes. Gait recognition, i.e., recognition based on walking style, is often used for this purpose because human gait has unique characteristics that allow a person to be recognized from a distance. However, human recognition via gait can be limited by the viewpoint of the captured images or videos. Hence, this paper proposes a gait recognition approach for person re-identification. The proposed approach starts by estimating the angle of the gait, followed by the recognition process, which is performed using convolutional neural networks. Herein, multi-task convolutional neural network models and extracted gait energy images (GEIs) are used to estimate the angle and recognize the gait. GEIs are extracted by first detecting the moving objects using background subtraction techniques. Training and testing phases are applied to three well-known datasets: CASIA-(B), OU-ISIR, and OU-MVLP. The proposed method is also evaluated for background modeling using the Scene Background Modeling and Initialization (SBI) dataset. The proposed gait recognition method showed an accuracy of more than 98% for almost all datasets. The results of the proposed approach showed higher accuracy than those obtained by other methods for CASIA-(B) and OU-MVLP, and the best results for the OU-ISIR dataset.
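A GEI, as used above, is the per-pixel average of the binary silhouettes over a gait cycle, with each silhouette obtained by background subtraction. A minimal sketch on toy frames (the threshold and images are illustrative, not the paper's SBI-based background model):

```python
def silhouette(frame, background, thresh=0.5):
    # binary foreground mask via simple background subtraction
    return [[1 if abs(p - b) > thresh else 0
             for p, b in zip(frow, brow)]
            for frow, brow in zip(frame, background)]

def gait_energy_image(frames, background):
    # GEI: per-pixel mean of the binary silhouettes over one gait cycle
    sils = [silhouette(f, background) for f in frames]
    h, w = len(background), len(background[0])
    n = len(sils)
    return [[sum(s[r][c] for s in sils) / n for c in range(w)]
            for r in range(h)]

bg = [[0.0, 0.0], [0.0, 0.0]]
frames = [[[1.0, 0.0], [1.0, 0.0]],   # leg forward
          [[1.0, 0.0], [0.0, 1.0]]]   # leg back
gei = gait_energy_image(frames, bg)
assert gei == [[1.0, 0.0], [0.5, 0.5]]  # static body bright, swinging parts grey
```

Static body parts come out bright while moving limbs average to intermediate grey, so a single GEI compactly encodes both shape and motion for the downstream CNN.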
Soft Biometric Analysis: MultiPerson and RealTime Pedestrian Attribute Recognition in Crowded Urban Environments
Traditionally, recognition systems were only based on human hard biometrics. However,
the ubiquitous CCTV cameras have raised the desire to analyze human biometrics from
far distances, without people's attendance in the acquisition process. High-resolution
face close-shots are rarely available at far distances, such that face-based systems cannot
provide reliable results in surveillance applications. Human soft biometrics such as body
and clothing attributes are believed to be more effective in analyzing human data collected
by security cameras.
This thesis contributes to human soft biometric analysis in uncontrolled environments
and mainly focuses on two tasks: Pedestrian Attribute Recognition (PAR) and person
re-identification (re-id). We first review the literature of both tasks and highlight the
history of advancements, recent developments, and the existing benchmarks. PAR and
person re-id difficulties are due to significant distances between intra-class samples,
which originate from variations in several factors such as body pose, illumination,
background, occlusion, and data resolution. Recent state-of-the-art approaches present
end-to-end models that can extract discriminative and comprehensive feature
representations from people. The correlation between different regions of the body and
dealing with limited learning data is also the objective of many recent works. Moreover,
class imbalance and correlation between human attributes are specific challenges
associated with the PAR problem.
We collect a large surveillance dataset to train a novel gender recognition model suitable
for uncontrolled environments. We propose a deep residual network that extracts several
pose-wise patches from samples and obtains a comprehensive feature representation. In
the next step, we develop a model for recognizing multiple attributes at once. Considering
the correlation between human semantic attributes and class imbalance, we respectively
use a multi-task model and a weighted loss function. We also propose a multiplication
layer on top of the backbone feature extraction layers to exclude the background features
from the final representation of samples and draw the attention of the model to the
foreground area.
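The weighted loss used against class imbalance can be illustrated with a positively-weighted binary cross-entropy; the thesis's exact weighting scheme is not given here, so this is only a generic sketch with hypothetical weights:

```python
import math

def weighted_bce(y_true, y_pred, pos_weight):
    # binary cross-entropy with a positive-class weight per attribute,
    # a common remedy when positive labels are rare in PAR datasets
    eps = 1e-7
    total = 0.0
    for t, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1 - eps)      # clamp for numerical safety
        total += -(pos_weight * t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(y_true)

# a rare attribute: up-weighting positives raises the penalty for missing them
miss_positive = weighted_bce([1], [0.1], pos_weight=5.0)
plain = weighted_bce([1], [0.1], pos_weight=1.0)
assert miss_positive > plain
```

Setting `pos_weight` to roughly the negative-to-positive class ratio is a standard heuristic, keeping the gradient contribution of rare attributes comparable to frequent ones.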
We address the problem of person re-id by implicitly defining the receptive fields of
deep learning classification frameworks. The receptive fields of deep learning models
determine the most significant regions of the input data for providing correct decisions.
Therefore, we synthesize a set of learning data in which the destructive regions (e.g.,
background) in each pair of instances are interchanged. A segmentation module
determines destructive and useful regions in each sample, and the label of each
synthesized instance is inherited from the sample that contributed the useful regions in
the synthesized image. The synthesized learning data are then used in the learning
phase and help the model rapidly learn that identity and background regions are not
correlated. Meanwhile, the proposed solution can be seen as a data augmentation
approach that fully preserves the label information and is compatible with other data
augmentation techniques.
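The background-interchange augmentation can be sketched as a mask-guided composite: keep the useful (person) region, fill the destructive (background) region from another sample, and inherit the label from the person. The toy images and masks below are illustrative, not the thesis's segmentation module:

```python
def swap_background(fg_img, fg_mask, bg_img):
    # keep the useful region (the person, mask == 1) from fg_img and fill the
    # destructive region (the background) from another sample; the identity
    # label is inherited from fg_img, so label information is fully preserved
    return [[fg_img[r][c] if fg_mask[r][c] else bg_img[r][c]
             for c in range(len(fg_img[0]))]
            for r in range(len(fg_img))]

person = [[9, 0], [9, 0]]       # toy image: person pixels = 9
mask   = [[1, 0], [1, 0]]       # segmentation: 1 = person, 0 = background
other  = [[5, 5], [5, 5]]       # background donor from a different scene
aug = swap_background(person, mask, other)
assert aug == [[9, 5], [9, 5]]  # same person, new background
```

Training on such composites shows the network the same identity over unrelated backgrounds, which pushes its effective receptive field toward the foreground.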
When re-id methods are learned in scenarios where the target person appears with
identical garments in the gallery, the visual appearance of clothes is given the most
importance in the final feature representation. Cloth-based representations are not
reliable in long-term re-id settings, as people may change their clothes. Therefore,
solutions that ignore clothing cues and focus on identity-relevant features are in
demand. We transform the original data such that the identity-relevant information of
people (e.g., face and body shape) is removed, while the identity-unrelated cues (i.e.,
color and texture of clothes) remain unchanged. A model learned on the synthesized
dataset predicts the identity-unrelated cues (short-term features). We then train a
second model, coupled with the first, that learns the embeddings of the original data
such that the similarity between the embeddings of the original and synthesized data is
minimized. This way, the second model predicts based on the identity-related
(long-term) representation of people.
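The coupling between the two models can be illustrated with a simple dissimilarity objective: minimizing the cosine similarity between the long-term embedding of an original image and the clothes-only embedding of its synthesized counterpart. The embeddings below are hypothetical, and the real training objective presumably combines this term with an identity loss:

```python
import math

def cosine(u, v):
    # cosine similarity between two embedding vectors
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def decoupling_loss(emb_original, emb_clothes_only):
    # push the long-term embedding away from the clothes-only (short-term)
    # embedding: minimizing their similarity removes clothing cues
    return cosine(emb_original, emb_clothes_only)

e_long = [1.0, 0.0]    # hypothetical embedding from the second (long-term) model
e_cloth = [0.0, 1.0]   # clothes-only embedding from the first model
assert decoupling_loss(e_long, e_cloth) < decoupling_loss(e_long, [1.0, 0.1])
```

Driving this loss toward zero makes the second model's representation orthogonal to whatever the clothes-only model can predict, leaving face and body-shape cues to carry the identity.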
To evaluate the performance of the proposed models, we use PAR and person re-id
datasets, namely BIODI, PETA, RAP, Market-1501, MSMT-V2, PRCC, LTCC, and MIT,
and compare our experimental results with state-of-the-art methods in the field.
In conclusion, the data collected from surveillance cameras have low resolution, such
that the extraction of hard biometric features is not possible and face-based approaches
produce poor results. In contrast, soft biometrics are robust to variations in data quality.
We therefore propose approaches for both PAR and person re-id that learn discriminative
features from each instance, and evaluate our proposed solutions on several publicly
available benchmarks.
This thesis was prepared at the University of Beira Interior, IT Instituto de Telecomunicações, Soft Computing and Image Analysis Laboratory (SOCIA Lab), Covilhã Delegation, and was submitted to the University of Beira Interior for defense in a public examination session.
Human Gait Analysis using Spatiotemporal Data Obtained from Gait Videos
With the development of deep learning techniques, deep-NN-based methods have become the standard for image processing tasks such as human motion tracking and pose estimation, human activity recognition, and face recognition. Deep learning techniques have improved the design, implementation, and deployment of complex and diverse applications, which are now used in a wide range of fields, including biomedical engineering. The application of computer vision techniques to medical image and video analysis has led to remarkable results in event detection. The built-in ability of convolutional neural networks (CNN) to extract features from complex medical images, combined with the ability of long short-term memory (LSTM) networks to preserve the temporal information between events, has opened many new horizons for medical research. Gait is one of the critical physiological domains that can reflect many disorders related to ageing and neurodegeneration. A comprehensive and accurate gait analysis can provide insights into a person's physiological condition. Existing gait analysis procedures require a special environment, complex medical equipment, and trained personnel to acquire the gait data. In the case of wearable systems, such a system can impair cognitive abilities and be uncomfortable for patients.

Moreover, it has been reported that patients usually try to perform better during laboratory tests, which may not reflect their actual gait. Despite technological advances, we still face limits when measuring human walking in clinical and laboratory settings. The use of current gait analysis procedures remains expensive and time-consuming and complicates access to specialized equipment and expertise.

It is therefore imperative to have methods that provide long-term data on a patient's state of health without dual cognitive tasks or the discomfort of wearable sensors. This thesis therefore proposes a simple, easy-to-implement, and cost-effective method for acquiring gait data. The method is based on recording walking videos with a smartphone camera in a home environment under free-living conditions. A deep neural network (NN) then processes these videos to extract the gait events. The detected events are further used to quantify various spatiotemporal gait parameters that are important for any gait analysis system.

In this work, gait videos recorded with a low-resolution smartphone camera outside the laboratory environment were used. Several deep-learning-based NNs were implemented to detect basic gait events, such as the position of the foot relative to the ground, from these videos. In the first study, the AlexNet architecture was used to train the model from scratch on walking videos and publicly available datasets. An overall accuracy of 74% was achieved with this model. In the next step, however, an LSTM layer was integrated into the same architecture. The built-in capability of LSTM with respect to temporal information led to improved prediction of the foot-position labels, and an accuracy of 91% was achieved. However, difficulties remained in predicting the correct labels in the final phase of the swing and stance of each foot.

In the next step, transfer learning was applied to exploit the advantages of already trained deep NNs by using pre-trained weights. Two well-known models, inceptionresnetv2 (IRNV-2) and densenet201 (DN-201), were used with their learned weights for retraining the NN on new data. The transfer-learning-based pre-trained NN improved the prediction of labels for different foot positions. In particular, it reduced the fluctuations in the predictions in the final phase of the gait swing and stance. An accuracy of 94% was achieved when predicting the class labels of the test data. Since the deviation when predicting the true label was mostly one frame, it could be ignored at a frame rate of 30 frames per second.

The predicted labels were used to extract various spatiotemporal gait parameters that are crucial for any gait analysis system. In total, 12 gait parameters were quantified and compared with the ground truth obtained through observational methods. The NN-based spatiotemporal parameters showed a high correlation with the ground truth, and in some cases a very high correlation was achieved. The results demonstrate the usefulness of the proposed method. The value of a parameter over time yields a time series, a long-term representation of the gait, which can be further analyzed with various mathematical methods.

As the third contribution of this dissertation, improvements to existing mathematical methods for the time-series analysis of temporal gait data were proposed. To this end, two refinements of existing entropy-based methods for analyzing stride-interval time series are proposed. These refinements were validated on stride-interval time-series data of healthy subjects and patients with neurodegenerative diseases, downloaded from the publicly available PhysioNet database. The results showed that our proposed method enables a clear separation between healthy and diseased groups.
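Entropy-based analysis of stride-interval time series, which the refinements above build on, is commonly grounded in sample entropy (SampEn). A minimal reference implementation on toy series (the parameters m and r, and the series themselves, are illustrative, not the dissertation's refined variants):

```python
import math

def sample_entropy(series, m=2, r=0.2):
    # SampEn = -ln(A / B), where B counts pairs of matching templates of
    # length m and A those of length m+1, a match meaning all elementwise
    # differences stay within tolerance r; lower values = more regular gait
    def count(length):
        templates = [series[i:i + length]
                     for i in range(len(series) - length + 1)]
        hits = 0
        for i in range(len(templates)):
            for j in range(i + 1, len(templates)):
                if max(abs(a - b)
                       for a, b in zip(templates[i], templates[j])) <= r:
                    hits += 1
        return hits
    b, a = count(m), count(m + 1)
    return -math.log(a / b) if a > 0 and b > 0 else float("inf")

regular = [1.0] * 12                                     # metronomic strides
erratic = [0.0, 1.0, 0.0, 2.0, 0.0, 3.0, 0.0, 4.0, 0.0, 5.0]
assert sample_entropy(regular) < sample_entropy(erratic)
```

Healthy stride intervals tend to be more regular than those of neurodegenerative patients, so a separation between groups can show up directly in such entropy values; refinements typically adjust how templates and tolerances are defined.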
In the future, advanced medical support systems that use artificial intelligence and are derived from the methods presented here could assist physicians in diagnosing and monitoring patients' gait over the long term, thereby reducing the clinical workload and improving patient safety.
Gender and gaze gesture recognition for human-computer interaction
© 2016 Elsevier Inc. The identification of visual cues in facial images has been widely explored in the broad area of computer vision. However, theoretical analyses are often not transformed into widespread assistive Human-Computer Interaction (HCI) systems, due to factors such as inconsistent robustness, low efficiency, large computational expense or strong dependence on complex hardware. We present a novel gender recognition algorithm, a modular eye centre localisation approach and a gaze gesture recognition method, aiming to escalate the intelligence, adaptability and interactivity of HCI systems by combining demographic data (gender) and behavioural data (gaze) to enable the development of a range of real-world assistive-technology applications. The gender recognition algorithm utilises Fisher Vectors as facial features, encoded from low-level local features in facial images. We experimented with four types of low-level features: greyscale values, Local Binary Patterns (LBP), LBP histograms and Scale Invariant Feature Transform (SIFT). The corresponding Fisher Vectors were classified using a linear Support Vector Machine. The algorithm has been tested on the FERET database, the LFW database and the FRGCv2 database, yielding 97.7%, 92.5% and 96.7% accuracy respectively. The eye centre localisation algorithm has a modular approach, following a coarse-to-fine, global-to-regional scheme and utilising isophote and gradient features. A Selective Oriented Gradient filter has been specifically designed to detect and remove strong gradients from eyebrows, eye corners and self-shadows (which sabotage most eye centre localisation methods). The trajectories of the eye centres are then defined as gaze gestures for active HCI. The eye centre localisation algorithm has been compared with 10 other state-of-the-art algorithms with similar functionality and has outperformed them in terms of accuracy while maintaining excellent real-time performance.
The above methods have been employed in the development of a data recovery system that can be used to implement advanced assistive technology tools. The high accuracy, reliability and real-time performance achieved for attention monitoring, gaze gesture control and recovery of demographic data can enable the advanced human-robot interaction needed for developing systems that assist with everyday actions, thereby improving the quality of life for the elderly and/or disabled.