
    MetaGait: Learning to Learn an Omni Sample Adaptive Representation for Gait Recognition

    Gait recognition, which aims to identify individuals by their walking patterns, has recently drawn increasing research attention. However, gait recognition still suffers from the conflict between the limited binary visual clues of the silhouette and the numerous covariates of diverse scales, which challenges the model's adaptiveness. In this paper, we address this conflict by developing a novel MetaGait that learns to learn an omni sample adaptive representation. Towards this goal, MetaGait injects meta-knowledge, which guides the model to perceive sample-specific properties, into the calibration network of the attention mechanism to improve adaptiveness from the omni-scale, omni-dimension and omni-process perspectives. Specifically, we leverage meta-knowledge across the entire process: Meta Triple Attention adaptively captures omni-scale dependencies from the spatial, channel and temporal dimensions simultaneously, while Meta Temporal Pooling adaptively aggregates temporal information by integrating the merits of three complementary temporal aggregation methods. Extensive experiments demonstrate the state-of-the-art performance of the proposed MetaGait. On CASIA-B, we achieve rank-1 accuracy of 98.7%, 96.0% and 89.3% under the three walking conditions, respectively. On OU-MVLP, we achieve rank-1 accuracy of 92.4%. Comment: Accepted by ECCV 2022.
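
    The abstract describes Meta Temporal Pooling as a sample-adaptive blend of three complementary temporal aggregation methods. As a rough illustration only, the PyTorch sketch below blends max, mean and median pooling over time with per-sample weights predicted by a small meta-network; the module name, the choice of the three aggregators and the meta-network design are assumptions, not the authors' implementation.

import torch
import torch.nn as nn

class AdaptiveTemporalPooling(nn.Module):
    """Hypothetical sample-adaptive temporal pooling in the spirit of Meta Temporal Pooling."""

    def __init__(self, channels: int):
        super().__init__()
        # Meta-network: maps a per-sample descriptor to 3 mixing weights.
        self.meta = nn.Sequential(
            nn.Linear(channels, channels // 4),
            nn.ReLU(inplace=True),
            nn.Linear(channels // 4, 3),
            nn.Softmax(dim=-1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, time) sequence features.
        f_max = x.max(dim=2).values        # peak response over time
        f_mean = x.mean(dim=2)             # smooth average over time
        f_med = x.median(dim=2).values     # outlier-robust aggregation
        w = self.meta(f_mean)              # (batch, 3) sample-specific weights
        stacked = torch.stack([f_max, f_mean, f_med], dim=-1)   # (B, C, 3)
        return (stacked * w.unsqueeze(1)).sum(dim=-1)           # (B, C)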

    GaitStrip: Gait Recognition via Effective Strip-based Feature Representations and Multi-Level Framework

    Many gait recognition methods first partition the human body into N parts and then combine them to establish part-based feature representations. Their recognition performance is often affected by the partitioning strategy, which is chosen empirically for each dataset. However, we observe that strips, the basic components of parts, are agnostic to the partitioning strategy. Motivated by this observation, we present a strip-based multi-level gait recognition network, named GaitStrip, to extract comprehensive gait information at different levels. To be specific, our high-level branch explores the context of gait sequences and our low-level branch focuses on detailed posture changes. We introduce a novel StriP-Based feature extractor (SPB) to learn strip-based feature representations by directly taking each strip of the human body as the basic unit. Moreover, we propose a novel multi-branch structure, called the Enhanced Convolution Module (ECM), to extract different representations of gait. ECM consists of a Spatial-Temporal feature extractor (ST), a Frame-Level feature extractor (FL) and SPB, and has two clear advantages. First, each branch focuses on a specific representation, which improves the robustness of the network: ST extracts spatial-temporal features of gait sequences, while FL generates a feature representation of each frame. Second, the parameters of the ECM can be reduced at test time through a structural re-parameterization technique. Extensive experimental results demonstrate that GaitStrip achieves state-of-the-art performance in both normal walking and complex conditions. Comment: Accepted to ACCV 2022.
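
    Since SPB's key idea, as summarized above, is to treat each horizontal strip of the body as the basic unit, a minimal sketch of strip-based pooling may help; the (B, C, H, W) shape and the max-plus-mean pooling are illustrative assumptions, not the paper's code.

import torch

def strip_pool(feat: torch.Tensor) -> torch.Tensor:
    """feat: (B, C, H, W) -> (B, C, H), one descriptor per height strip."""
    # Pool within each one-pixel-high strip along the width axis; combining
    # max and mean responses is a common choice in strip/part-based gait models.
    return feat.max(dim=3).values + feat.mean(dim=3)

    Each of the H strip descriptors can then be fed to its own projection head, which is what makes the representation independent of any hand-chosen part partitioning.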

    Person recognition based on deep gait: a survey.

    Gait recognition, also known as walking pattern recognition, has attracted increasing interest from the computer vision and biometrics communities due to its potential to identify individuals from a distance, its range of potential applications and its non-invasive nature. Since 2014, deep learning approaches have shown promising results in gait recognition by automatically extracting features. However, recognizing gait accurately remains challenging due to covariate factors, the complexity and variability of environments, and variations in human body representation. This paper provides a comprehensive overview of the advancements in this field, along with the challenges and limitations associated with deep learning methods. It first examines the gait datasets used in the literature and analyzes the performance of state-of-the-art techniques. It then presents a taxonomy of deep learning methods to characterize and organize the research landscape, highlighting the basic limitations of deep learning methods in the context of gait recognition. The paper concludes by discussing present challenges and suggesting several research directions for improving gait recognition performance in the future.

    Gait recognition for person re-identification

    Person re-identification across multiple cameras is an essential task in computer vision applications, particularly for tracking the same person across different scenes. Gait recognition, i.e., recognition based on walking style, is commonly used for this purpose because human gait has unique characteristics that allow a person to be recognized from a distance. However, gait-based recognition can be limited by the viewpoint of the captured images or videos. Hence, this paper proposes a gait recognition approach for person re-identification. The proposed approach first estimates the gait angle and then performs recognition using convolutional neural networks. Multi-task convolutional neural network models and extracted gait energy images (GEIs) are used to estimate the angle and recognize the gait. GEIs are extracted by first detecting the moving objects using background subtraction techniques, as sketched below. Training and testing are carried out on three well-known datasets: CASIA-B, OU-ISIR and OU-MVLP. The background modeling component is evaluated on the Scene Background Modeling and Initialization (SBI) dataset. The proposed gait recognition method achieved an accuracy of more than 98% on almost all datasets, outperforming other methods on CASIA-B and OU-MVLP and giving the best results on the OU-ISIR dataset.
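
    As a minimal sketch of the GEI pipeline just described (and not the paper's exact implementation), the following uses OpenCV's MOG2 background subtractor to obtain moving-object masks, binarises and size-normalises them, and averages them over a gait cycle; the threshold and the 64x64 output size are illustrative choices.

import cv2
import numpy as np

def compute_gei(frames, size=(64, 64)):
    """frames: iterable of BGR images covering roughly one gait cycle."""
    subtractor = cv2.createBackgroundSubtractorMOG2(detectShadows=False)
    silhouettes = []
    for frame in frames:
        mask = subtractor.apply(frame)          # moving-object mask in 0..255
        _, binary = cv2.threshold(mask, 127, 1, cv2.THRESH_BINARY)
        silhouettes.append(cv2.resize(binary.astype(np.float32), size))
    return np.mean(silhouettes, axis=0)         # pixel-wise average = GEI

    In practice each silhouette is also cropped and centred on the body before resizing, so that the averaged energy image is alignment-consistent across frames.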

    Soft Biometric Analysis: Multi-Person and Real-Time Pedestrian Attribute Recognition in Crowded Urban Environments

    Traditionally, recognition systems were based only on hard biometrics. However, ubiquitous CCTV cameras have raised the desire to analyze human biometrics from far distances, without people's participation in the acquisition process. High-resolution face close-shots are rarely available at far distances, so face-based systems cannot provide reliable results in surveillance applications. Human soft biometrics, such as body and clothing attributes, are believed to be more effective for analyzing human data collected by security cameras. This thesis contributes to human soft biometric analysis in uncontrolled environments and focuses mainly on two tasks: Pedestrian Attribute Recognition (PAR) and person re-identification (re-id). We first review the literature of both tasks and highlight the history of advancements, recent developments, and the existing benchmarks. The difficulties of PAR and person re-id stem from significant distances between intra-class samples, which originate from variations in factors such as body pose, illumination, background, occlusion and data resolution. Recent state-of-the-art approaches present end-to-end models that can extract discriminative and comprehensive feature representations of people. The correlation between different body regions and dealing with limited learning data are also objectives of many recent works. Moreover, class imbalance and correlation between human attributes are challenges specific to the PAR problem. We collect a large surveillance dataset to train a novel gender recognition model suitable for uncontrolled environments. We propose a deep residual network that extracts several pose-wise patches from samples and obtains a comprehensive feature representation. In the next step, we develop a model for recognizing multiple attributes at once. Considering the correlation between human semantic attributes and the class imbalance, we use a multi-task model and a weighted loss function, respectively. We also propose a multiplication layer on top of the backbone feature-extraction layers to exclude background features from the final representation of samples and to draw the model's attention to the foreground area. We address the person re-id problem by implicitly defining the receptive fields of deep learning classification frameworks. The receptive fields of deep learning models determine the regions of the input data most significant for producing correct decisions. Therefore, we synthesize a set of learning data in which the destructive regions (e.g., background) in each pair of instances are interchanged. A segmentation module determines destructive and useful regions in each sample, and the label of a synthesized instance is inherited from the sample that contributed the useful regions to the synthesized image. The synthesized learning data are then used in the learning phase and help the model rapidly learn that identity and background regions are not correlated. Meanwhile, the proposed solution can be seen as a data augmentation approach that fully preserves the label information and is compatible with other data augmentation techniques. When re-id methods are trained in scenarios where the target person appears with identical garments in the gallery, the visual appearance of clothes is given the most importance in the final feature representation. Clothing-based representations are not reliable in long-term re-id settings, as people may change their clothes.
Therefore, solutions that ignore clothing cues and focus on identity-relevant features are in demand. We transform the original data such that the identity-relevant information of people (e.g., face and body shape) is removed, while the identity-unrelated cues (i.e., color and texture of clothes) remain unchanged. A model learned on the synthesized dataset predicts the identity-unrelated (short-term) cues. We then train a second model, coupled with the first, that learns embeddings of the original data such that the similarity between the embeddings of the original and synthesized data is minimized. This way, the second model predicts based on the identity-related (long-term) representation of people. To evaluate the performance of the proposed models, we use PAR and person re-id datasets, namely BIODI, PETA, RAP, Market-1501, MSMT-V2, PRCC, LTCC and MIT, and compare our experimental results with state-of-the-art methods in the field. In conclusion, the data collected from surveillance cameras have low resolution, such that the extraction of hard biometric features is not possible and face-based approaches produce poor results. In contrast, soft biometrics are robust to variations in data quality, so we propose approaches for both PAR and person re-id that learn discriminative features from each instance, and we evaluate our solutions on several publicly available benchmarks. This thesis was prepared at the University of Beira Interior and IT - Instituto de Telecomunicações, Soft Computing and Image Analysis Laboratory (SOCIA Lab), Covilhã Delegation, and was submitted to the University of Beira Interior for defense in a public examination session.
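
    Two of the ideas above lend themselves to a compact illustration: a per-attribute weighted loss for class imbalance and a multiplication layer that suppresses background features. The PyTorch sketch below is a plausible rendering under assumed shapes and an exponential weighting scheme, not the thesis code.

import torch
import torch.nn.functional as F

def weighted_attribute_bce(logits, targets, pos_ratio):
    """logits, targets: (B, num_attrs); pos_ratio: (num_attrs,) positive fraction."""
    # Rare positive attributes receive larger weights, countering class imbalance.
    w_pos = torch.exp(1.0 - pos_ratio)
    w_neg = torch.exp(pos_ratio)
    w = targets * w_pos + (1.0 - targets) * w_neg
    return F.binary_cross_entropy_with_logits(logits, targets, weight=w)

def mask_background(features, fg_mask):
    """features: (B, C, H, W) backbone maps; fg_mask: (B, 1, H, W) in [0, 1]."""
    # Element-wise multiplication keeps foreground responses and damps background.
    return features * fg_mask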

    Human Gait Analysis using Spatiotemporal Data Obtained from Gait Videos

    With the development of deep learning techniques, deep neural network (NN)-based methods have become the standard for image processing tasks such as human motion tracking and pose estimation, human activity recognition, and face recognition. Deep learning techniques have improved the design, implementation and deployment of complex and diverse applications, which are now used in a variety of fields, including biomedical engineering. The application of computer vision techniques to medical image and video analysis has produced remarkable results in event detection. The built-in ability of convolutional neural networks (CNNs) to extract features from complex medical images, combined with the ability of long short-term memory (LSTM) networks to preserve the temporal information between events, has opened many new horizons for medical research. Gait is one of the critical physiological domains that can reflect many disorders related to aging and neurodegeneration. A comprehensive and accurate gait analysis can provide insights into a person's physiological condition. Existing gait analysis procedures require a special environment, complex medical equipment and trained personnel to record gait data. In the case of wearable systems, such a setup can interfere with cognitive abilities and be uncomfortable for patients. It has also been reported that patients usually try to perform better during laboratory tests, which may not reflect their actual gait. Despite technological advances, we still face limitations in measuring human walking in clinical and laboratory settings. Current gait analysis methods remain expensive and time-consuming, and access to specialized equipment and expertise is difficult. It is therefore imperative to have methods that provide long-term data on a patient's state of health without dual cognitive tasks or the discomfort of wearable sensors. This thesis therefore proposes a simple, easy-to-implement and cost-effective method for recording gait data. The method is based on recording walking videos with a smartphone camera in a home environment under free-living conditions. A deep neural network then processes these videos to extract the gait events. The detected events are further used to quantify various spatiotemporal gait parameters that are important for any gait analysis system. In this work, gait videos recorded outside the laboratory with a low-resolution smartphone camera were used. Several deep learning-based NNs were implemented to detect basic gait events, such as the position of the foot relative to the ground, from these videos. In the first study, the AlexNet architecture was used to train a model from scratch on walking videos and publicly available datasets, achieving an overall accuracy of 74%. In the next step, an LSTM layer was integrated into the same architecture.
The built-in ability of LSTM to exploit temporal information led to improved prediction of the foot-position labels, and an accuracy of 91% was achieved. However, difficulties remained in predicting the correct labels during the final phase of the swing and stance of each foot. Next, transfer learning was applied to exploit the advantages of already-trained deep NNs by using pre-trained weights. Two well-known models, inceptionresnetv2 (IRNV-2) and densenet201 (DN-201), were retrained on the new data starting from their learned weights. The transfer learning-based pre-trained NN improved the prediction of labels for the different foot positions; in particular, it reduced the fluctuations in the predictions during the final phase of the gait swing and stance. An accuracy of 94% was achieved in predicting the class labels of the test data. Since the deviation from the true label was mostly one frame, it could be ignored at a frame rate of 30 frames per second. The predicted labels were used to extract various spatiotemporal gait parameters that are crucial for any gait analysis system. In total, 12 gait parameters were quantified and compared with ground truth obtained through observational methods. The NN-based spatiotemporal parameters showed a high correlation with the ground truth, and in some cases a very high correlation was achieved. The results demonstrate the usefulness of the proposed method. The value of a parameter over time yields a time series, a long-term representation of gait, which can be further analyzed with various mathematical methods. As the third contribution of this dissertation, improvements to existing mathematical methods for the time-series analysis of temporal gait data were proposed. To this end, two refinements of existing entropy-based methods for analyzing stride-interval time series are proposed. These refinements were validated on stride-interval time-series data from healthy subjects and patients with neurodegenerative diseases, downloaded from the publicly available PhysioNet database. The results showed that the proposed method enables a clear separation between healthy and diseased groups. In the future, advanced medical support systems that use artificial intelligence and are derived from the methods presented here could assist physicians in diagnosing and monitoring patients' gait over the long term, thereby reducing the clinical workload and improving patient safety.
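
    To make the parameter-extraction step concrete, the sketch below turns per-frame foot-contact labels (as predicted by the CNN+LSTM described above) into a stride-interval time series at 30 frames per second; the 1 = stance / 0 = swing label convention is an assumption for illustration. The resulting series is exactly the kind of input the refined entropy-based methods analyze.

import numpy as np

def stride_intervals(contact_labels, fps=30):
    """contact_labels: 1-D array of per-frame 0/1 labels for one foot."""
    labels = np.asarray(contact_labels)
    # Heel strikes: transitions from swing (0) to stance (1).
    strikes = np.flatnonzero((labels[1:] == 1) & (labels[:-1] == 0)) + 1
    # Stride interval = time between consecutive heel strikes of the same foot.
    return np.diff(strikes) / fps    # in seconds

# Example: two strides of ~1.0 s each at 30 fps.
labels = np.r_[np.zeros(5), np.ones(10), np.zeros(20),
               np.ones(10), np.zeros(20), np.ones(10)]
print(stride_intervals(labels))      # -> [1. 1.]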

    Gender and gaze gesture recognition for human-computer interaction

    © 2016 Elsevier Inc. The identification of visual cues in facial images has been widely explored in the broad area of computer vision. However, theoretical analyses are often not transformed into widespread assistive Human-Computer Interaction (HCI) systems, due to factors such as inconsistent robustness, low efficiency, large computational expense or strong dependence on complex hardware. We present a novel gender recognition algorithm, a modular eye centre localisation approach and a gaze gesture recognition method, aiming to increase the intelligence, adaptability and interactivity of HCI systems by combining demographic data (gender) and behavioural data (gaze) to enable the development of a range of real-world assistive-technology applications. The gender recognition algorithm uses Fisher Vectors as facial features, encoded from low-level local features in facial images. We experimented with four types of low-level features: greyscale values, Local Binary Patterns (LBP), LBP histograms and Scale Invariant Feature Transform (SIFT) descriptors. The corresponding Fisher Vectors were classified using a linear Support Vector Machine. The algorithm has been tested on the FERET, LFW and FRGCv2 databases, yielding 97.7%, 92.5% and 96.7% accuracy, respectively. The eye centre localisation algorithm takes a modular approach, following a coarse-to-fine, global-to-regional scheme and utilising isophote and gradient features. A Selective Oriented Gradient filter has been specifically designed to detect and remove strong gradients from eyebrows, eye corners and self-shadows, which undermine most eye centre localisation methods. The trajectories of the eye centres are then interpreted as gaze gestures for active HCI. The eye centre localisation algorithm has been compared with ten other state-of-the-art algorithms with similar functionality and outperformed them in accuracy while maintaining excellent real-time performance. These methods have been employed to develop a data recovery system that supports the implementation of advanced assistive-technology tools. The high accuracy, reliability and real-time performance achieved for attention monitoring, gaze gesture control and recovery of demographic data can enable the advanced human-robot interaction needed to develop systems that assist with everyday actions, thereby improving the quality of life of the elderly and/or disabled.
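
    To illustrate the encoding pipeline described above, here is a simplified Fisher Vector implementation (mean gradients only, diagonal-covariance GMM) with power- and L2-normalisation, classified by a linear SVM; it is a sketch of the general technique, not the authors' code, and the helper names are ours.

import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.svm import LinearSVC

def fisher_vector(descriptors, gmm):
    """descriptors: (N, D) local features of one image; gmm: fitted diagonal GMM."""
    q = gmm.predict_proba(descriptors)                  # (N, K) soft assignments
    # covariances_ has shape (K, D) because covariance_type="diag" is assumed.
    diffs = (descriptors[:, None, :] - gmm.means_[None]) / np.sqrt(gmm.covariances_)
    fv = (q[:, :, None] * diffs).sum(axis=0)            # (K, D) mean-gradient block
    fv /= descriptors.shape[0] * np.sqrt(gmm.weights_)[:, None]
    fv = fv.ravel()
    fv = np.sign(fv) * np.sqrt(np.abs(fv))              # power-normalisation
    return fv / (np.linalg.norm(fv) + 1e-12)            # L2-normalisation

# Typical usage: fit the vocabulary, encode every image, train a linear SVM.
# gmm = GaussianMixture(n_components=64, covariance_type="diag").fit(all_descriptors)
# X = np.stack([fisher_vector(d, gmm) for d in per_image_descriptors])
# clf = LinearSVC().fit(X, labels)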