345 research outputs found

    An original framework for understanding human actions and body language by using deep neural networks

    Get PDF
    The evolution of both fields of Computer Vision (CV) and Artificial Neural Networks (ANNs) has allowed the development of efficient automatic systems for the analysis of people's behaviour. By studying hand movements it is possible to recognize gestures, often used by people to communicate information in a non-verbal way. These gestures can also be used to control or interact with devices without physically touching them. In particular, sign language and semaphoric hand gestures are the two foremost areas of interest due to their importance in Human-Human Communication (HHC) and Human-Computer Interaction (HCI), respectively. While the processing of body movements play a key role in the action recognition and affective computing fields. The former is essential to understand how people act in an environment, while the latter tries to interpret people's emotions based on their poses and movements; both are essential tasks in many computer vision applications, including event recognition, and video surveillance. In this Ph.D. thesis, an original framework for understanding Actions and body language is presented. The framework is composed of three main modules: in the first one, a Long Short Term Memory Recurrent Neural Networks (LSTM-RNNs) based method for the Recognition of Sign Language and Semaphoric Hand Gestures is proposed; the second module presents a solution based on 2D skeleton and two-branch stacked LSTM-RNNs for action recognition in video sequences; finally, in the last module, a solution for basic non-acted emotion recognition by using 3D skeleton and Deep Neural Networks (DNNs) is provided. The performances of RNN-LSTMs are explored in depth, due to their ability to model the long term contextual information of temporal sequences, making them suitable for analysing body movements. All the modules were tested by using challenging datasets, well known in the state of the art, showing remarkable results compared to the current literature methods

    Gesture passwords: concepts, methods and challenges

    Full text link
    Biometrics are a convenient alternative to traditional forms of access control such as passwords and pass-cards since they rely solely on user-specific traits. Unlike alphanumeric passwords, biometrics cannot be given or told to another person, and unlike pass-cards, are always “on-hand.” Perhaps the most well-known biometrics with these properties are: face, speech, iris, and gait. This dissertation proposes a new biometric modality: gestures. A gesture is a short body motion that contains static anatomical information and changing behavioral (dynamic) information. This work considers both full-body gestures such as a large wave of the arms, and hand gestures such as a subtle curl of the fingers and palm. For access control, a specific gesture can be selected as a “password” and used for identification and authentication of a user. If this particular motion were somehow compromised, a user could readily select a new motion as a “password,” effectively changing and renewing the behavioral aspect of the biometric. This thesis describes a novel framework for acquiring, representing, and evaluating gesture passwords for the purpose of general access control. The framework uses depth sensors, such as the Kinect, to record gesture information from which depth maps or pose features are estimated. First, various distance measures, such as the log-euclidean distance between feature covariance matrices and distances based on feature sequence alignment via dynamic time warping, are used to compare two gestures, and train a classifier to either authenticate or identify a user. In authentication, this framework yields an equal error rate on the order of 1-2% for body and hand gestures in non-adversarial scenarios. Next, through a novel decomposition of gestures into posture, build, and dynamic components, the relative importance of each component is studied. The dynamic portion of a gesture is shown to have the largest impact on biometric performance with its removal causing a significant increase in error. In addition, the effects of two types of threats are investigated: one due to self-induced degradations (personal effects and the passage of time) and the other due to spoof attacks. For body gestures, both spoof attacks (with only the dynamic component) and self-induced degradations increase the equal error rate as expected. Further, the benefits of adding additional sensor viewpoints to this modality are empirically evaluated. Finally, a novel framework that leverages deep convolutional neural networks for learning a user-specific “style” representation from a set of known gestures is proposed and compared to a similar representation for gesture recognition. This deep convolutional neural network yields significantly improved performance over prior methods. A byproduct of this work is the creation and release of multiple publicly available, user-centric (as opposed to gesture-centric) datasets based on both body and hand gestures

    Non-contact measures to monitor hand movement of people with rheumatoid arthritis using a monocular RGB camera

    Get PDF
    Hand movements play an essential role in a person’s ability to interact with the environment. In hand biomechanics, the range of joint motion is a crucial metric to quantify changes due to degenerative pathologies, such as rheumatoid arthritis (RA). RA is a chronic condition where the immune system mistakenly attacks the joints, particularly those in the hands. Optoelectronic motion capture systems are gold-standard tools to quantify changes but are challenging to adopt outside laboratory settings. Deep learning executed on standard video data can capture RA participants in their natural environments, potentially supporting objectivity in remote consultation. The three main research aims in this thesis were 1) to assess the extent to which current deep learning architectures, which have been validated for quantifying motion of other body segments, can be applied to hand kinematics using monocular RGB cameras, 2) to localise where in videos the hand motions of interest are to be found, 3) to assess the validity of 1) and 2) to determine disease status in RA. First, hand kinematics for twelve healthy participants, captured with OpenPose were benchmarked against those captured using an optoelectronic system, showing acceptable instrument errors below 10°. Then, a gesture classifier was tested to segment video recordings of twenty-two healthy participants, achieving an accuracy of 93.5%. Finally, OpenPose and the classifier were applied to videos of RA participants performing hand exercises to determine disease status. The inferred disease activity exhibited agreement with the in-person ground truth in nine out of ten instances, outperforming virtual consultations, which agreed only six times out of ten. These results demonstrate that this approach is more effective than estimated disease activity performed by human experts during video consultations. The end goal sets the foundation for a tool that RA participants can use to observe their disease activity from their home.Open Acces

    Kinematic assessment for stroke patients in a stroke game and a daily activity recognition and assessment system

    Get PDF
    Stroke is the leading cause of serious, long-term disabilities among which deficits in motor abilities in arms or legs are most common. Those who suffer a stroke can recover through effective rehabilitation which is delicately personalized. To achieve the best personalization, it is essential for clinicians to monitor patients' health status and recovery progress accurately and consistently. Traditionally, rehabilitation involves patients performing exercises in clinics where clinicians oversee the procedure and evaluate patients' recovery progress. Following the in-clinic visits, additional home practices are tailored and assigned to patients. The in-clinic visits are important to evaluate recovery progress. The information collected can then help clinicians customize home practices for stroke patients. However, as the number of in-clinic sessions is limited by insurance policies, the recovery information collected in-clinic is often insufficient. Meanwhile, the home practice programs report low adherence rates based on historic data. Given that clinicians rely on patients to self-report adherence, the actual adherence rate could be even lower. Despite the limited feedback clinicians could receive, the measurement method is subjective as well. In practice, classic clinical scales are mostly used for assessing the qualities of movements and the recovery status of patients. However, these clinical scales are evaluated subjectively with only moderate inter-rater and intra-rater reliabilities. Taken together, clinicians lack a method to get sufficient and accurate feedback from patients, which limits the extent to which clinicians can personalize treatment plans. This work aims to solve this problem. To help clinicians obtain abundant health information regarding patients' recovery in an objective approach, I've developed a novel kinematic assessment toolchain that consists of two parts. The first part is a tool to evaluate stroke patients' motions collected in a rehabilitation game setting. This kinematic assessment tool utilizes body-tracking in a rehabilitation game. Specifically, a set of upper body assessment measures were proposed and calculated for assessing the movements using skeletal joint data. Statistical analysis was applied to evaluate the quality of upper body motions using the assessment outcomes. Second, to classify and quantify home activities for stroke patients objectively and accurately, I've developed DARAS, a daily activity recognition and assessment system that evaluates daily motions in a home setting. DARAS consists of three main components: daily action logger, action recognition part, and assessment part. The logger is implemented with a Foresite system to record daily activities using depth and skeletal joint data. Daily activity data in a realistic environment were collected from sixteen post-stroke participants. The collection period for each participant lasts three months. An ensemble network for activity recognition and temporal localization was developed to detect and segment the clinically relevant actions from the recorded data. The ensemble network fuses the prediction outputs from customized 3D Convolutional-De-Convolutional, customized Region Convolutional 3D network and a proposed Region Hierarchical Co-occurrence network which learns rich spatial-temporal features from either depth data or joint data. The per-frame precision and the per-action precision were 0.819 and 0.838, respectively, on the validation set. For the recognized actions, the kinematic assessments were performed using the skeletal joint data, as well as the longitudinal assessments. The results showed that, compared with non-stroke participants, stroke participants had slower hand movements, were less active, and tended to perform fewer hand manipulation actions. The assessment outcomes from the proposed toolchain help clinicians to provide more personalized rehabilitation plans that benefit patients.Includes bibliographical references

    Real-time 3D human body pose estimation from monocular RGB input

    Get PDF
    Human motion capture finds extensive application in movies, games, sports and biomechanical analysis. However, existing motion capture solutions require cumbersome external and/or on-body instrumentation, or use active sensors with limits on the possible capture volume dictated by power consumption. The ubiquity and ease of deployment of RGB cameras makes monocular RGB based human motion capture an extremely useful problem to solve, which would lower the barrier-to entry for content creators to employ motion capture tools, and enable newer applications of human motion capture. This thesis demonstrates the first real-time monocular RGB based motion-capture solutions that work in general scene settings. They are based on developing neural network based approaches to address the ill-posed problem of estimating 3D human pose from a single RGB image, in combination with model based fitting. In particular, the contributions of this work make advances towards three key aspects of real-time monocular RGB based motion capture, namely speed, accuracy, and the ability to work for general scenes. New training datasets are proposed, for single-person and multi-person scenarios, which, together with the proposed transfer learning based training pipeline, allow learning based approaches to be appearance invariant. The training datasets are accompanied by evaluation benchmarks with multiple avenues of fine-grained evaluation. The evaluation benchmarks differ visually from the training datasets, so as to promote efforts towards solutions that generalize to in-the-wild scenes. The proposed task formulations for the single-person and multi-person case allow higher accuracy, and incorporate additional qualities such as occlusion robustness, that are helpful in the context of a full motion capture solution. The multi-person formulations are designed to have a nearly constant inference time regardless of the number of subjects in the scene, and combined with contributions towards fast neural network inference, enable real-time 3D pose estimation for multiple subjects. Combining the proposed learning-based approaches with a model-based kinematic skeleton fitting step provides temporally stable joint angle estimates, which can be readily employed for driving virtual characters.Menschlicher Motion Capture findet umfangreiche Anwendung in Filmen, Spielen, Sport und biomechanischen Analysen. Bestehende Motion-Capture-Lösungen erfordern jedoch umständliche externe Instrumentierung und / oder Instrumentierung am Körper, oder verwenden aktive Sensoren deren begrenztes Erfassungsvolumen durch den Stromverbrauch begrenzt wird. Die Allgegenwart und einfache Bereitstellung von RGB-Kameras macht die monokulare RGB-basierte Motion Capture zu einem äußerst nützlichen Problem. Dies würde die Eintrittsbarriere für Inhaltsersteller für die Verwendung der Motion Capture verringern und neuere Anwendungen dieser Tools zur Analyse menschlicher Bewegungen ermöglichen. Diese Arbeit zeigt die ersten monokularen RGB-basierten Motion-Capture-Lösungen in Echtzeit, die in allgemeinen Szeneneinstellungen funktionieren. Sie basieren auf der Entwicklung neuronaler netzwerkbasierter Ansätze, um das schlecht gestellte Problem der Schätzung der menschlichen 3D-Pose aus einem einzelnen RGB-Bild in Kombination mit einer modellbasierten Anpassung anzugehen. Insbesondere machen die Beiträge dieser Arbeit Fortschritte in Richtung drei Schlüsselaspekte der monokularen RGB-basierten Echtzeit-Bewegungserfassung, nämlich Geschwindigkeit, Genauigkeit und die Fähigkeit, für allgemeine Szenen zu arbeiten. Es werden neue Trainingsdatensätze für Einzel- und Mehrpersonen-Szenarien vorgeschlagen, die zusammen mit der vorgeschlagenen Trainingspipeline, die auf Transferlernen basiert, ermöglichen, dass lernbasierte Ansätze nicht von Unterschieden im Erscheinungsbild des Bildes beeinflusst werden. Die Trainingsdatensätze werden von Bewertungsbenchmarks mit mehreren Möglichkeiten einer feinkörnigen Bewertung begleitet. Die angegebenen Benchmarks unterscheiden sich visuell von den Trainingsaufzeichnungen, um die Entwicklung von Lösungen zu fördern, die sich auf verschiedene Szenen verallgemeinern lassen. Die vorgeschlagenen Aufgabenformulierungen für den Einzel- und Mehrpersonenfall ermöglichen eine höhere Genauigkeit und enthalten zusätzliche Eigenschaften wie die Robustheit der Okklusion, die im Kontext einer vollständigen Bewegungserfassungslösung hilfreich sind. Die Mehrpersonenformulierungen sind so konzipiert, dass sie unabhängig von der Anzahl der Subjekte in der Szene eine nahezu konstante Inferenzzeit haben. In Kombination mit Beiträgen zur schnellen Inferenz neuronaler Netze ermöglichen sie eine 3D-Posenschätzung in Echtzeit für mehrere Subjekte. Die Kombination der vorgeschlagenen lernbasierten Ansätze mit einem modellbasierten kinematischen Skelettanpassungsschritt liefert zeitlich stabile Gelenkwinkelschätzungen, die leicht zum Ansteuern virtueller Charaktere verwendet werden können
    • …
    corecore