
    Skeleton-Based Human Action Recognition with Global Context-Aware Attention LSTM Networks

    Human action recognition in 3D skeleton sequences has attracted substantial research attention. Recently, Long Short-Term Memory (LSTM) networks have shown promising performance in this task due to their strength in modeling the dependencies and dynamics of sequential data. Since not all skeletal joints are informative for action recognition, and irrelevant joints often introduce noise that degrades performance, more attention should be paid to the informative ones. However, the original LSTM network has no explicit attention ability. In this paper, we propose a new class of LSTM network, the Global Context-Aware Attention LSTM (GCA-LSTM), for skeleton-based action recognition. This network selectively focuses on the informative joints in each frame of a skeleton sequence by using a global context memory cell. To further improve the attention capability of the network, we also introduce a recurrent attention mechanism, with which the attention performance is enhanced progressively. Moreover, we propose a stepwise training scheme to train the network effectively. Our approach achieves state-of-the-art performance on five challenging benchmark datasets for skeleton-based action recognition.
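
    As a rough illustration of the idea, a minimal PyTorch sketch of joint-level attention gated by a global context vector might look as follows; all dimensions, layer choices, and the two-pass loop are assumptions for illustration, not the authors' exact architecture.

    import torch
    import torch.nn as nn

    class GCAttention(nn.Module):
        # Joint-level attention conditioned on a global context vector
        # (hypothetical sizes; a sketch of the idea, not the paper's model).
        def __init__(self, joint_dim=3, hidden_dim=128):
            super().__init__()
            self.score = nn.Linear(joint_dim + hidden_dim, 1)  # joint informativeness
            self.lstm = nn.LSTM(joint_dim, hidden_dim, batch_first=True)
            self.refine = nn.Linear(hidden_dim, hidden_dim)    # update global context

        def forward(self, skeleton, context):
            # skeleton: (batch, frames, joints, 3); context: (batch, hidden_dim)
            b, t, j, _ = skeleton.shape
            ctx = context[:, None, None, :].expand(b, t, j, -1)
            w = torch.softmax(self.score(torch.cat([skeleton, ctx], -1)), dim=2)
            frames = (w * skeleton).sum(dim=2)       # attention-weighted joint sum
            out, _ = self.lstm(frames)
            return out, torch.tanh(self.refine(out[:, -1]))  # refined context

    model = GCAttention()
    x = torch.randn(2, 30, 25, 3)                    # 2 clips, 30 frames, 25 joints
    context = torch.zeros(2, 128)
    for _ in range(2):                               # recurrent attention: two passes
        features, context = model(x, context)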

    Remote Heart Rate Monitoring in Smart Environments from Videos with Self-supervised Pre-training

    Recent advances in deep learning have made it increasingly feasible to estimate heart rate remotely in smart environments by analyzing videos. However, a notable limitation of deep learning methods is their heavy reliance on large sets of labeled data for effective training. To address this issue, self-supervised learning has emerged as a promising avenue. Building on this, we introduce a solution that uses self-supervised contrastive learning for remote photoplethysmography (PPG) estimation and heart rate monitoring, thereby reducing the dependence on labeled data while enhancing performance. We propose using three spatial and three temporal augmentations to train an encoder in a contrastive framework, and then using the late-intermediate embeddings of the encoder for remote PPG and heart rate estimation. Our experiments on two publicly available datasets show that our approach improves over several related works as well as supervised learning baselines, with results approaching the state-of-the-art. We also perform thorough experiments on the effects of different design choices, such as the video representation learning method and the augmentations used in the pre-training stage. Finally, we demonstrate that our method is more robust than supervised learning approaches when the amount of labeled data is reduced.
    Comment: Accepted in IEEE Internet of Things Journal 202
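
    To make the training objective concrete, here is a minimal SimCLR-style sketch of contrastive pre-training on two augmented views of a video clip; the encoder, the two stand-in augmentations, and all shapes below are illustrative assumptions, not the paper's exact pipeline.

    import torch
    import torch.nn.functional as F

    def nt_xent(z1, z2, temperature=0.5):
        # Standard NT-Xent contrastive loss between two views.
        z = F.normalize(torch.cat([z1, z2]), dim=1)
        sim = z @ z.T / temperature                   # pairwise cosine similarities
        n = z1.size(0)
        sim.masked_fill_(torch.eye(2 * n, dtype=torch.bool), float('-inf'))
        targets = torch.cat([torch.arange(n, 2 * n), torch.arange(n)])
        return F.cross_entropy(sim, targets)          # pull positive pairs together

    # Two stand-in augmentations for a clip of shape (batch, frames, C, H, W);
    # the paper's actual spatial/temporal transforms may differ.
    def temporal_reverse(clip):
        return clip.flip(dims=[1])                    # reverse frame order

    def spatial_flip(clip):
        return clip.flip(dims=[4])                    # mirror horizontally

    clip = torch.randn(4, 64, 3, 72, 72)
    encoder = torch.nn.Sequential(torch.nn.Flatten(1), torch.nn.LazyLinear(128))
    loss = nt_xent(encoder(temporal_reverse(clip)), encoder(spatial_flip(clip)))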

    Learning How To Recognize Faces In Heterogeneous Environments

    Face recognition is a mature field in biometrics in which many systems have been proposed over the last three decades. Such systems are extremely reliable under controlled recording conditions, and they have been deployed in the field both for critical tasks, such as border control, and for less critical ones, such as unlocking mobile phones. However, lack of cooperation from the subject and variations in pose, occlusion, and illumination are still open problems and significantly affect error rates. Another challenge that arose recently in face recognition research is the ability to match faces from different image domains. Use cases include matching Visible Light (VIS) images with Near-Infrared (NIR) images, Thermograms, or Depth maps. Such matching can occur even in situations where no real face image exists, such as matching against sketches. This task is known as Heterogeneous Face Recognition (HFR). The key difficulty in comparing faces under heterogeneous conditions is that images of the same subject may differ in appearance due to the change in image domain. In this thesis we address the problem of HFR, and our contributions are four-fold. First, we analyze the applicability of hand-crafted features used in face recognition to the HFR task. Second, still working with hand-crafted features, we propose that the variability between two image domains can be suppressed with a linear shift in the Gaussian Mixture Model (GMM) mean supervector space, building on inter-session variability (ISV) modeling. Third, we propose that the high-level features of Deep Convolutional Neural Networks trained on Visible Light images are potentially domain-independent and can be used to encode faces sensed in different image domains. Fourth, large-scale experiments are conducted on several HFR databases covering various image domains, showing competitive performance. Moreover, the implementations of all the proposed techniques are integrated into a collaborative open-source software library called Bob that enforces fair evaluations and encourages reproducible research.
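
    As a rough illustration of the second contribution, the NumPy sketch below treats domain variability as a linear shift in the GMM mean supervector space and removes it by least squares; the subspace U and all dimensions are random stand-ins here purely for illustration, whereas ISV learns the subspace from data.

    import numpy as np

    rng = np.random.default_rng(0)
    D, R = 512, 10                       # supervector / subspace sizes (made up)
    U = rng.standard_normal((D, R))      # stands in for a learned session subspace

    def suppress_domain_shift(m_obs, m_ubm, U):
        # Remove the component of (m_obs - m_ubm) explained by U: the
        # observed supervector is modeled as a client part plus U @ x.
        x_hat, *_ = np.linalg.lstsq(U, m_obs - m_ubm, rcond=None)
        return m_obs - U @ x_hat

    m_ubm = rng.standard_normal(D)                     # world-model supervector
    m_vis = suppress_domain_shift(rng.standard_normal(D), m_ubm, U)
    m_nir = suppress_domain_shift(rng.standard_normal(D), m_ubm, U)
    score = m_vis @ m_nir / (np.linalg.norm(m_vis) * np.linalg.norm(m_nir))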