Skeleton-Based Human Action Recognition with Global Context-Aware Attention LSTM Networks
Human action recognition in 3D skeleton sequences has attracted a lot of
research attention. Recently, Long Short-Term Memory (LSTM) networks have shown
promising performance in this task due to their strengths in modeling the
dependencies and dynamics in sequential data. Since not all skeletal joints are
informative for action recognition, and irrelevant joints often introduce noise
that degrades performance, more attention should be paid to the informative
ones. However, the original LSTM network does not have explicit
attention ability. In this paper, we propose a new class of LSTM network,
Global Context-Aware Attention LSTM (GCA-LSTM), for skeleton-based action
recognition. This network is capable of selectively focusing on the informative
joints in each frame of each skeleton sequence by using a global context memory
cell. To further improve the attention capability of our network, we also
introduce a recurrent attention mechanism, with which the attention performance
of the network can be enhanced progressively. Moreover, we propose a stepwise
training scheme to train our network effectively. Our approach achieves
state-of-the-art performance on five challenging benchmark datasets for
skeleton-based action recognition.
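The joint-selection idea above can be sketched as softmax attention guided by a global context vector, with a refinement loop standing in for the recurrent attention mechanism. The numpy toy below is an illustrative simplification, not the paper's actual architecture: the `joint_attention` helper, the dimensions, and the two-pass refinement are all assumptions.

```python
import numpy as np

def joint_attention(joint_feats, global_context):
    """Softmax-attend over skeletal joints using a global context vector.

    joint_feats: (num_joints, dim) per-joint features of one frame.
    global_context: (dim,) summary of the whole sequence.
    Returns the attended frame feature (dim,) and the joint weights.
    """
    scores = joint_feats @ global_context     # informativeness of each joint
    weights = np.exp(scores - scores.max())   # numerically stable softmax
    weights /= weights.sum()
    # Weighted sum: informative joints dominate, noisy joints are suppressed.
    return weights @ joint_feats, weights

# Toy example: 25 joints with 8-dim features (values are random stand-ins).
rng = np.random.default_rng(0)
frame = rng.normal(size=(25, 8))
context = rng.normal(size=8)
attended, w = joint_attention(frame, context)

# Recurrent attention, simplified: feed the attended output back in as the
# refined context and attend again, sharpening the focus progressively.
for _ in range(2):
    context = attended
    attended, w = joint_attention(frame, context)
```

In the paper the global context is a learned memory cell updated across attention iterations; here the attended output itself plays that role just to make the refinement loop concrete.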
Remote Heart Rate Monitoring in Smart Environments from Videos with Self-supervised Pre-training
Recent advances in deep learning have made it increasingly feasible to
estimate heart rate remotely in smart environments by analyzing videos.
However, a notable limitation of deep learning methods is their heavy reliance
on extensive sets of labeled data for effective training. To address this
issue, self-supervised learning has emerged as a promising avenue. Building on
this, we introduce a solution that utilizes self-supervised contrastive
learning for the estimation of remote photoplethysmography (PPG) and heart rate
monitoring, thereby reducing the dependence on labeled data and enhancing
performance. We propose the use of three spatial and three temporal augmentations for
training an encoder through a contrastive framework, followed by utilizing the
late-intermediate embeddings of the encoder for remote PPG and heart rate
estimation. Our experiments on two publicly available datasets showcase the
improvement of our proposed approach over several related works as well as
supervised learning baselines, as our results approach the state-of-the-art. We
also perform thorough experiments to examine the effects of different design
choices, such as the video representation learning method and the augmentations
used in the pre-training stage. Finally, we demonstrate the robustness of our
proposed method compared to supervised learning approaches when trained on
reduced amounts of labeled data.
Comment: Accepted in IEEE Internet of Things Journal 202
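A common instantiation of such a contrastive pre-training framework is the NT-Xent (normalized temperature-scaled cross-entropy) loss, which pulls together the embeddings of two augmented views of the same clip and pushes apart all other pairs. The numpy sketch below is a generic illustration of that loss, not the paper's exact objective; the batch size, temperature, and embeddings are made-up values.

```python
import numpy as np

def nt_xent(z1, z2, temperature=0.5):
    """NT-Xent contrastive loss over a batch of positive pairs.

    z1, z2: (batch, dim) embeddings of two augmented views of the same videos.
    Returns the mean loss over all 2*batch anchors.
    """
    z = np.concatenate([z1, z2], axis=0)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # cosine-similarity space
    sim = z @ z.T / temperature                        # (2B, 2B) pairwise scores
    np.fill_diagonal(sim, -np.inf)                     # exclude self-similarity
    n = len(z1)
    # For anchor i, its positive view sits at index (i + n) mod 2n.
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(n)])
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(2 * n), pos].mean()

# Toy batch: 4 clips, 16-dim embeddings; view 2 is a small perturbation of view 1.
rng = np.random.default_rng(0)
z_a = rng.normal(size=(4, 16))
z_b = z_a + 0.01 * rng.normal(size=(4, 16))
loss = nt_xent(z_a, z_b)
```

In the actual pipeline the two views would come from applying different spatial and temporal augmentations to the same face video before encoding.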
Learning How To Recognize Faces In Heterogeneous Environments
Face recognition is a mature field in biometrics in which several systems have been proposed over the last three decades.
Such systems are extremely reliable under controlled recording conditions and have been deployed in the field, both in critical tasks, such as border control, and in less critical ones, such as unlocking mobile phones.
However, the lack of cooperation from the subject and variations in pose, occlusion, and illumination are still open problems that significantly affect error rates.
Another challenge that has recently arisen in face recognition research is matching faces from different image domains.
Use cases encompass matching Visible Light (VIS) images with Near-Infrared (NIR) images, thermograms, or depth maps.
Matching can even occur in situations where no real face image exists, such as matching against sketches.
This task is called Heterogeneous Face Recognition.
The key difficulty in the comparison of faces in heterogeneous conditions is that images from the same subject may differ in appearance due to changes in image domain.
In this thesis we address this problem of Heterogeneous Face Recognition (HFR).
Our contributions are four-fold.
First, we analyze the applicability of hand-crafted features, commonly used in face recognition, to the HFR task.
Second, still working with hand-crafted features, we propose that the variability between two image domains can be suppressed with a linear shift in the Gaussian Mixture Model (GMM) mean subspace.
This builds on inter-session variability (ISV) modeling.
Third, we propose that high level features of Deep Convolutional Neural Networks trained on Visual Light images are potentially domain independent and can be used to encode faces sensed in different image domains.
Fourth, we conduct large-scale experiments on several HFR databases covering various image domains, showing competitive performance.
Moreover, the implementations of all the proposed techniques are integrated into a collaborative open-source software library called Bob, which enforces fair evaluation and encourages reproducible research.
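The linear-shift idea in the second contribution can be illustrated in miniature. Assuming, as a simplification of ISV modeling, that domain variability adds a low-rank offset U·x to the stacked GMM mean supervector, compensation amounts to a single linear operation. All quantities below are random stand-ins, not trained models, and the `compensate_means` helper is a hypothetical name.

```python
import numpy as np

def compensate_means(gmm_means, U, x):
    """Shift GMM component means by a linear offset in supervector space.

    gmm_means: (C, D) component means of a GMM trained on one image domain.
    U: (C*D, R) subspace spanning inter-session/domain variability.
    x: (R,) latent offset estimated for the current sample's domain.
    Returns domain-compensated means of the same shape.
    """
    supervector = gmm_means.ravel()   # stack the C x D means into one vector
    shifted = supervector + U @ x     # the linear shift models the domain change
    return shifted.reshape(gmm_means.shape)

# Toy setup: C=16 components, D=32 feature dims, R=5 variability directions.
rng = np.random.default_rng(0)
means = rng.normal(size=(16, 32))    # GMM means, e.g. trained on VIS images
U = rng.normal(size=(16 * 32, 5))    # domain-variability subspace (e.g. VIS->NIR)
x = rng.normal(size=5)               # per-sample latent offset
compensated = compensate_means(means, U, x)
```

In the actual ISV framework U and x are estimated from data rather than sampled; the sketch only shows why the compensation itself is cheap once they are known.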