
    Facial Video-based Remote Physiological Measurement via Self-supervised Learning

    Facial video-based remote physiological measurement aims to estimate remote photoplethysmography (rPPG) signals from human face videos and then measure multiple vital signs (e.g. heart rate, respiration frequency) from the rPPG signals. Recent approaches achieve this by training deep neural networks, which normally require abundant facial videos and synchronously recorded photoplethysmography (PPG) signals for supervision. However, collecting such annotated corpora is not easy in practice. In this paper, we introduce a novel frequency-inspired self-supervised framework that learns to estimate rPPG signals from facial videos without the need for ground-truth PPG signals. Given a video sample, we first augment it into multiple positive/negative samples which contain signal frequencies similar/dissimilar to the original one. Specifically, positive samples are generated using spatial augmentation. Negative samples are generated via a learnable frequency augmentation module, which performs a non-linear signal frequency transformation on the input without excessively changing its visual appearance. Next, we introduce a local rPPG expert aggregation module to estimate rPPG signals from the augmented samples. It encodes complementary pulsation information from different face regions and aggregates it into one rPPG prediction. Finally, we propose a series of frequency-inspired losses, i.e. a frequency contrastive loss, a frequency ratio consistency loss, and a cross-video frequency agreement loss, for the optimization of estimated rPPG signals from multiple augmented video samples and across temporally neighboring video samples. We conduct rPPG-based heart rate, heart rate variability, and respiration frequency estimation on four standard benchmarks. The experimental results demonstrate that our method improves the state of the art by a large margin. Comment: IEEE Transactions on Pattern Analysis and Machine Intelligence
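The frequency contrastive loss described above can be illustrated with a toy InfoNCE-style objective computed on normalized power spectra of predicted signals. This is a minimal numpy sketch under my own assumptions; the function names, the cosine similarity measure, and the temperature value are illustrative, not taken from the paper:

```python
import numpy as np

def norm_psd(x):
    """Normalized power spectral density of a 1-D signal."""
    p = np.abs(np.fft.rfft(x - x.mean())) ** 2
    return p / p.sum()

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def freq_contrastive_loss(anchor, positive, negatives, tau=0.08):
    """InfoNCE-style loss on PSD similarity: pull the positive sample
    (similar signal frequency) toward the anchor, push the negatives
    (dissimilar frequencies) away."""
    pa = norm_psd(anchor)
    sims = [cosine(pa, norm_psd(positive))]
    sims += [cosine(pa, norm_psd(n)) for n in negatives]
    logits = np.array(sims) / tau
    logits -= logits.max()  # numerical stability
    return float(-np.log(np.exp(logits[0]) / np.exp(logits).sum()))
```

On synthetic rPPG-like sines, the loss is near zero when the positive truly shares the anchor's frequency and grows when the roles of positive and negative are swapped.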

    Remote Heart Rate Monitoring in Smart Environments from Videos with Self-supervised Pre-training

    Recent advances in deep learning have made it increasingly feasible to estimate heart rate remotely in smart environments by analyzing videos. However, a notable limitation of deep learning methods is their heavy reliance on extensive sets of labeled data for effective training. To address this issue, self-supervised learning has emerged as a promising avenue. Building on this, we introduce a solution that utilizes self-supervised contrastive learning for the estimation of remote photoplethysmography (PPG) and heart rate monitoring, thereby reducing the dependence on labeled data and enhancing performance. We propose the use of three spatial and three temporal augmentations for training an encoder through a contrastive framework, followed by utilizing the late-intermediate embeddings of the encoder for remote PPG and heart rate estimation. Our experiments on two publicly available datasets showcase the improvement of our proposed approach over several related works as well as supervised learning baselines, as our results approach the state of the art. We also perform thorough experiments to showcase the effects of different design choices, such as the video representation learning method and the augmentations used in the pre-training stage, among others. Finally, we demonstrate the robustness of our proposed method over supervised learning approaches on reduced amounts of labeled data. Comment: Accepted in IEEE Internet of Things Journal 202

    Contrast-Phys+: Unsupervised and Weakly-supervised Video-based Remote Physiological Measurement via Spatiotemporal Contrast

    Video-based remote physiological measurement utilizes facial videos to measure the blood volume change signal, which is also called remote photoplethysmography (rPPG). Supervised methods for rPPG measurement have been shown to achieve good performance. However, the drawback of these methods is that they require facial videos with ground truth (GT) physiological signals, which are often costly and difficult to obtain. In this paper, we propose Contrast-Phys+, a method that can be trained in both unsupervised and weakly-supervised settings. We employ a 3DCNN model to generate multiple spatiotemporal rPPG signals and incorporate prior knowledge of rPPG into a contrastive loss function. We further incorporate the GT signals into contrastive learning to adapt to partial or misaligned labels. The contrastive loss encourages rPPG/GT signals from the same video to be grouped together, while pushing those from different videos apart. We evaluate our methods on five publicly available datasets that include both RGB and near-infrared videos. Contrast-Phys+ outperforms state-of-the-art supervised methods, even when using partially available or misaligned GT signals, or no labels at all. Additionally, we highlight the advantages of our methods in terms of computational efficiency, noise robustness, and generalization.
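The pull/push behaviour of such a contrastive loss can be mimicked in a few lines: spatiotemporal rPPG samples drawn from one video should share a power spectrum, while samples from different videos should not. A minimal numpy sketch with assumed names; the actual method contrasts 3DCNN outputs rather than raw synthetic signals:

```python
import numpy as np

def norm_psd(x):
    """Normalized power spectral density of a 1-D signal."""
    p = np.abs(np.fft.rfft(x - x.mean())) ** 2
    return p / p.sum()

def spatiotemporal_contrast(sigs_a, sigs_b):
    """Mean intra-video PSD distance (to minimize) minus mean
    inter-video PSD distance (to maximize), over two lists of
    spatiotemporal rPPG samples from two videos."""
    intra = [np.mean((norm_psd(x) - norm_psd(y)) ** 2)
             for i, x in enumerate(sigs_a) for y in sigs_a[i + 1:]]
    inter = [np.mean((norm_psd(x) - norm_psd(y)) ** 2)
             for x in sigs_a for y in sigs_b]
    return float(np.mean(intra) - np.mean(inter))
```

When the two videos carry different pulse rates, the objective is negative: same-video samples cluster in frequency while cross-video samples separate.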

    rPPG-MAE: Self-supervised Pre-training with Masked Autoencoders for Remote Physiological Measurement

    Remote photoplethysmography (rPPG) is an important technique for perceiving human vital signs and has received extensive attention. For a long time, researchers have focused on supervised methods that rely on large amounts of labeled data. These methods are limited by the requirement for large amounts of data and the difficulty of acquiring ground-truth physiological signals. To address these issues, several self-supervised methods based on contrastive learning have been proposed. However, they focus on contrastive learning between samples, which neglects the inherent self-similar prior in physiological signals and appears to have a limited ability to cope with noise. In this paper, a linear self-supervised reconstruction task is designed to extract the inherent self-similar prior in physiological signals. In addition, a specific noise-insensitive strategy is explored to reduce the interference of motion and illumination. The proposed framework, namely rPPG-MAE, demonstrates excellent performance even on the challenging VIPL-HR dataset. We also evaluate the proposed method on two public datasets, namely PURE and UBFC-rPPG. The results show that our method not only outperforms existing self-supervised methods but also exceeds the state-of-the-art (SOTA) supervised methods. One important observation is that the quality of the dataset seems more important than its size in self-supervised pre-training of rPPG. The source code is released at https://github.com/linuxsino/rPPG-MAE
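The self-similar prior that a masked-autoencoder pre-text task exploits can be demonstrated with a toy experiment: mask random segments of a signal and reconstruct them from the visible samples. Here linear interpolation stands in for the learned decoder, which is my own simplification of the idea, not the paper's model; a quasi-periodic pulse is recovered far better than white noise:

```python
import numpy as np

def masked_recon_error(x, mask_len=6, n_masks=8, rng=None):
    """Mask random segments of a 1-D signal, reconstruct them by linear
    interpolation over the visible samples, and return the MSE on the
    masked positions only."""
    if rng is None:
        rng = np.random.default_rng(0)
    mask = np.zeros(x.size, dtype=bool)
    for _ in range(n_masks):
        start = int(rng.integers(1, x.size - mask_len - 1))
        mask[start:start + mask_len] = True
    visible = np.flatnonzero(~mask)
    recon = np.interp(np.arange(x.size), visible, x[visible])
    return float(np.mean((recon[mask] - x[mask]) ** 2))
```

A 1.2 Hz pulse sampled at 30 fps reconstructs with much lower error than white noise of the same length, which is exactly the structure a reconstruction pre-text task can learn from.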

    Comparative evaluation of the applicability of self-organized operational neural networks to remote photoplethysmography

    Abstract. Photoplethysmography (PPG) is a widely applied means of obtaining blood volume pulse (BVP) information from subjects, which can be used for monitoring numerous physiological signs such as heart rate and respiration. Following observations that blood volume information can also be retrieved from videos recorded of the human face, several approaches for the remote extraction of PPG signals have been proposed in the literature. These methods are collectively referred to as remote photoplethysmography (rPPG). The current state of the art of rPPG approaches is represented by deep convolutional neural network (CNN) models, which have been successfully applied in a wide range of computer vision tasks. A novel technology called operational neural networks (ONNs) has recently been proposed in the literature as an extension of convolutional neural networks. ONNs attempt to overcome the limitations of conventional CNN models, which are primarily caused by exclusively employing the linear neuron model. In addition, to address certain drawbacks of ONNs, a technology called self-organized operational neural networks (Self-ONNs) has recently been proposed as an extension of ONNs. This thesis presents a novel method for rPPG extraction based on self-organized operational neural networks. To comprehensively evaluate the applicability of Self-ONNs as an approach for rPPG extraction, three Self-ONN models with varying numbers of layers are implemented and evaluated on test data from three data sets representing different distributions. The performance of the proposed models is compared against corresponding CNN architectures as well as a typical unsupervised rPPG pipeline. Performance is evaluated based on heart rate estimates calculated from the extracted rPPG signals. In the presented experimental setup, Self-ONN models did not result in improved heart rate estimation performance over parameter-equivalent CNN alternatives.
However, every Self-ONN model showed a superior ability to fit the training target, which both shows promise for the applicability of Self-ONNs and suggests inherent problems in the training setup. Additionally, when taking into account the required computational resources in addition to raw HR estimation performance, certain Self-ONN models showcased improved efficiency over the CNN alternatives. As such, the experiments nonetheless present a promising proof of concept which can serve as grounds for future research.

    Privacy-Preserving Remote Heart Rate Estimation from Facial Videos

    Remote photoplethysmography (rPPG) is the process of estimating PPG from facial videos. While this approach benefits from contactless interaction, it relies on videos of faces, which often raises an important privacy concern. Recent research has revealed that deep learning techniques are vulnerable to attacks that can result in significant data breaches, making deep rPPG estimation even more sensitive. To address this issue, we propose a data perturbation method that involves extraction of certain areas of the face with less identity-related information, followed by pixel shuffling and blurring. Our experiments on two rPPG datasets (PURE and UBFC) show that our approach reduces the accuracy of facial recognition algorithms by over 60%, with minimal impact on rPPG extraction. We also test our method on three facial recognition datasets (LFW, CALFW, and AgeDB), where it reduces performance by nearly 50%. Our findings demonstrate the potential of our approach as an effective privacy-preserving solution for rPPG estimation. Comment: Accepted in IEEE International Conference on Systems, Man, and Cybernetics (SMC) 202
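The perturbation step (pixel shuffling followed by blurring) is easy to prototype: shuffling preserves a region's mean intensity, which is what spatial-average rPPG methods rely on, while scrambling spatial identity cues. A numpy sketch on a single-channel region, with my own kernel size and helper names rather than the paper's:

```python
import numpy as np

def pixel_shuffle(region, rng):
    """Randomly permute all pixels in a face region (mean is preserved)."""
    flat = region.flatten()
    rng.shuffle(flat)
    return flat.reshape(region.shape)

def box_blur(img, k=3):
    """Simple k x k box blur with edge-replicate padding."""
    pad = k // 2
    p = np.pad(img, pad, mode="edge")
    out = np.zeros(img.shape, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += p[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)
```

After shuffling and blurring, per-frame spatial averages (and hence the temporal pulse trace) are essentially unchanged, while the spatial structure a face recognizer needs is destroyed.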

    Camera-Based Heart Rate Extraction in Noisy Environments

    Remote photoplethysmography (rPPG) is a non-invasive technique that uses video to measure vital signs such as heart rate (HR). In rPPG estimation, noise can introduce artifacts that distort the rPPG signal and jeopardize accurate HR measurement. Considering that most rPPG studies have been conducted in lab-controlled environments, the issue of noise in realistic conditions remains open. This thesis examines the challenges of noise in rPPG estimation in realistic scenarios, specifically investigating the effect of noise arising from illumination variation and motion artifacts on the predicted rPPG HR. To mitigate the impact of noise, a modular rPPG measurement framework is developed, comprising data preprocessing, region-of-interest (RoI) selection, signal extraction, preparation, processing, and HR extraction. The proposed pipeline is tested on the public LGI-PPGI-Face-Video-Database dataset, featuring four candidates and real-life scenarios. In the RoI module, raw rPPG signals were extracted from the dataset using three machine learning-based face detectors, namely Haarcascade, Dlib, and MediaPipe, in parallel. Subsequently, the collected signals underwent preprocessing, independent component analysis, denoising, and frequency-domain conversion for peak detection. Overall, the Dlib face detector leads to the most successful HR estimates for the majority of scenarios. In 50% of all scenarios and candidates, the average predicted HR for Dlib is either in line with or very close to the average reference HR. The HRs extracted with the Haarcascade and MediaPipe architectures make up 31.25% and 18.75% of plausible results, respectively. The analysis highlights the importance of fixated facial landmarks in collecting quality raw data and reducing noise.
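The final HR-extraction step of such a pipeline (frequency-domain conversion followed by peak detection) reduces to a few lines: take the dominant frequency of the extracted signal inside the physiological band. A minimal numpy sketch; the band limits below are common defaults, not necessarily those used in the thesis:

```python
import numpy as np

def estimate_hr(rppg, fps, lo=0.7, hi=3.0):
    """Heart rate in bpm from the dominant spectral peak of an rPPG
    signal, restricted to the physiological band [lo, hi] Hz
    (42-180 bpm by default)."""
    spectrum = np.abs(np.fft.rfft(rppg - np.mean(rppg)))
    freqs = np.fft.rfftfreq(len(rppg), d=1.0 / fps)
    band = (freqs >= lo) & (freqs <= hi)
    return 60.0 * freqs[band][np.argmax(spectrum[band])]
```

Note that with a 10 s window at 30 fps the frequency resolution is 0.1 Hz, i.e. 6 bpm, so longer windows or spectral interpolation are needed for finer estimates.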

    Non-Contrastive Unsupervised Learning of Physiological Signals from Video

    Subtle periodic signals such as blood volume pulse and respiration can be extracted from RGB video, enabling remote health monitoring at low cost. Advancements in remote pulse estimation -- or remote photoplethysmography (rPPG) -- are currently driven by deep learning solutions. However, modern approaches are trained and evaluated on benchmark datasets with associated ground truth from contact-PPG sensors. We present the first non-contrastive unsupervised learning framework for signal regression, to break free from the constraints of labelled video data. With minimal assumptions of periodicity and finite bandwidth, our approach is capable of discovering the blood volume pulse directly from unlabelled videos. We find that encouraging sparse power spectra within normal physiological bandlimits, together with variance over batches of power spectra, is sufficient for learning visual features of periodic signals. We perform the first experiments utilizing unlabelled video data not specifically created for rPPG to train robust pulse rate estimators. Given the limited inductive biases and impressive empirical results, the approach is theoretically capable of discovering other periodic signals from video, enabling multiple physiological measurements without the need for ground truth signals. Code to fully reproduce the experiments is made available along with the paper. Comment: Accepted to CVPR 202
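The spectral priors named above (sparse power within physiological bandlimits, limited bandwidth) can be written directly as penalties on a predicted signal's power spectrum. A numpy sketch under my own parameter choices and names; the paper's exact loss formulation may differ, and the batch-variance term that prevents spectral collapse is omitted here:

```python
import numpy as np

def spectral_penalties(pred, fps, lo=0.66, hi=3.0):
    """Two unsupervised penalties on a predicted signal:
    - bandwidth: fraction of spectral power outside [lo, hi] Hz
    - sparsity: in-band power further than one bin from the peak
    Both are near zero for a clean in-band periodic signal."""
    power = np.abs(np.fft.rfft(pred - pred.mean())) ** 2
    power = power / power.sum()
    freqs = np.fft.rfftfreq(pred.size, d=1.0 / fps)
    band = (freqs >= lo) & (freqs <= hi)
    bandwidth = float(power[~band].sum())
    in_band = power[band] / max(power[band].sum(), 1e-12)
    peak = int(np.argmax(in_band))
    near_peak = np.abs(np.arange(in_band.size) - peak) <= 1
    sparsity = float(in_band[~near_peak].sum())
    return bandwidth, sparsity
```

A clean in-band sine drives both penalties toward zero, while broadband noise keeps them large, which is the gradient signal that lets the network discover the pulse without labels.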