61 research outputs found

    Comparative evaluation of the applicability of self-organized operational neural networks to remote photoplethysmography

    Abstract. Photoplethysmography (PPG) is a widely applied means of obtaining blood volume pulse (BVP) information from subjects, which can be used for monitoring numerous physiological signs such as heart rate and respiration. Following observations that blood volume information can also be retrieved from videos of the human face, several approaches for the remote extraction of PPG signals have been proposed in the literature. These methods are collectively referred to as remote photoplethysmography (rPPG). The current state of the art in rPPG is represented by deep convolutional neural network (CNN) models, which have been successfully applied to a wide range of computer vision tasks. A novel technology called operational neural networks (ONNs) has recently been proposed in the literature as an extension of convolutional neural networks. ONNs attempt to overcome the limitations of conventional CNN models, which are primarily caused by exclusively employing the linear neuron model. In addition, to address certain drawbacks of ONNs, a technology called self-organized operational neural networks (Self-ONNs) has recently been proposed as an extension of ONNs. This thesis presents a novel method for rPPG extraction based on self-organized operational neural networks. To comprehensively evaluate the applicability of Self-ONNs as an approach for rPPG extraction, three Self-ONN models with varying numbers of layers are implemented and evaluated on test data from three data sets representing different distributions. The performance of the proposed models is compared against corresponding CNN architectures as well as a typical unsupervised rPPG pipeline. The performance of the methods is evaluated based on heart rate estimations calculated from the extracted rPPG signals. In the presented experimental setup, Self-ONN models did not result in improved heart rate estimation performance over parameter-equivalent CNN alternatives.
However, every Self-ONN model showed a superior ability to fit the training target, which both shows promise for the applicability of Self-ONNs and suggests inherent problems in the training setup. Additionally, when the required computational resources are considered alongside raw HR estimation performance, certain Self-ONN models showed improved efficiency over the CNN alternatives. As such, the experiments nonetheless present a promising proof of concept that can serve as grounds for future research.
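The evaluation step the abstract describes (heart rate estimations calculated from the extracted rPPG signals) can be sketched with a simple FFT-peak estimator. The band limits and the estimator itself are illustrative assumptions, not details taken from the thesis:

```python
import numpy as np

def estimate_heart_rate(rppg, fs, lo_hz=0.7, hi_hz=4.0):
    """Estimate heart rate (BPM) from an rPPG trace by locating the
    dominant spectral peak inside a plausible cardiac band
    (0.7-4.0 Hz, i.e. 42-240 BPM -- illustrative limits)."""
    rppg = np.asarray(rppg, dtype=float)
    rppg = rppg - rppg.mean()                    # remove the DC component
    freqs = np.fft.rfftfreq(rppg.size, d=1.0 / fs)
    power = np.abs(np.fft.rfft(rppg)) ** 2
    band = (freqs >= lo_hz) & (freqs <= hi_hz)   # restrict to cardiac band
    return 60.0 * freqs[band][np.argmax(power[band])]  # Hz -> BPM

# Synthetic 72 BPM (1.2 Hz) pulse sampled at 30 fps for 10 s
fs = 30.0
t = np.arange(0, 10, 1.0 / fs)
rng = np.random.default_rng(0)
rppg = np.sin(2 * np.pi * 1.2 * t) + 0.1 * rng.standard_normal(t.size)
print(round(estimate_heart_rate(rppg, fs)))  # 72
```

Real pipelines typically add detrending, bandpass filtering, and windowed spectral estimation, but a band-limited spectral peak like this is a common baseline for converting an rPPG trace into a heart rate.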

    Face liveness detection by rPPG features and contextual patch-based CNN

    Abstract. Face anti-spoofing plays a vital role in security systems, including face payment systems and face recognition systems. Previous studies showed that live faces and presentation attacks differ significantly in both remote photoplethysmography (rPPG) and texture information. We propose a generalized method exploiting both rPPG and texture features for the face anti-spoofing task. First, we design multi-scale long-term statistical spectral (MS-LTSS) features with variant granularities for the representation of rPPG information. Second, a contextual patch-based convolutional neural network (CP-CNN) is used for extracting global-local and multi-level deep texture features simultaneously. Finally, a weighted summation strategy is employed for decision-level fusion of the two types of features, which allows the proposed system to generalize to detecting not only print and replay attacks but also mask attacks. Comprehensive experiments were conducted on five databases, namely 3DMAD, HKBU-Mars V1, MSU-MFSD, CASIA-FASD, and OULU-NPU, demonstrating the superior results of the proposed method compared with state-of-the-art methods.

    Contrast-Phys+: Unsupervised and Weakly-supervised Video-based Remote Physiological Measurement via Spatiotemporal Contrast

    Video-based remote physiological measurement utilizes facial videos to measure the blood volume change signal, which is also called remote photoplethysmography (rPPG). Supervised methods for rPPG measurement have been shown to achieve good performance. However, these methods require facial videos with ground truth (GT) physiological signals, which are often costly and difficult to obtain. In this paper, we propose Contrast-Phys+, a method that can be trained in both unsupervised and weakly-supervised settings. We employ a 3DCNN model to generate multiple spatiotemporal rPPG signals and incorporate prior knowledge of rPPG into a contrastive loss function. We further incorporate the GT signals into contrastive learning to adapt to partial or misaligned labels. The contrastive loss encourages rPPG/GT signals from the same video to be grouped together, while pushing those from different videos apart. We evaluate our methods on five publicly available datasets that include both RGB and near-infrared videos. Contrast-Phys+ outperforms state-of-the-art supervised methods, even when using partially available or misaligned GT signals, or no labels at all. Additionally, we highlight the advantages of our methods in terms of computational efficiency, noise robustness, and generalization.
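The contrastive idea described above can be illustrated with a toy numpy sketch; this is a hypothetical simplification (mean squared error between normalized power spectra), not the authors' implementation:

```python
import numpy as np

def norm_power_spectrum(sig):
    """Normalized power spectrum used to compare rPPG samples."""
    p = np.abs(np.fft.rfft(sig - sig.mean())) ** 2
    return p / p.sum()

def contrastive_rppg_loss(samples_a, samples_b):
    """Toy contrastive objective: spatiotemporal rPPG samples cut from the
    SAME video should share a power spectrum (positive pairs, attracted),
    while samples from DIFFERENT videos should not (negative pairs, repelled)."""
    spectra_a = [norm_power_spectrum(s) for s in samples_a]
    spectra_b = [norm_power_spectrum(s) for s in samples_b]
    pull, push = 0.0, 0.0
    for i in range(len(spectra_a)):
        for j in range(i + 1, len(spectra_a)):      # positive pairs
            pull += np.mean((spectra_a[i] - spectra_a[j]) ** 2)
        for sb in spectra_b:                        # negative pairs
            push += np.mean((spectra_a[i] - sb) ** 2)
    return pull - push  # minimized when positives agree, negatives differ

# Two "videos" with different heart rates give a negative (good) loss
t = np.arange(0, 10, 1 / 30)
video_a = [np.sin(2 * np.pi * 1.2 * t + ph) for ph in (0.0, 0.5)]
video_b = [np.sin(2 * np.pi * 2.0 * t + ph) for ph in (0.0, 0.5)]
print(contrastive_rppg_loss(video_a, video_b) < 0)  # True
```

Comparing spectra rather than raw waveforms exploits the same prior the paper uses: signals from one video share a pulse frequency even when their phases differ across face regions.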

    Facial Video-based Remote Physiological Measurement via Self-supervised Learning

    Facial video-based remote physiological measurement aims to estimate remote photoplethysmography (rPPG) signals from human face videos and then measure multiple vital signs (e.g. heart rate, respiration frequency) from the rPPG signals. Recent approaches achieve this by training deep neural networks, which normally require abundant facial videos and synchronously recorded photoplethysmography (PPG) signals for supervision. However, collecting such annotated corpora is not easy in practice. In this paper, we introduce a novel frequency-inspired self-supervised framework that learns to estimate rPPG signals from facial videos without the need for ground-truth PPG signals. Given a video sample, we first augment it into multiple positive/negative samples which contain similar/dissimilar signal frequencies to the original one. Specifically, positive samples are generated using spatial augmentation. Negative samples are generated via a learnable frequency augmentation module, which performs non-linear signal frequency transformation on the input without excessively changing its visual appearance. Next, we introduce a local rPPG expert aggregation module to estimate rPPG signals from the augmented samples. It encodes complementary pulsation information from different face regions and aggregates them into one rPPG prediction. Finally, we propose a series of frequency-inspired losses, i.e. frequency contrastive loss, frequency ratio consistency loss, and cross-video frequency agreement loss, for the optimization of estimated rPPG signals from multiple augmented video samples and across temporally neighboring video samples. We conduct rPPG-based heart rate, heart rate variability, and respiration frequency estimation on four standard benchmarks. The experimental results demonstrate that our method improves the state of the art by a large margin. (Published in IEEE Transactions on Pattern Analysis and Machine Intelligence.)
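Of the listed losses, the frequency ratio consistency loss admits a compact sketch: if a clip is temporally resampled by a known factor, the predicted rPPG frequency should scale by that same factor. The function below is a hypothetical paraphrase of that idea, not code from the paper:

```python
import numpy as np

def dominant_freq(sig, fs):
    """Dominant (non-DC) frequency of a signal, in Hz."""
    freqs = np.fft.rfftfreq(len(sig), d=1.0 / fs)
    power = np.abs(np.fft.rfft(sig - np.mean(sig))) ** 2
    return freqs[1:][np.argmax(power[1:])]  # skip the DC bin

def freq_ratio_consistency_loss(pred_orig, pred_aug, fs, ratio):
    """Penalize predictions whose dominant frequencies do not scale with
    the known resampling ratio of the augmented clip."""
    f_o = dominant_freq(pred_orig, fs)
    f_a = dominant_freq(pred_aug, fs)
    return (f_a - ratio * f_o) ** 2

# A clip sped up 1.5x should carry a 1.5x higher pulse frequency
fs = 30.0
t = np.arange(0, 10, 1 / fs)
orig = np.sin(2 * np.pi * 1.2 * t)  # 1.2 Hz pulse
aug = np.sin(2 * np.pi * 1.8 * t)   # same pulse after a 1.5x speed-up
print(freq_ratio_consistency_loss(orig, aug, fs, 1.5))  # ~0.0
```

Because the resampling ratio is known from the augmentation itself, this kind of loss needs no ground-truth PPG signal, which is the point of the self-supervised framework.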

    Widefield Computational Biophotonic Imaging for Spatiotemporal Cardiovascular Hemodynamic Monitoring

    Cardiovascular disease is the leading cause of mortality, resulting in 17.3 million deaths per year globally. Although cardiovascular disease accounts for approximately 30% of deaths in the United States, many deleterious events can be mitigated or prevented if detected and treated early. Indeed, early intervention and healthier behaviour adoption can reduce the relative risk of first heart attacks by up to 80% compared to those who do not adopt new healthy behaviours. Cardiovascular monitoring is a vital component of disease detection, mitigation, and treatment. The cardiovascular system is a highly dynamic system that constantly adapts to internal and external stimuli; tracking how it functions and responds is therefore central to detecting and managing disease. Biophotonic technologies provide unique solutions for cardiovascular assessment and monitoring in naturalistic and clinical settings. These technologies leverage the properties of light as it enters and interacts with tissue, providing safe and rapid sensing that can be performed in many different environments. Light entering human tissue undergoes a complex series of absorption and scattering events according to both the illumination and tissue properties. The field of quantitative biomedical optics seeks to quantify physiological processes by analysing the remitted light characteristics relative to the controlled illumination source. Drawing inspiration from contact-based biophotonic sensing technologies such as pulse oximetry and near infrared spectroscopy, we explored the feasibility of widefield hemodynamic assessment using computational biophotonic imaging. Specifically, we investigated the hypothesis that computational biophotonic imaging can assess spatial and temporal properties of pulsatile blood flow across large tissue regions.
This thesis presents the design, development, and evaluation of a novel photoplethysmographic imaging system for assessing spatial and temporal hemodynamics in major pulsatile vasculature through the sensing and processing of subtle light intensity fluctuations arising from local changes in blood volume. This system co-integrates methods from biomedical optics, electronic control, and biomedical image and signal processing to enable non-contact widefield hemodynamic assessment over large tissue regions. A biophotonic optical model was developed to quantitatively assess transient blood volume changes without requiring a priori information about the tissue's absorption and scattering characteristics. A novel automatic blood pulse waveform extraction method was developed to enable passive monitoring. This spectral-spatial pixel fusion method uses physiological hemodynamic priors to guide a probabilistic framework for learning pixel weights across the scene. Pixels are combined according to their signal weight, resulting in a single waveform. Widefield hemodynamic imaging was assessed in three biomedical applications using the developed system. First, spatial vascular distribution was investigated across a sample with highly varying demographics to assess common pulsatile vascular pathways. Second, non-contact biophotonic measurement of the jugular venous pulse waveform was demonstrated, providing clinically important information about cardiac contractility that is currently obtained through invasive catheterization. Lastly, non-contact biophotonic assessment of cardiac arrhythmia was demonstrated, leveraging the system's ability to extract strong hemodynamic signals for detecting subtle fluctuations in the waveform.
This research demonstrates that this novel approach to computational biophotonic hemodynamic imaging offers new cardiovascular monitoring and assessment techniques, which can enable new scientific discoveries and clinical detection capabilities related to cardiovascular function.
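The spectral-spatial pixel fusion described above can be sketched in simplified form. Here each pixel is weighted by a plain cardiac-band power fraction, whereas the thesis learns pixel weights within a probabilistic framework using hemodynamic priors; treat the weighting rule as an illustrative assumption:

```python
import numpy as np

def fuse_pixels(pixel_traces, fs, band=(0.5, 5.0)):
    """Combine per-pixel intensity traces (shape: n_pixels x n_samples)
    into one blood pulse waveform. Each pixel is weighted by the fraction
    of its spectral power inside a plausible cardiac band, so pulsatile
    pixels dominate the fused signal."""
    traces = pixel_traces - pixel_traces.mean(axis=1, keepdims=True)
    freqs = np.fft.rfftfreq(traces.shape[1], d=1.0 / fs)
    power = np.abs(np.fft.rfft(traces, axis=1)) ** 2
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    weights = power[:, in_band].sum(axis=1) / power.sum(axis=1)
    weights /= weights.sum()
    return weights @ traces  # weighted average over pixels

# 4 pulsatile pixels (1 Hz pulse) among 6 noise-only pixels
fs, rng = 30.0, np.random.default_rng(1)
t = np.arange(0, 10, 1 / fs)
pulse = np.sin(2 * np.pi * 1.0 * t)
pixels = np.vstack([pulse + 0.3 * rng.standard_normal(t.size) for _ in range(4)]
                   + [rng.standard_normal(t.size) for _ in range(6)])
fused = fuse_pixels(pixels, fs)
```

The fused waveform recovers the 1 Hz pulse even though most pixels carry only noise, which is the basic motivation for weighting pixels by signal quality before averaging.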

    Blind Source Separation for the Processing of Contact-Less Biosignals

    (Spatio-temporal) Blind Source Separation (BSS) provides large potential for processing distorted multichannel biosignal measurements in the context of novel contact-less recording techniques, separating distortions from the cardiac signal of interest. This potential can only be practically utilized (1) if a BSS model is applied that matches the complexity of the measurement, i.e. the signal mixture, and (2) if permutation indeterminacy is solved among the BSS output components, i.e. the component of interest can be practically selected. The present work first designs a framework to assess the efficacy of BSS algorithms in the context of the camera-based photoplethysmogram (cbPPG) and characterizes multiple BSS algorithms accordingly. Algorithm selection recommendations for certain mixture characteristics are derived. Second, the present work develops and evaluates concepts to solve permutation indeterminacy for BSS outputs of contact-less electrocardiogram (ECG) recordings. The novel approach based on sparse coding is shown to outperform the existing concepts of higher-order moments and frequency-domain features.
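Permutation indeterminacy can be made concrete with a simple baseline selector; the sparse-coding and higher-order-moment selectors evaluated in this work are more sophisticated, so the band-limited spectral-peakedness score below is only an illustrative stand-in:

```python
import numpy as np

def select_cardiac_component(components, fs, band=(0.6, 3.0)):
    """Return the row index of the BSS output whose spectrum has the
    strongest single peak inside a physiologic cardiac band
    (0.6-3.0 Hz, i.e. 36-180 BPM). components: n_components x n_samples."""
    freqs = np.fft.rfftfreq(components.shape[1], d=1.0 / fs)
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    scores = []
    for comp in components:
        power = np.abs(np.fft.rfft(comp - comp.mean())) ** 2
        scores.append(power[in_band].max() / power.sum())  # in-band peakedness
    return int(np.argmax(scores))

# Three BSS outputs: broadband noise, a cardiac pulse, and baseline drift
fs, rng = 100.0, np.random.default_rng(2)
t = np.arange(0, 10, 1 / fs)
components = np.vstack([
    rng.standard_normal(t.size),   # sensor noise
    np.sin(2 * np.pi * 1.2 * t),   # 72 BPM cardiac pulse
    np.linspace(0.0, 1.0, t.size), # slow drift / illumination change
])
print(select_cardiac_component(components, fs))  # 1
```

A quasi-periodic cardiac component concentrates its power at one in-band frequency, while noise spreads power broadly and drift concentrates it below the band, so the score separates the three cases cleanly in this synthetic setting.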

    State of the art of audio- and video based solutions for AAL

    Working Group 3. Audio- and Video-based AAL Applications. It is a matter of fact that Europe is facing more and more crucial challenges regarding health and social care due to demographic change and the current economic context. The recent COVID-19 pandemic has stressed this situation even further, thus highlighting the need for taking action. Active and Assisted Living (AAL) technologies come as a viable approach to help face these challenges, thanks to their high potential for enabling remote care and support. Broadly speaking, AAL can be referred to as the use of innovative and advanced Information and Communication Technologies to create supportive, inclusive and empowering applications and environments that enable older, impaired or frail people to live independently and stay active longer in society. AAL capitalizes on the growing pervasiveness and effectiveness of sensing and computing facilities to supply the persons in need with smart assistance, by responding to their necessities of autonomy, independence, comfort, security and safety. The application scenarios addressed by AAL are complex, due to the inherent heterogeneity of the end-user population, their living arrangements, and their physical conditions or impairments. Despite aiming at diverse goals, AAL systems should share some common characteristics. They are designed to provide support in daily life in an invisible, unobtrusive and user-friendly manner. Moreover, they are conceived to be intelligent, able to learn and adapt to the requirements and requests of the assisted people, and to synchronise with their specific needs. Nevertheless, to ensure the uptake of AAL in society, potential users must be willing to use AAL applications and to integrate them in their daily environments and lives. In this respect, video- and audio-based AAL applications have several advantages in terms of unobtrusiveness and information richness.
Indeed, cameras and microphones are far less obtrusive than wearable sensors, which may hinder one's activities. In addition, a single camera placed in a room can record most of the activities performed in the room, thus replacing many other non-visual sensors. Currently, video-based applications are effective in recognising and monitoring the activities, the movements, and the overall conditions of the assisted individuals, as well as in assessing their vital parameters (e.g., heart rate, respiratory rate). Similarly, audio sensors have the potential to become one of the most important modalities for interaction with AAL systems, as they can have a large sensing range, do not require physical presence at a particular location, and are physically intangible. Moreover, relevant information about individuals' activities and health status can be derived from processing audio signals (e.g., speech recordings). Nevertheless, as the other side of the coin, cameras and microphones are often perceived as the most intrusive technologies from the viewpoint of the privacy of the monitored individuals. This is due to the richness of the information these technologies convey and the intimate settings where they may be deployed. Solutions able to ensure privacy preservation by context and by design, as well as to ensure high legal and ethical standards, are in high demand. After the review of the current state of play and the discussion in GoodBrother, we may claim that the first solutions in this direction are starting to appear in the literature. A multidisciplinary debate among experts and stakeholders is paving the way towards AAL that ensures ergonomics, usability, acceptance and privacy preservation. The DIANA, PAAL, and VisuAAL projects are examples of this fresh approach. This report provides the reader with a review of the most recent advances in audio- and video-based monitoring technologies for AAL.
It has been drafted as a collective effort of WG3 to supply an introduction to AAL, its evolution over time and its main functional and technological underpinnings. In this respect, the report contributes to the field with the outline of a new generation of ethical-aware AAL technologies and a proposal for a novel comprehensive taxonomy of AAL systems and applications. Moreover, the report allows non-technical readers to gather an overview of the main components of an AAL system and how these function and interact with the end-users. The report illustrates the state of the art of the most successful AAL applications and functions based on audio and video data, namely (i) lifelogging and self-monitoring, (ii) remote monitoring of vital signs, (iii) emotional state recognition, (iv) food intake monitoring, activity and behaviour recognition, (v) activity and personal assistance, (vi) gesture recognition, (vii) fall detection and prevention, (viii) mobility assessment and frailty recognition, and (ix) cognitive and motor rehabilitation. For these application scenarios, the report illustrates the state of play in terms of scientific advances, available products and research projects. The open challenges are also highlighted. The report ends with an overview of the challenges, the hindrances and the opportunities posed by the uptake of AAL technologies in real-world settings. In this respect, the report illustrates the current procedural and technological approaches to cope with acceptability, usability and trust in AAL technology, by surveying strategies and approaches to co-design, to privacy preservation in video and audio data, to transparency and explainability in data processing, and to data transmission and communication. User acceptance and ethical considerations are also debated. Finally, the potentials coming from the silver economy are overviewed.
