
    Markerless facial motion capture: deep learning approaches on RGBD data

    Facial expressions are a series of fast, complex and interconnected movements that cause an array of deformations, such as stretching, compressing and folding of the skin. Identifying expressions is a natural process in human vision, but the diversity of faces makes it a challenging task for computer vision. Research in markerless facial motion capture using a single Red Green Blue (RGB) camera has gained popularity due to the wide availability of such data, for example from mobile phones. The motivation behind this work is that much of the existing research attempts to infer 3-Dimensional (3D) data from 2-Dimensional (2D) images; in motion capture, for instance, multiple 2D cameras are calibrated to allow some depth prediction. In contrast, the inclusion of Red Green Blue Depth (RGBD) sensors, which provide ground-truth depth data, could give a better understanding of the human face and how expressions are visualised. The aim of this thesis is to investigate and develop novel methods of markerless facial motion capture, with a focus on the inclusion of RGBD data to provide 3D information. The contributions are: a tool to aid in the annotation of 3D facial landmarks; a novel neural network that demonstrates the ability to predict 2D and 3D landmarks by merging RGBD data; a working application that demonstrates a complex deep learning network on portable handheld devices; a review of existing methods for denoising fine detail in depth maps using neural networks; and a network for the complete analysis of facial landmarks and expressions in 3D. The 3D annotator was developed to overcome the issues of relying on existing 3D modelling software, which made feature identification difficult. The technique of predicting 2D and 3D landmarks with auxiliary information allowed high-accuracy 3D landmarking without the need for full model generation, and it outperformed other recent landmarking techniques. The networks running on handheld devices serve as a proof of concept that, even without much optimisation, a complex task can be performed in near real time. Denoising Time of Flight (ToF) depth maps proved considerably more complex than traditional RGB denoising, and we reviewed and applied an array of techniques to the task. The full facial analysis showed that training neural networks on a wide range of related tasks for auxiliary information allows a deeper understanding of the overall task. Research on facial processing is vast, but many new problems and challenges remain. While RGB cameras are widely used, high-accuracy and cost-effective depth sensing devices are becoming available. These new devices allow a better understanding of facial features and expressions, and by using and merging RGB and depth data, facial landmarking and expression intensity recognition can be improved.
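    The abstract above describes a network that merges RGBD input to predict 2D and 3D landmarks with auxiliary supervision. As a rough illustration only (not the thesis architecture), the sketch below shows one common way to fuse an RGB branch and a depth branch and attach separate 2D and 3D regression heads; all layer sizes and the 68-landmark count are assumptions.

```python
# Minimal sketch (not the thesis architecture): a two-branch CNN that merges
# RGB and depth features to jointly regress 2D and 3D facial landmarks.
# Layer sizes and the landmark count (68) are illustrative assumptions.
import torch
import torch.nn as nn

class RGBDLandmarkNet(nn.Module):
    def __init__(self, num_landmarks: int = 68):
        super().__init__()
        def branch(in_ch):  # small convolutional encoder per modality
            return nn.Sequential(
                nn.Conv2d(in_ch, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.rgb_branch = branch(3)     # encodes the RGB image
        self.depth_branch = branch(1)   # encodes the aligned depth map
        fused = 64 + 64
        self.head_2d = nn.Linear(fused, num_landmarks * 2)  # auxiliary 2D output
        self.head_3d = nn.Linear(fused, num_landmarks * 3)  # main 3D output

    def forward(self, rgb, depth):
        f = torch.cat([self.rgb_branch(rgb), self.depth_branch(depth)], dim=1)
        return self.head_2d(f), self.head_3d(f)

# Example: one 256x256 RGBD frame -> 68 2D and 68 3D landmark coordinates.
model = RGBDLandmarkNet()
lm2d, lm3d = model(torch.rand(1, 3, 256, 256), torch.rand(1, 1, 256, 256))
```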

    Face Liveness Detection under Processed Image Attacks

    Face recognition is a mature and reliable technology for identifying people. Due to high-definition cameras and supporting devices, it is considered the fastest and the least intrusive biometric recognition modality. Nevertheless, effective spoofing attempts on face recognition systems were found to be possible. As a result, various anti-spoofing algorithms were developed to counteract these attacks; they are commonly referred to in the literature as liveness detection tests. In this research we highlight the effectiveness of some simple, direct spoofing attacks, and test one of the current robust liveness detection algorithms, the logistic regression based face liveness detection from a single image proposed by Tan et al. in 2010, against malicious attacks using processed imposter images. In particular, we study experimentally the effect of common image processing operations, such as sharpening and smoothing, as well as corruption with salt and pepper noise, on the face liveness detection algorithm, and we find that it is especially vulnerable to spoofing attempts using processed imposter images. We design and present a new facial database, the Durham Face Database, which is the first, to the best of our knowledge, to include client, imposter and processed imposter images. Finally, we evaluate our claim on the effectiveness of the proposed imposter image attacks using transfer learning on Convolutional Neural Networks. We verify that such attacks are more difficult to detect even when using high-end, expensive machine learning techniques.
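    The attacks described above rely on ordinary image processing applied to imposter images. The following sketch illustrates, under assumed kernel sizes and noise density (none of which are taken from the paper or the Durham Face Database), how such processed imposter variants could be generated with OpenCV and NumPy.

```python
# Illustrative sketch of the kinds of processing the abstract mentions for
# creating "processed imposter" images: sharpening, smoothing, and
# salt-and-pepper noise. Kernel sizes and noise density are assumptions.
import cv2
import numpy as np

def sharpen(img):
    # unsharp-mask style sharpening via a 3x3 kernel
    kernel = np.array([[0, -1, 0], [-1, 5, -1], [0, -1, 0]], dtype=np.float32)
    return cv2.filter2D(img, -1, kernel)

def smooth(img, ksize=5):
    # Gaussian smoothing suppresses high-frequency cues a liveness test may use
    return cv2.GaussianBlur(img, (ksize, ksize), 0)

def salt_and_pepper(img, density=0.02, seed=0):
    # corrupt a small fraction of pixels with pure black/white values
    rng = np.random.default_rng(seed)
    noisy = img.copy()
    mask = rng.random(img.shape[:2])
    noisy[mask < density / 2] = 0
    noisy[mask > 1 - density / 2] = 255
    return noisy

# Usage: produce three processed variants of an imposter image
# (a flat grey image stands in for a real printed-photo capture here).
imposter = np.full((256, 256, 3), 128, dtype=np.uint8)
variants = [sharpen(imposter), smooth(imposter), salt_and_pepper(imposter)]
```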

    Advancing heat stress detection in dairy cows through machine learning and computer vision

    Heat stress detection in dairy cows has long been connected with production loss. However, the reduction in milk yield lags behind exposure to heat stress events by about two days. Other stress responses, such as physiological and behavioural changes, are well documented to be activated by dairy cows at an earlier stage of heat stress than production loss. Among all candidate indicators, body surface temperatures (BST), respiration rate (RR), and relevant behaviours have been identified as the most appropriate indicators due to their feasibility of acquisition and early response. Vision-based methods are promising for accurate measurements while adhering to animal welfare principles. Meanwhile, predictive models offer a non-invasive alternative for obtaining these data and can provide useful insights through their interpretation. Thus, this thesis aimed to provide non-invasive solutions for the detection of heat stress in dairy cows using artificial intelligence techniques. The detailed research content and relevant conclusions are as follows. An automated tool based on an improved UNet was proposed to collect facial BST from five facial landmarks (i.e., eyes, muzzle, nostrils, ears, and horns) in cattle infrared images. The baseline UNet model was improved by replacing the traditional convolutional layers in the decoder with Ghost modules and adding efficient channel attention modules. The improved UNet outperformed comparable models with the highest mean Intersection over Union of 80.76% and a slightly slower but still good inference speed of 32.7 frames per second (FPS). Agreement analysis reveals small to negligible differences between the automatically obtained temperatures of the eye and ear areas and the ground truth. A vision-based method was proposed to measure RR for multiple dairy cows lying in free stalls. The proposed method involved various computer vision tasks (i.e., instance segmentation, object detection, object tracking, video stabilisation, and optical flow) to obtain respiration-related signals and finally utilised the Fast Fourier Transform to extract RR. The results show that the measured RR had a Pearson correlation coefficient of 0.945, a root mean square error (RMSE) of 5.24 breaths per minute (bpm), and an intraclass correlation coefficient of 0.98 compared with visual observation. The average processing time and FPS on 55 test video clips (mean ± standard deviation duration of 16 ± 4 s) were 8.2 s and 64, respectively. A deep learning-based model was proposed to recognise cow behaviours (i.e., drinking, eating, lying, standing-in, and standing-out) that are known to be influenced by heat stress. The YOLOv5s model was selected for its ability to compress the weight size while maintaining accuracy. It achieved a mean average precision of 0.985 and an inference speed of 73 FPS. Further validation demonstrates the excellent capacity of the proposed model for measuring herd-level behavioural indicators, with an intraclass correlation coefficient of 0.97 compared with manual observation. Critical thresholds were determined using piecewise regression models with environmental indicators as the predictors and animal-based indicators as the outcomes. An ambient temperature (Ta) threshold of 26.1 °C was determined, at which the automatically measured mean eye temperature reached 35.3 °C. A Ta threshold of 23.6 °C and a temperature-humidity index (THI) threshold of 72 were determined, at which the automatically measured RR reached 61.1 and 60.4 bpm, respectively. In addition, the test dairy herd began to change their standing and lying behaviour at a Ta as low as 23.8 °C or a THI of 68.5. Four machine learning algorithms were used to predict RR, vaginal temperature (VT), and eye temperature (ET) from 13 predictor variables spanning three dimensions: production, cow-related, and environmental factors. The artificial neural networks yielded the lowest RMSE for predicting RR (13.24 bpm), VT (0.30 °C), and ET (0.29 °C). The results, interpreted with partial dependence plots and Local Interpretable Model-agnostic Explanations, show that P.M. measurements and winter calving contributed most to high RR and VT predictions, whereas lying posture, high Ta, and low wind speed contributed most to high ET predictions. Based on these results, an integrative application of all the proposed measurement, prediction, and assessment methods is suggested, wherein RGB and infrared cameras are used to measure animal-based indicators, and critical thresholds, along with model interpretation, are used to assess the heat stress state of dairy cows. This strategy ensures timely and thorough cooling of cows in all areas of the dairy farm, thereby minimising the negative impact of heat stress as far as possible.
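    One concrete step in the abstract above is extracting respiration rate from a respiration-related motion signal with the Fast Fourier Transform. The sketch below is a minimal, hypothetical version of that step: it assumes a 1-D signal (for example, the mean optical-flow magnitude over the flank region), a known frame rate, and a 10-80 bpm search band; none of these choices are taken from the thesis.

```python
# Minimal sketch, not the thesis pipeline: estimating respiration rate
# (breaths per minute) from a 1-D respiration-related signal using the FFT.
# The sampling rate and the 10-80 bpm search band are assumptions.
import numpy as np

def respiration_rate_bpm(signal: np.ndarray, fps: float) -> float:
    signal = signal - signal.mean()                 # remove the DC component
    spectrum = np.abs(np.fft.rfft(signal))          # magnitude spectrum
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fps)
    band = (freqs >= 10 / 60) & (freqs <= 80 / 60)  # plausible breathing band
    peak = freqs[band][np.argmax(spectrum[band])]   # dominant frequency in band
    return peak * 60.0                              # Hz -> breaths per minute

# Example: a synthetic 1 Hz (60 bpm) breathing signal sampled at 30 fps for 16 s.
t = np.arange(0, 16, 1 / 30)
print(respiration_rate_bpm(np.sin(2 * np.pi * 1.0 * t), fps=30))  # ~60 bpm
```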

    Automatic 3D Facial Performance Acquisition and Animation using Monocular Videos

    Facial performance capture and animation is an essential component of many applications such as movies, video games, and virtual environments. Video-based facial performance capture is particularly appealing as it offers the lowest cost and the potential use of legacy sources and uncontrolled videos. However, it is also challenging because of complex facial movements at different scales, ambiguity caused by the loss of depth information, and a lack of discernible features on most facial regions. Unknown lighting conditions and camera parameters further complicate the problem. This dissertation explores video-based 3D facial performance capture systems that use a single video camera, overcome the aforementioned challenges, and produce accurate and robust reconstruction results. We first develop a novel automatic facial feature detection/tracking algorithm that accurately locates important facial features across the entire video sequence, which are then used for 3D pose and facial shape reconstruction. The key idea is to combine the respective powers of local detection, spatial priors for facial feature locations, Active Appearance Models (AAMs), and temporal coherence for facial feature detection. The algorithm runs in real time and is robust to large pose and expression variations and occlusions. We then present an automatic high-fidelity facial performance capture system that works on monocular videos. It uses the detected facial features along with multilinear facial models to reconstruct 3D head poses and large-scale facial deformation, and uses per-pixel shading cues to add fine-scale surface details such as emerging or disappearing wrinkles and folds. We iterate the reconstruction procedure on large-scale facial geometry and fine-scale facial details to improve the accuracy of facial reconstruction. We further improve the accuracy and efficiency of the large-scale facial performance capture by introducing a local binary feature based 2D feature regression and a convolutional neural network based pose and expression regression, and complement it with an efficient 3D eye gaze tracker to achieve real-time 3D eye gaze animation. We have tested our systems on various monocular videos, demonstrating their accuracy and robustness under a variety of uncontrolled lighting conditions and their ability to overcome significant shape differences across individuals.
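    The dissertation abstract mentions reconstructing 3D head pose from detected 2D facial features under unknown camera parameters. As a simplified, hypothetical illustration (not the dissertation's multilinear-model method), the sketch below recovers a rigid head pose from six detected landmarks with OpenCV's PnP solver, using a generic 3D face template and a guessed focal length; all numeric values are assumptions.

```python
# Minimal sketch: rigid 3D head pose from 2D facial landmarks via PnP.
# The six template points, focal-length guess, and landmark order are
# illustrative assumptions, not values from the dissertation.
import cv2
import numpy as np

# Generic 3D positions (mm) of nose tip, chin, eye corners, mouth corners.
MODEL_POINTS = np.array([
    [0.0, 0.0, 0.0],        # nose tip
    [0.0, -63.6, -12.5],    # chin
    [-43.3, 32.7, -26.0],   # left eye outer corner
    [43.3, 32.7, -26.0],    # right eye outer corner
    [-28.9, -28.9, -24.1],  # left mouth corner
    [28.9, -28.9, -24.1],   # right mouth corner
], dtype=np.float64)

def head_pose(image_points: np.ndarray, frame_size=(480, 640)):
    """image_points: (6, 2) detected 2D landmarks in MODEL_POINTS order."""
    h, w = frame_size
    focal = w  # crude focal-length guess when camera parameters are unknown
    camera_matrix = np.array([[focal, 0, w / 2],
                              [0, focal, h / 2],
                              [0, 0, 1]], dtype=np.float64)
    dist_coeffs = np.zeros((4, 1))  # assume no lens distortion
    _, rvec, tvec = cv2.solvePnP(MODEL_POINTS, image_points,
                                 camera_matrix, dist_coeffs,
                                 flags=cv2.SOLVEPNP_ITERATIVE)
    return rvec, tvec  # head rotation (Rodrigues vector) and translation
```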

    Ubiquitous Technologies for Emotion Recognition

    Emotions play a very important role in how we think and behave. As such, the emotions we feel every day can compel us to act and influence the decisions and plans we make about our lives. Being able to measure, analyze, and better comprehend how or why our emotions may change is thus of much relevance to understanding human behavior and its consequences. Despite the great efforts made in the past in the study of human emotions, it is only now, with the advent of wearable, mobile, and ubiquitous technologies, that we can aim to sense and recognize emotions continuously and in real time. This book brings together the latest experiences, findings, and developments regarding ubiquitous sensing, modeling, and the recognition of human emotions.

    Intelligent Sensors for Human Motion Analysis

    The book, "Intelligent Sensors for Human Motion Analysis," contains 17 articles published in the Special Issue of the Sensors journal. These articles deal with many aspects related to the analysis of human movement. New techniques and methods for pose estimation, gait recognition, and fall detection have been proposed and verified. Some of them will trigger further research, and some may become the backbone of commercial systems

    Advanced Biometrics with Deep Learning

    Biometrics, such as fingerprint, iris, face, hand print, hand vein, speech and gait recognition, have become a commonplace means of identity management in various applications. Biometric systems follow a typical pipeline composed of separate preprocessing, feature extraction and classification stages. Deep learning, as a data-driven representation learning approach, has been shown to be a promising alternative to the conventional data-agnostic, handcrafted preprocessing and feature extraction used in biometric systems. Furthermore, deep learning offers an end-to-end learning paradigm that unifies preprocessing, feature extraction, and recognition based solely on biometric data. This Special Issue has collected 12 high-quality, state-of-the-art research papers that deal with challenging issues in advanced biometric systems based on deep learning. The 12 papers can be divided into four categories according to biometric modality: face biometrics, medical electronic signals (EEG and ECG), voice print, and others.

    Emotion and Stress Recognition Related Sensors and Machine Learning Technologies

    This book includes impactful chapters that present scientific concepts, frameworks, architectures and ideas on sensing technologies and machine learning techniques. These are relevant to tackling the following challenges: (i) the field readiness and use of intrusive sensor systems and devices for capturing biosignals, including EEG sensor systems, ECG sensor systems and electrodermal activity sensor systems; (ii) the quality assessment and management of sensor data; (iii) data preprocessing, noise filtering and calibration concepts for biosignals; (iv) the field readiness and use of nonintrusive sensor technologies, including visual sensors, acoustic sensors, vibration sensors and piezoelectric sensors; (v) emotion recognition using mobile phones and smartwatches; (vi) body area sensor networks for emotion and stress studies; (vii) the use of experimental datasets in emotion recognition, including dataset generation principles and concepts, quality assurance, and emotion elicitation material and concepts; (viii) machine learning techniques for robust emotion recognition, including graphical models, neural network methods, deep learning methods, statistical learning and multivariate empirical mode decomposition; (ix) subject-independent emotion and stress recognition concepts and systems, including facial expression-based systems, speech-based systems, EEG-based systems, ECG-based systems, electrodermal activity-based systems, multimodal recognition systems and sensor fusion concepts; and (x) emotion and stress estimation and forecasting from a nonlinear dynamical system perspective.