
    Markerless facial motion capture: deep learning approaches on RGBD data

    Facial expressions are a series of fast, complex and interconnected movements that cause an array of deformations, such as stretching, compressing and folding of the skin. Identifying expressions is a natural process in human vision, but the diversity of faces makes it a challenging task for computer vision. Research in markerless facial motion capture using a single Red Green Blue (RGB) camera has gained popularity due to the wide availability of such data, for example from mobile phones. The motivation behind this work is that much of the existing research attempts to infer 3-Dimensional (3D) data from 2-Dimensional (2D) images; in motion capture, for instance, multiple 2D cameras are calibrated to allow some depth prediction. In contrast, the inclusion of Red Green Blue Depth (RGBD) sensors, which provide ground-truth depth data, could give a better understanding of the human face and how expressions are visualised. The aim of this thesis is to investigate and develop novel methods of markerless facial motion capture, with a focus on the inclusion of RGBD data to provide 3D information. The contributions are: a tool to aid in the annotation of 3D facial landmarks; a novel neural network that demonstrates the ability to predict 2D and 3D landmarks by merging RGBD data; a working application that demonstrates a complex deep learning network on portable handheld devices; a review of existing methods for denoising fine detail in depth maps using neural networks; and a network for the complete analysis of facial landmarks and expressions in 3D. The 3D annotator was developed to overcome the issues of relying on existing 3D modelling software, which made feature identification difficult. The technique of predicting 2D and 3D landmarks with auxiliary information allowed high-accuracy 3D landmarking without the need for full model generation, and it outperformed other recent landmarking techniques. The networks running on handheld devices serve as a proof of concept that, even without much optimisation, a complex task can be performed in near real time. Denoising Time of Flight (ToF) depth maps proved considerably more complex than traditional RGB denoising, and we reviewed and applied an array of techniques to the task. The full facial analysis showed that training neural networks on a wide range of related tasks for auxiliary information allows a deeper understanding of the overall task. Research on facial processing is vast, but many new problems and challenges remain. While RGB cameras are widely used, high-accuracy and cost-effective depth sensing devices are becoming available. These new devices allow a better understanding of facial features and expressions, and by using and merging RGB and depth data, facial landmarking and expression intensity recognition can be improved.
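    The abstract above describes a network that merges RGBD input to predict 2D and 3D landmarks with auxiliary supervision. As a rough illustration only (not the thesis architecture), the sketch below shows one common way to fuse an RGB branch and a depth branch and attach separate 2D and 3D regression heads; all layer sizes and the 68-landmark count are assumptions.

```python
# Minimal sketch (not the thesis architecture): a two-branch CNN that merges
# RGB and depth features to jointly regress 2D and 3D facial landmarks.
# Layer sizes and the landmark count (68) are illustrative assumptions.
import torch
import torch.nn as nn

class RGBDLandmarkNet(nn.Module):
    def __init__(self, num_landmarks: int = 68):
        super().__init__()
        def branch(in_ch):  # small convolutional encoder per modality
            return nn.Sequential(
                nn.Conv2d(in_ch, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.rgb_branch = branch(3)     # encodes the RGB image
        self.depth_branch = branch(1)   # encodes the aligned depth map
        fused = 64 + 64
        self.head_2d = nn.Linear(fused, num_landmarks * 2)  # auxiliary 2D output
        self.head_3d = nn.Linear(fused, num_landmarks * 3)  # main 3D output

    def forward(self, rgb, depth):
        f = torch.cat([self.rgb_branch(rgb), self.depth_branch(depth)], dim=1)
        return self.head_2d(f), self.head_3d(f)

# Example: one 256x256 RGBD frame -> 68 2D and 68 3D landmark coordinates.
model = RGBDLandmarkNet()
lm2d, lm3d = model(torch.rand(1, 3, 256, 256), torch.rand(1, 1, 256, 256))
```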

    Face Liveness Detection under Processed Image Attacks

    Face recognition is a mature and reliable technology for identifying people. Due to high-definition cameras and supporting devices, it is considered the fastest and the least intrusive biometric recognition modality. Nevertheless, effective spoofing attempts on face recognition systems were found to be possible. As a result, various anti-spoofing algorithms were developed to counteract these attacks; they are commonly referred to in the literature as liveness detection tests. In this research we highlight the effectiveness of some simple, direct spoofing attacks, and test one of the current robust liveness detection algorithms, the logistic regression based face liveness detection from a single image proposed by Tan et al. in 2010, against malicious attacks using processed imposter images. In particular, we study experimentally the effect of common image processing operations, such as sharpening and smoothing, as well as corruption with salt and pepper noise, on the face liveness detection algorithm, and we find that it is especially vulnerable to spoofing attempts using processed imposter images. We design and present a new facial database, the Durham Face Database, which is the first, to the best of our knowledge, to include client, imposter and processed imposter images. Finally, we evaluate our claim on the effectiveness of the proposed imposter image attacks using transfer learning on Convolutional Neural Networks. We verify that such attacks are more difficult to detect even when using high-end, expensive machine learning techniques.
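    The attacks described above rely on ordinary image processing applied to imposter images. The following sketch illustrates, under assumed kernel sizes and noise density (none of which are taken from the paper or the Durham Face Database), how such processed imposter variants could be generated with OpenCV and NumPy.

```python
# Illustrative sketch of the kinds of processing the abstract mentions for
# creating "processed imposter" images: sharpening, smoothing, and
# salt-and-pepper noise. Kernel sizes and noise density are assumptions.
import cv2
import numpy as np

def sharpen(img):
    # unsharp-mask style sharpening via a 3x3 kernel
    kernel = np.array([[0, -1, 0], [-1, 5, -1], [0, -1, 0]], dtype=np.float32)
    return cv2.filter2D(img, -1, kernel)

def smooth(img, ksize=5):
    # Gaussian smoothing suppresses high-frequency cues a liveness test may use
    return cv2.GaussianBlur(img, (ksize, ksize), 0)

def salt_and_pepper(img, density=0.02, seed=0):
    # corrupt a small fraction of pixels with pure black/white values
    rng = np.random.default_rng(seed)
    noisy = img.copy()
    mask = rng.random(img.shape[:2])
    noisy[mask < density / 2] = 0
    noisy[mask > 1 - density / 2] = 255
    return noisy

# Usage: produce three processed variants of an imposter image
# (a flat grey image stands in for a real printed-photo capture here).
imposter = np.full((256, 256, 3), 128, dtype=np.uint8)
variants = [sharpen(imposter), smooth(imposter), salt_and_pepper(imposter)]
```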

    Advancing heat stress detection in dairy cows through machine learning and computer vision

    Heat stress detection in dairy cows has long been connected with production loss. However, the reduction in milk yield lags behind exposure to heat stress events by about two days. Other stress responses, such as physiological and behavioural changes, are well documented to be activated by dairy cows at an earlier stage of heat stress than production loss. Among all candidate indicators, body surface temperatures (BST), respiration rate (RR), and relevant behaviours have been identified as the most appropriate indicators due to their feasibility of acquisition and early response. Vision-based methods are promising for accurate measurements while adhering to animal welfare principles. Meanwhile, predictive models offer a non-invasive alternative for obtaining these data and can provide useful insights through their interpretation. Thus, this thesis aimed to provide non-invasive solutions for the detection of heat stress in dairy cows using artificial intelligence techniques. The detailed research content and relevant conclusions are as follows. An automated tool based on an improved UNet was proposed to collect facial BST from five facial landmarks (i.e., eyes, muzzle, nostrils, ears, and horns) in cattle infrared images. The baseline UNet model was improved by replacing the traditional convolutional layers in the decoder with Ghost modules and adding efficient channel attention modules. The improved UNet outperformed comparable models with the highest mean Intersection over Union of 80.76% and a slightly slower but still good inference speed of 32.7 frames per second (FPS). Agreement analysis reveals small to negligible differences between the automatically obtained temperatures of the eye and ear areas and the ground truth. A vision-based method was proposed to measure RR for multiple dairy cows lying in free stalls. The proposed method involved various computer vision tasks (i.e., instance segmentation, object detection, object tracking, video stabilisation, and optical flow) to obtain respiration-related signals and finally utilised the Fast Fourier Transform to extract RR. The results show that the measured RR had a Pearson correlation coefficient of 0.945, a root mean square error (RMSE) of 5.24 breaths per minute (bpm), and an intraclass correlation coefficient of 0.98 compared with visual observation. The average processing time and FPS on 55 test video clips (mean ± standard deviation duration of 16 ± 4 s) were 8.2 s and 64, respectively. A deep learning-based model was proposed to recognise cow behaviours (i.e., drinking, eating, lying, standing-in, and standing-out) that are known to be influenced by heat stress. The YOLOv5s model was selected for its ability to compress the weight size while maintaining accuracy. It achieved a mean average precision of 0.985 and an inference speed of 73 FPS. Further validation demonstrates the excellent capacity of the proposed model for measuring herd-level behavioural indicators, with an intraclass correlation coefficient of 0.97 compared with manual observation. Critical thresholds were determined using piecewise regression models with environmental indicators as the predictors and animal-based indicators as the outcomes. An ambient temperature (Ta) threshold of 26.1 °C was determined, at which the automatically measured mean eye temperature reached 35.3 °C. A Ta threshold of 23.6 °C and a temperature-humidity index (THI) threshold of 72 were determined, at which the automatically measured RR reached 61.1 and 60.4 bpm, respectively. In addition, the test dairy herd began to change their standing and lying behaviour at a Ta as low as 23.8 °C or a THI of 68.5. Four machine learning algorithms were used to predict RR, vaginal temperature (VT), and eye temperature (ET) from 13 predictor variables spanning three dimensions: production, cow-related, and environmental factors. The artificial neural networks yielded the lowest RMSE for predicting RR (13.24 bpm), VT (0.30 °C), and ET (0.29 °C). The results, interpreted with partial dependence plots and Local Interpretable Model-agnostic Explanations, show that P.M. measurements and winter calving contributed most to high RR and VT predictions, whereas lying posture, high Ta, and low wind speed contributed most to high ET predictions. Based on these results, an integrative application of all the proposed measurement, prediction, and assessment methods is suggested, wherein RGB and infrared cameras are used to measure animal-based indicators, and critical thresholds, along with model interpretation, are used to assess the heat stress state of dairy cows. This strategy ensures timely and thorough cooling of cows in all areas of the dairy farm, thereby minimising the negative impact of heat stress as far as possible.
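    One concrete step in the abstract above is extracting respiration rate from a respiration-related motion signal with the Fast Fourier Transform. The sketch below is a minimal, hypothetical version of that step: it assumes a 1-D signal (for example, the mean optical-flow magnitude over the flank region), a known frame rate, and a 10-80 bpm search band; none of these choices are taken from the thesis.

```python
# Minimal sketch, not the thesis pipeline: estimating respiration rate
# (breaths per minute) from a 1-D respiration-related signal using the FFT.
# The sampling rate and the 10-80 bpm search band are assumptions.
import numpy as np

def respiration_rate_bpm(signal: np.ndarray, fps: float) -> float:
    signal = signal - signal.mean()                 # remove the DC component
    spectrum = np.abs(np.fft.rfft(signal))          # magnitude spectrum
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fps)
    band = (freqs >= 10 / 60) & (freqs <= 80 / 60)  # plausible breathing band
    peak = freqs[band][np.argmax(spectrum[band])]   # dominant frequency in band
    return peak * 60.0                              # Hz -> breaths per minute

# Example: a synthetic 1 Hz (60 bpm) breathing signal sampled at 30 fps for 16 s.
t = np.arange(0, 16, 1 / 30)
print(respiration_rate_bpm(np.sin(2 * np.pi * 1.0 * t), fps=30))  # ~60 bpm
```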

    Automatic 3D Facial Performance Acquisition and Animation using Monocular Videos

    Facial performance capture and animation is an essential component of many applications such as movies, video games, and virtual environments. Video-based facial performance capture is particularly appealing as it offers the lowest cost and the potential use of legacy sources and uncontrolled videos. However, it is also challenging because of complex facial movements at different scales, ambiguity caused by the loss of depth information, and a lack of discernible features on most facial regions. Unknown lighting conditions and camera parameters further complicate the problem. This dissertation explores video-based 3D facial performance capture systems that use a single video camera, overcome the aforementioned challenges, and produce accurate and robust reconstruction results. We first develop a novel automatic facial feature detection/tracking algorithm that accurately locates important facial features across the entire video sequence, which are then used for 3D pose and facial shape reconstruction. The key idea is to combine the respective powers of local detection, spatial priors for facial feature locations, Active Appearance Models (AAMs), and temporal coherence for facial feature detection. The algorithm runs in real time and is robust to large pose and expression variations and occlusions. We then present an automatic high-fidelity facial performance capture system that works on monocular videos. It uses the detected facial features along with multilinear facial models to reconstruct 3D head poses and large-scale facial deformation, and uses per-pixel shading cues to add fine-scale surface details such as emerging or disappearing wrinkles and folds. We iterate the reconstruction procedure on large-scale facial geometry and fine-scale facial details to improve the accuracy of facial reconstruction. We further improve the accuracy and efficiency of the large-scale facial performance capture by introducing a local binary feature based 2D feature regression and a convolutional neural network based pose and expression regression, and complement it with an efficient 3D eye gaze tracker to achieve real-time 3D eye gaze animation. We have tested our systems on various monocular videos, demonstrating their accuracy and robustness under a variety of uncontrolled lighting conditions and their ability to overcome significant shape differences across individuals.
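    The dissertation abstract mentions reconstructing 3D head pose from detected 2D facial features under unknown camera parameters. As a simplified, hypothetical illustration (not the dissertation's multilinear-model method), the sketch below recovers a rigid head pose from six detected landmarks with OpenCV's PnP solver, using a generic 3D face template and a guessed focal length; all numeric values are assumptions.

```python
# Minimal sketch: rigid 3D head pose from 2D facial landmarks via PnP.
# The six template points, focal-length guess, and landmark order are
# illustrative assumptions, not values from the dissertation.
import cv2
import numpy as np

# Generic 3D positions (mm) of nose tip, chin, eye corners, mouth corners.
MODEL_POINTS = np.array([
    [0.0, 0.0, 0.0],        # nose tip
    [0.0, -63.6, -12.5],    # chin
    [-43.3, 32.7, -26.0],   # left eye outer corner
    [43.3, 32.7, -26.0],    # right eye outer corner
    [-28.9, -28.9, -24.1],  # left mouth corner
    [28.9, -28.9, -24.1],   # right mouth corner
], dtype=np.float64)

def head_pose(image_points: np.ndarray, frame_size=(480, 640)):
    """image_points: (6, 2) detected 2D landmarks in MODEL_POINTS order."""
    h, w = frame_size
    focal = w  # crude focal-length guess when camera parameters are unknown
    camera_matrix = np.array([[focal, 0, w / 2],
                              [0, focal, h / 2],
                              [0, 0, 1]], dtype=np.float64)
    dist_coeffs = np.zeros((4, 1))  # assume no lens distortion
    _, rvec, tvec = cv2.solvePnP(MODEL_POINTS, image_points,
                                 camera_matrix, dist_coeffs,
                                 flags=cv2.SOLVEPNP_ITERATIVE)
    return rvec, tvec  # head rotation (Rodrigues vector) and translation
```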

    Ubiquitous Technologies for Emotion Recognition

    Emotions play a very important role in how we think and behave. As such, the emotions we feel every day can compel us to act and influence the decisions and plans we make about our lives. Being able to measure, analyze, and better comprehend how or why our emotions may change is thus of much relevance to understanding human behavior and its consequences. Despite the great efforts made in the past in the study of human emotions, it is only now, with the advent of wearable, mobile, and ubiquitous technologies, that we can aim to sense and recognize emotions continuously and in real time. This book brings together the latest experiences, findings, and developments regarding ubiquitous sensing, modeling, and the recognition of human emotions.

    Intelligent Sensors for Human Motion Analysis

    The book, "Intelligent Sensors for Human Motion Analysis," contains 17 articles published in the Special Issue of the Sensors journal. These articles deal with many aspects related to the analysis of human movement. New techniques and methods for pose estimation, gait recognition, and fall detection have been proposed and verified. Some of them will trigger further research, and some may become the backbone of commercial systems

    Advanced Biometrics with Deep Learning

    Biometrics, such as fingerprint, iris, face, hand print, hand vein, speech and gait recognition, have become a commonplace means of identity management in various applications. Biometric systems follow a typical pipeline composed of separate preprocessing, feature extraction and classification stages. Deep learning, as a data-driven representation learning approach, has been shown to be a promising alternative to the conventional data-agnostic, handcrafted preprocessing and feature extraction used in biometric systems. Furthermore, deep learning offers an end-to-end learning paradigm that unifies preprocessing, feature extraction, and recognition based solely on biometric data. This Special Issue has collected 12 high-quality, state-of-the-art research papers that deal with challenging issues in advanced biometric systems based on deep learning. The 12 papers can be divided into four categories according to biometric modality: face biometrics, medical electronic signals (EEG and ECG), voice print, and others.

    Emotion and Stress Recognition Related Sensors and Machine Learning Technologies

    This book includes impactful chapters that present scientific concepts, frameworks, architectures and ideas on sensing technologies and machine learning techniques. These are relevant to tackling the following challenges: (i) the field readiness and use of intrusive sensor systems and devices for capturing biosignals, including EEG sensor systems, ECG sensor systems and electrodermal activity sensor systems; (ii) the quality assessment and management of sensor data; (iii) data preprocessing, noise filtering and calibration concepts for biosignals; (iv) the field readiness and use of nonintrusive sensor technologies, including visual sensors, acoustic sensors, vibration sensors and piezoelectric sensors; (v) emotion recognition using mobile phones and smartwatches; (vi) body area sensor networks for emotion and stress studies; (vii) the use of experimental datasets in emotion recognition, including dataset generation principles and concepts, quality assurance, and emotion elicitation material and concepts; (viii) machine learning techniques for robust emotion recognition, including graphical models, neural network methods, deep learning methods, statistical learning and multivariate empirical mode decomposition; (ix) subject-independent emotion and stress recognition concepts and systems, including facial expression-based systems, speech-based systems, EEG-based systems, ECG-based systems, electrodermal activity-based systems, multimodal recognition systems and sensor fusion concepts; and (x) emotion and stress estimation and forecasting from a nonlinear dynamical system perspective.