
    POSEidon: Face-from-Depth for Driver Pose Estimation

    Fast and accurate upper-body and head pose estimation is a key task for automatic monitoring of driver attention, a challenging context characterized by severe illumination changes, occlusions and extreme poses. In this work, we present a new deep learning framework for head localization and pose estimation on depth images. The core of the proposal is a regression neural network, called POSEidon, composed of three independent convolutional nets followed by a fusion layer, specifically conceived for estimating pose from depth data. In addition, to recover the intrinsic value of face appearance for understanding head position and orientation, we propose a new Face-from-Depth approach that learns to reconstruct face images from depth maps. Results in face reconstruction are qualitatively impressive. We test the proposed framework on two public datasets, namely Biwi Kinect Head Pose and ICT-3DHP, and on Pandora, a new challenging dataset mainly inspired by the automotive setup. Results show that our method outperforms all recent state-of-the-art works, running in real time at more than 30 frames per second.
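    Below is a minimal PyTorch sketch of the kind of trident-plus-fusion regression architecture the abstract describes: three independent convolutional branches whose features are concatenated and fused into a head-angle regression. The layer sizes, the 64x64 input resolution, and the choice of the three input streams are illustrative assumptions, not the authors' exact configuration.

```python
# Sketch of a POSEidon-style fusion regressor (assumed layer sizes, not the paper's).
import torch
import torch.nn as nn

class StreamCNN(nn.Module):
    """One of the three independent convolutional branches."""
    def __init__(self, in_channels=1, feat_dim=128):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(128, feat_dim)

    def forward(self, x):
        return self.fc(self.features(x).flatten(1))

class PoseFusionNet(nn.Module):
    """Three branches followed by a fusion layer regressing yaw, pitch and roll."""
    def __init__(self):
        super().__init__()
        self.branches = nn.ModuleList([StreamCNN() for _ in range(3)])
        self.fusion = nn.Sequential(
            nn.Linear(3 * 128, 128), nn.ReLU(),
            nn.Linear(128, 3),  # yaw, pitch, roll
        )

    def forward(self, depth, face_from_depth, motion):
        feats = [b(x) for b, x in zip(self.branches, (depth, face_from_depth, motion))]
        return self.fusion(torch.cat(feats, dim=1))

# Usage: net = PoseFusionNet(); angles = net(d, ffd, mi)  # each input of shape (B, 1, 64, 64)
```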

    Towards a complete 3D morphable model of the human head

    Three-dimensional Morphable Models (3DMMs) are powerful statistical tools for representing the 3D shapes and textures of an object class. Here we present the most complete 3DMM of the human head to date, including face, cranium, ears, eyes, teeth and tongue. To achieve this, we propose two methods for combining existing 3DMMs of different overlapping head parts: (i) use a regressor to complete missing parts of one model using the other; (ii) use the Gaussian Process framework to blend covariance matrices from multiple models. Thus we build a new combined face-and-head shape model that blends the variability and facial detail of an existing face model (the LSFM) with the full head modelling capability of an existing head model (the LYHM). Then we construct and fuse a highly-detailed ear model to extend the variation of the ear shape. Eye and eye region models are incorporated into the head model, along with basic models of the teeth, tongue and inner mouth cavity. The new model achieves state-of-the-art performance. We use our model to reconstruct full head representations from single, unconstrained images, allowing us to parameterize craniofacial shape and texture, along with the ear shape, eye gaze and eye color. (18 pages, 18 figures; submitted to Transactions on Pattern Analysis and Machine Intelligence (TPAMI) on the 9th of October as an extension of the original oral CVPR paper, arXiv:1903.0378.)
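    The following NumPy sketch illustrates the flavour of method (i): a ridge-regularised linear regressor that predicts a full-head shape from face-model parameters, trained on paired samples. All dimensions, the synthetic data, and the variable names are placeholders, not the actual LSFM/LYHM setup.

```python
# Hedged sketch of "complete the head from the face with a regressor".
import numpy as np

rng = np.random.default_rng(0)

n_face_params = 40           # assumed number of face-model components
n_head_vertices = 3 * 5000   # assumed flattened XYZ of a head template

# Stand-in training pairs: face shape parameters and corresponding registered head meshes.
alphas = rng.normal(size=(500, n_face_params))
head_shapes = rng.normal(size=(500, n_head_vertices))

# Ridge-regularised least squares: W maps face parameters to a full head shape.
lam = 1e-2
A = alphas.T @ alphas + lam * np.eye(n_face_params)
W = np.linalg.solve(A, alphas.T @ head_shapes)   # (n_face_params, n_head_vertices)

# Completing a new face: predict the cranium/ear region the face model lacks.
new_alpha = rng.normal(size=(1, n_face_params))
predicted_head = new_alpha @ W
print(predicted_head.shape)   # (1, 15000)
```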

    Multimodal headpose estimation and applications

    This thesis presents new research into human headpose estimation and its applications in multi-modal data. We develop new methods for head pose estimation spanning RGB-D Human Computer Interaction (HCI) to far-away, "in the wild" surveillance-quality data. We present the state-of-the-art solution in both head detection and head pose estimation through a new end-to-end Convolutional Neural Network architecture that reuses all of the computation for detection and pose estimation. In contrast to prior work, our method successfully spans close-up HCI to low-resolution surveillance data and is cross-modality, operating on both RGB and RGB-D data. We further address the problems of limited standard data and annotations of varying quality through semi-supervised learning and novel data augmentation. (This latter contribution also finds application in the domain of life sciences.) We report the highest accuracy by a large margin (a 60% improvement) and demonstrate leading performance on multiple standardized datasets. In HCI we reduce the angular error by 40% relative to the previously reported literature. Furthermore, by defining a probabilistic spatial gaze model from the head pose, we show applications in human-human and human-scene interaction understanding, presenting state-of-the-art results on the standard interaction datasets. A new metric to model "social mimicry" through the temporal correlation of the headpose signal is contributed and shown to be valid qualitatively and intuitively. As an application in surveillance, it is shown that with the robust headpose signal as a prior, state-of-the-art results in tracking under occlusion using a Kalman filter can be achieved. This model, named the Intentional Tracker, improves visual tracking metrics by up to 15%. We also apply the ALICE loss, developed for the end-to-end detection and classification, to dense classification of underwater coral reef imagery. The objective of this work is to solve the challenging task of recognizing and segmenting underwater coral imagery in the wild with sparse point-based ground truth labelling. To achieve this, we propose an integrated Fully Convolutional Neural Network (FCNN) and Fully-Connected Conditional Random Field (CRF) based classification and segmentation algorithm. Our major contributions lie in four areas. First, we show that multi-scale crop-based training is useful for learning the initial weights in the canonical one-class classification problem. Second, we propose a modified ALICE loss for training the FCNN on sparse labels with class imbalance and establish its significance empirically. Third, we show that by artificially enhancing the point labels to small regions based on a class distance transform, we can further improve the classification accuracy. Fourth, we improve the segmentation results using fully connected CRFs with a bilateral message-passing prior. We improve upon state-of-the-art results on all publicly available datasets by a significant margin.
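    A minimal sketch of one idea mentioned above, a probabilistic spatial gaze model derived from head pose: scene points are scored by their angular deviation from the head's facing direction under a Gaussian spread. The 15-degree sigma, the camera-style coordinate convention, and the function names are assumptions for illustration, not the thesis's actual model.

```python
# Score 3D points by how likely they are to be looked at, given only head pose.
import numpy as np

def gaze_direction(yaw_deg, pitch_deg):
    """Unit facing vector for a head pose (yaw about Y, pitch about X)."""
    yaw, pitch = np.radians([yaw_deg, pitch_deg])
    return np.array([np.sin(yaw) * np.cos(pitch),
                     -np.sin(pitch),
                     np.cos(yaw) * np.cos(pitch)])

def gaze_likelihood(head_pos, yaw_deg, pitch_deg, points, sigma_deg=15.0):
    """Unnormalised probability that each 3D point is the gaze target."""
    d = gaze_direction(yaw_deg, pitch_deg)
    to_pts = points - head_pos
    to_pts /= np.linalg.norm(to_pts, axis=1, keepdims=True)
    ang = np.degrees(np.arccos(np.clip(to_pts @ d, -1.0, 1.0)))
    return np.exp(-0.5 * (ang / sigma_deg) ** 2)

# Usage: a head at the origin looking 20 degrees to the right strongly prefers
# the target on its right over the one on its left.
pts = np.array([[1.0, 0.0, 2.0], [-1.0, 0.0, 2.0]])
print(gaze_likelihood(np.zeros(3), 20.0, 0.0, pts))
```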

    Cross-Domain Multitask Model for Head Detection and Facial Attribute Estimation

    Extracting specific attributes of a face within an image, such as emotion, age, or head pose, has numerous applications. As one of the most widely used vision-based attribute extraction models, HPE (Head Pose Estimation) models have been extensively explored. In spite of the success of these models, the pre-processing step of cropping the region of interest from the image before it is fed into the network is still a challenge. Moreover, a significant portion of the existing models are problem-specific models developed specifically for HPE. In response to the wide application of HPE models and the limitations of existing techniques, we developed a multi-purpose, multi-task model that parallelizes face detection and pose estimation (i.e., along both the yaw and pitch axes). This model is based on the Mask-RCNN object detection model, which computes a collection of mid-level shared features in conjunction with some independent neural networks for the detection of faces and the estimation of poses. We evaluated the proposed model using two publicly available datasets, Prima and BIWI, and obtained MAEs (Mean Absolute Errors) of 8.0 ± 8.6 and 8.2 ± 8.1 for yaw and pitch detection on Prima, and 6.2 ± 4.7 and 6.6 ± 4.9 on the BIWI dataset. The generalization capability of the model and its cross-domain effectiveness were assessed on the publicly available UTKFace dataset for face detection and age estimation, resulting in an MAE of 5.3 ± 3.2. A comparison of the proposed model’s performance on the domains it was tested on reveals that it compares favorably with the state-of-the-art models, as demonstrated by their published results. We provide the source code of our model for public use at: https://github.com/kahroba2000/MTL_MRCNN
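    As a rough illustration of the shared-feature idea (not the released MTL_MRCNN code), the sketch below reuses a Mask R-CNN backbone from torchvision (a recent version exposing the `weights` argument is assumed) and attaches an independent head that regresses yaw and pitch from the shared FPN features, while the detection branch continues to provide face boxes.

```python
# Hedged sketch: shared Mask R-CNN features plus an independent yaw/pitch head.
import torch
import torch.nn as nn
from torchvision.models.detection import maskrcnn_resnet50_fpn

detector = maskrcnn_resnet50_fpn(weights=None)  # detection branch supplies face boxes

class PoseHead(nn.Module):
    """Regresses (yaw, pitch) in degrees from the coarsest FPN feature map."""
    def __init__(self, in_channels=256):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(nn.Linear(in_channels, 128), nn.ReLU(), nn.Linear(128, 2))

    def forward(self, fpn_feats):
        x = self.pool(fpn_feats["pool"]).flatten(1)
        return self.fc(x)

pose_head = PoseHead()

images = torch.rand(2, 3, 224, 224)
features = detector.backbone(images)   # dict of FPN levels: '0', '1', '2', '3', 'pool'
print(pose_head(features).shape)       # torch.Size([2, 2]) -> yaw, pitch
```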

    Vision-based Driver State Monitoring Using Deep Learning

    Road accidents cause thousands of injuries and losses of lives every year, ranking among the top lifetime odds of death causes. More than 90% of traffic accidents are caused by human errors [1], including sight obstruction, failure to spot danger through inattention, speeding, expectation errors, and other reasons. In recent years, driver monitoring systems (DMS) have been rapidly studied and developed for use in commercial vehicles to prevent car crashes caused by human error. A DMS is a vehicle safety system that monitors the driver’s attention and warns if necessary. Such a system may contain multiple modules that detect the human factors most related to accidents, such as drowsiness and distractions. Typical DMS approaches seek driver distraction cues either from vehicle acceleration and steering (vehicle-based approach), driver physiological signals (physiological approach), or driver behaviours (behavioural-based approach). Behavioural-based driver state monitoring has numerous advantages over its vehicle-based and physiological-based counterparts, including fast responsiveness and non-intrusiveness. In addition, the recent breakthrough in deep learning enables high-level action and face recognition, expanding driver monitoring coverage and improving model performance. This thesis presents CareDMS, a behavioural-approach-based driver monitoring system using deep learning methods. CareDMS consists of driver anomaly detection and classification, gaze estimation, and emotion recognition. Each approach is developed with state-of-the-art deep learning solutions to address the shortcomings of current DMS functionalities. Combined with a classic drowsiness detection method, CareDMS thoroughly covers three major types of distraction: physical (hands off the steering wheel), visual (eyes off the road ahead), and cognitive (mind off driving). There are numerous challenges in behavioural-based driver state monitoring. Current driver distraction detection methods either lack detailed distraction classification or fail to generalize to unknown driver anomalies. This thesis introduces a novel two-phase proposal and classification network architecture. It can flag all forms of distracted driving and recognize driver actions simultaneously, which provides the downstream DMS with important information for warning-level customization. Next, gaze estimation for driver monitoring is difficult because drivers tend to make severe head movements while driving. This thesis proposes a video-based neural network that jointly learns head pose and gaze dynamics. The design significantly reduces per-head-pose gaze estimation performance variance compared to benchmarks. Furthermore, emotional driving such as road rage and sadness can seriously impact driving performance. However, individuals have various emotional expressions, which makes vision-based emotion recognition a challenging task. This work proposes an efficient and versatile multimodal fusion module that effectively fuses facial expression and human voice for emotion recognition, with visible advantages demonstrated compared to using a single modality. Finally, the driver state monitoring system, CareDMS, is presented; it converts the output of each functionality into a specific driver status measurement and integrates the various measurements into the driver’s level of alertness.
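    To make the fusion idea concrete, here is a minimal PyTorch sketch of a gated multimodal fusion module that combines a face-expression embedding with a voice embedding before classifying emotion. The embedding sizes, the gating scheme, and the seven emotion classes are assumptions for illustration, not the thesis's module.

```python
# Gated fusion of face and voice embeddings for emotion classification (sketch).
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    def __init__(self, face_dim=512, voice_dim=256, hidden=128, n_emotions=7):
        super().__init__()
        self.face_proj = nn.Linear(face_dim, hidden)
        self.voice_proj = nn.Linear(voice_dim, hidden)
        self.gate = nn.Sequential(nn.Linear(2 * hidden, hidden), nn.Sigmoid())
        self.classifier = nn.Linear(hidden, n_emotions)

    def forward(self, face_emb, voice_emb):
        f = torch.tanh(self.face_proj(face_emb))
        v = torch.tanh(self.voice_proj(voice_emb))
        g = self.gate(torch.cat([f, v], dim=1))  # per-dimension modality weighting
        fused = g * f + (1 - g) * v
        return self.classifier(fused)

# Usage: logits = GatedFusion()(torch.rand(4, 512), torch.rand(4, 256))  # shape (4, 7)
```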

    Predictive Model of Driver’s Eye Fixation for Maneuver Prediction in the Design of Advanced Driving Assistance Systems

    Over the last few years, Advanced Driver Assistance Systems (ADAS) have been shown to significantly reduce the number of vehicle accidents. According to the National Highway Traffic Safety Administration (NHTSA), driver errors contribute to 94% of road collisions. This research aims to develop a predictive model of driver eye fixation by analyzing the driver’s eye and head information (cephalo-ocular) for maneuver prediction in an Advanced Driving Assistance System (ADAS). Several ADASs have been developed to help drivers perform driving tasks in complex environments, and many studies have been conducted on improving automated systems. Some research has relied on the fact that the driver plays a crucial role in most driving scenarios, recognizing the driver’s role as the central element in ADASs. The way in which a driver monitors the surrounding environment is at least partially descriptive of the driver’s situation awareness. This thesis’s primary goal is the quantitative and qualitative analysis of driver behavior to determine the relationship between driver intent and actions. The RoadLab initiative provided an instrumented vehicle equipped with an on-board diagnostic system, an eye-gaze tracker, and a stereo vision system for the extraction of relevant features from the driver, the vehicle, and the environment. Several driver behavioral features are investigated to determine whether there is a relevant relation between the driver’s eye fixations and the prediction of driving maneuvers.
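    As a hedged illustration of this kind of analysis (not the RoadLab pipeline), the sketch below fits a simple classifier that predicts an upcoming maneuver from summary cephalo-ocular features; the feature names, labels, and data are synthetic placeholders.

```python
# Toy maneuver prediction from gaze/head summary features (illustrative only).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 400
# Assumed features per time window: mean gaze yaw, gaze yaw variance,
# mirror-fixation count, mean head yaw, head-movement energy.
X = rng.normal(size=(n, 5))
# Synthetic labels (lane change vs. keep lane) loosely tied to two features, for the demo.
y = (X[:, 2] + 0.8 * X[:, 3] + rng.normal(scale=0.5, size=n) > 0).astype(int)

clf = RandomForestClassifier(n_estimators=200, random_state=0)
print(cross_val_score(clf, X, y, cv=5).mean())  # chance level would be about 0.5
```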

    FUZZY KERNEL REGRESSION FOR REGISTRATION AND OTHER IMAGE WARPING APPLICATIONS

    In this dissertation a new approach for non-rigid medical image registration is presented. It relies on a probabilistic framework based on the novel concept of Fuzzy Kernel Regression. After a formal introduction, the theoretical framework is applied to develop several complete registration systems, two of them interactive and one fully automatic. They all use the composition of local deformations to achieve the final alignment. The automatic system is based on the maximization of mutual information to produce local affine alignments, which are merged into the global transformation. The mutual information maximization procedure uses a gradient descent method. Due to the huge amount of data associated with medical images, a multi-resolution topology is embodied, reducing processing time. The injected distance-based interpolation scheme facilitates the similarity measure optimization by attenuating the presence of local maxima in the functional. System blocks are implemented on GPGPUs, allowing efficient parallel computation of large 3D datasets using SIMT execution. Due to the flexibility of mutual information, the approach can be applied to multi-modality image scans (MRI, CT, PET, etc.). Both quantitative and qualitative experiments show promising results and great potential for future extension. Finally, the flexibility of the framework is shown by means of its successful application to the image retargeting problem; methods and results are presented.
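    For reference, here is a compact NumPy sketch of the mutual-information similarity measure that such a registration pipeline maximises, computed from a joint intensity histogram. The bin count and dense sampling are assumptions; the dissertation's multi-resolution and GPU machinery is omitted.

```python
# Mutual information between two equally sized images via a joint histogram.
import numpy as np

def mutual_information(fixed, moving, bins=32):
    """MI (in nats) between the intensity distributions of two images."""
    hist, _, _ = np.histogram2d(fixed.ravel(), moving.ravel(), bins=bins)
    pxy = hist / hist.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))

# Usage: MI of an image with itself far exceeds MI with an unrelated image.
rng = np.random.default_rng(0)
a = rng.random((64, 64))
print(mutual_information(a, a), mutual_information(a, rng.random((64, 64))))
```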