POSEidon: Face-from-Depth for Driver Pose Estimation
Fast and accurate upper-body and head pose estimation is a key task for the automatic monitoring of driver attention, a challenging context characterized by severe illumination changes, occlusions and extreme poses. In this work, we present a new deep learning framework for head localization and pose estimation on depth images. The core of the proposal is a regression neural network, called POSEidon, which is composed of three independent convolutional nets followed by a fusion layer, specially conceived for understanding pose from depth. In addition, to recover the intrinsic value of face appearance for understanding head position and orientation, we propose a new Face-from-Depth approach for reconstructing face images from depth data. The face-reconstruction results are qualitatively impressive. We test the proposed framework on two public datasets, namely Biwi Kinect Head Pose and ICT-3DHP, and on Pandora, a new challenging dataset mainly inspired by the automotive setup. Results show that our method outperforms all recent state-of-the-art works, running in real time at more than 30 frames per second.
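The three-stream-plus-fusion architecture described in the POSEidon abstract can be sketched in miniature. This is an illustrative numpy sketch, not the paper's network: each stream is reduced to a single ReLU layer, the input is a flattened stand-in for a depth crop, and all dimensions are made-up values.

```python
import numpy as np

rng = np.random.default_rng(0)

def stream(x, W):
    # One "stream" reduced to a single linear+ReLU feature extractor;
    # in the paper each stream is an independent convolutional net.
    return np.maximum(W @ x, 0.0)

# Hypothetical dimensions: each stream maps a 64-dim crop to 16 features.
W1, W2, W3 = (rng.standard_normal((16, 64)) for _ in range(3))
W_fuse = rng.standard_normal((3, 48))      # fusion layer -> yaw, pitch, roll

x = rng.standard_normal(64)                # stand-in for a flattened depth crop
features = np.concatenate([stream(x, W) for W in (W1, W2, W3)])
angles = W_fuse @ features                 # regressed head-pose angles

assert angles.shape == (3,)
```

The key design point carried over from the abstract is that the streams compute features independently and only the fusion layer combines them for the final pose regression.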
Towards a complete 3D morphable model of the human head
Three-dimensional Morphable Models (3DMMs) are powerful statistical tools for representing the 3D shapes and textures of an object class. Here we present the most complete 3DMM of the human head to date, which includes the face, cranium, ears, eyes, teeth and tongue. To achieve this, we propose two methods for combining existing 3DMMs of different overlapping head parts: (i) use a regressor to complete missing parts of one model using the other; (ii) use the Gaussian Process framework to blend covariance matrices from multiple models. Thus we build a new combined face-and-head shape model that blends the variability and facial detail of an existing face model (the LSFM) with the full head modelling capability of an existing head model (the LYHM). Then we construct and fuse a highly-detailed ear model to extend the variation of the ear shape. Eye and eye-region models are incorporated into the head model, along with basic models of the teeth, tongue and inner mouth cavity. The new model achieves state-of-the-art performance. We use our model to reconstruct full head representations from single, unconstrained images, allowing us to parameterize craniofacial shape and texture, along with the ear shape, eye gaze and eye color.
Comment: 18 pages, 18 figures, submitted to Transactions on Pattern Analysis and Machine Intelligence (TPAMI) on the 9th of October as an extension paper of the original oral CVPR paper: arXiv:1903.0378
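The covariance-blending idea in method (ii) above rests on a simple linear-algebra fact: a convex combination of valid covariance matrices is itself a valid covariance. A minimal numpy sketch, with made-up dimensions and a single scalar blend weight standing in for the abstract's Gaussian Process machinery:

```python
import numpy as np

rng = np.random.default_rng(1)

def random_cov(n):
    A = rng.standard_normal((n, n))
    return A @ A.T  # symmetric positive semi-definite by construction

# Hypothetical per-model covariances over a shared set of n vertices.
n = 5
cov_face, cov_head = random_cov(n), random_cov(n)

# Blend with a weight that favours the face model in the face region.
w = 0.7
cov_blend = w * cov_face + (1.0 - w) * cov_head

# A convex combination of PSD matrices stays PSD, so the blended
# model is still a valid Gaussian Process covariance.
eigvals = np.linalg.eigvalsh(cov_blend)
assert np.all(eigvals >= -1e-9)
```

In the actual model the blend weights vary spatially (face-dominant near the face, head-dominant elsewhere), but the PSD-preservation argument is the same.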
Multimodal headpose estimation and applications
This thesis presents new research into human head-pose estimation and its applications in multi-modal data. We develop new methods for head pose estimation spanning RGB-D Human Computer Interaction (HCI) to far-away, "in the wild" surveillance-quality data. We present the state-of-the-art solution in both head detection and head pose estimation through a new end-to-end Convolutional Neural Network architecture that reuses all of the computation for detection and pose estimation. In contrast to prior work, our method successfully spans close-up HCI to low-resolution surveillance data and is cross-modality, operating on both RGB and RGB-D data. We further address the problems of a limited amount of standard data and varying annotation quality through semi-supervised learning and novel data augmentation. (This latter contribution also finds application in the domain of life sciences.)
We report the highest accuracy by a large margin: a 60% improvement; and demonstrate leading performance on multiple standardized datasets. In HCI we reduce the angular error by 40% relative to the previously reported literature. Furthermore, by defining a probabilistic spatial gaze model from the head pose, we show applications in human-human and human-scene interaction understanding. We present state-of-the-art results on the standard interaction datasets. A new metric to model "social mimicry" through the temporal correlation of the head-pose signal is contributed and shown to be valid qualitatively and intuitively. As an application in surveillance, it is shown that with the robust head-pose signal as a prior, state-of-the-art results in tracking under occlusion using a Kalman filter can be achieved. This model is named the Intentional Tracker and it improves visual tracking metrics by up to 15%.
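The Kalman-filter tracking backbone mentioned above can be illustrated with a minimal 1-D constant-velocity predict/update loop. This is a generic textbook filter in numpy, not the thesis's Intentional Tracker; how the head-pose prior steers the motion model there is not specified here, and all noise values below are made up.

```python
import numpy as np

F = np.array([[1.0, 1.0], [0.0, 1.0]])   # state transition (position, velocity)
H = np.array([[1.0, 0.0]])               # we observe position only
Q = 0.01 * np.eye(2)                     # process noise covariance
R = np.array([[0.5]])                    # measurement noise covariance

x = np.array([0.0, 0.0])                 # initial state estimate
P = np.eye(2)                            # initial state covariance

for z in [1.0, 2.1, 2.9, 4.2]:           # noisy positions of a target
    # Predict step: propagate state and uncertainty through the motion model.
    x = F @ x
    P = F @ P @ F.T + Q
    # Update step: correct with the new measurement.
    y = z - H @ x                        # innovation
    S = H @ P @ H.T + R                  # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)       # Kalman gain
    x = x + K @ y
    P = (np.eye(2) - K @ H) @ P

assert abs(x[0] - 4.2) < 1.0             # position estimate tracks the target
assert 0.5 < x[1] < 1.5                  # velocity estimate near the true ~1/step
```

Under occlusion the update step is skipped and the filter coasts on its prediction, which is where an informative prior (here, the head-pose signal) pays off.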
We also apply the ALICE loss, which was developed for end-to-end detection and classification, to dense classification of underwater coral-reef imagery. The objective of this work is to solve the challenging task of recognizing and segmenting underwater coral imagery in the wild with sparse, point-based ground-truth labelling. To achieve this, we propose an integrated Fully Convolutional Neural Network (FCNN) and Fully-Connected Conditional Random Field (CRF) based classification and segmentation algorithm. Our contributions lie in four major areas. First, we show that multi-scale crop-based training is useful for learning the initial weights in the canonical one-class classification problem. Second, we propose a modified ALICE loss for training the FCNN on sparse labels with class imbalance and establish its significance empirically. Third, we show that by artificially enhancing the point labels to small regions based on a class distance transform, we can improve the classification accuracy further. Fourth, we improve the segmentation results using fully connected CRFs with a bilateral message-passing prior. We improve upon state-of-the-art results on all publicly available datasets by a significant margin.
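The third contribution above, growing sparse point labels into small regions via a distance transform, can be sketched directly: assign each pixel within a fixed radius of an annotated point the point's class. The grid size, radius, and point positions below are made-up illustrative values.

```python
import numpy as np

# Grow each sparse point label into a small disk, a simple stand-in for
# the class-distance-transform expansion described in the abstract.
H, W, radius = 8, 8, 1.5
labels = np.zeros((H, W), dtype=int)      # 0 = unlabeled background
points = {(2, 2): 1, (6, 5): 2}           # (row, col) -> class id

ys, xs = np.mgrid[0:H, 0:W]
for (r, c), cls in points.items():
    dist = np.hypot(ys - r, xs - c)       # Euclidean distance to the point
    labels[dist <= radius] = cls          # expand the label to a small region

assert labels[2, 2] == 1 and labels[2, 3] == 1   # neighbour inherits the class
assert labels[6, 5] == 2 and labels[0, 7] == 0   # far pixels stay unlabeled
```

This gives the FCNN many more supervised pixels per annotation at the cost of some label noise near class boundaries, which is the trade-off the distance threshold controls.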
Cross-Domain Multitask Model for Head Detection and Facial Attribute Estimation
Extracting specific attributes of a face within an image, such as emotion, age, or head pose, has numerous applications. Head Pose Estimation (HPE) models, among the most widely used vision-based attribute-extraction models, have been extensively explored. In spite of the success of these models, the pre-processing step of cropping the region of interest from the image before it is fed into the network is still a challenge. Moreover, a significant portion of the existing models are problem-specific models developed specifically for HPE. In response to the wide application of HPE models and the limitations of existing techniques, we developed a multi-purpose, multi-task model to parallelize face detection and pose estimation (i.e., along both the yaw and pitch axes). This model is based on the Mask-RCNN object detection model, which computes a collection of mid-level shared features in conjunction with some independent neural networks for the detection of faces and the estimation of poses. We evaluated the proposed model using two publicly available datasets, Prima and BIWI, and obtained MAEs (Mean Absolute Errors) of 8.0 ± 8.6 and 8.2 ± 8.1 for yaw and pitch detection on Prima, and 6.2 ± 4.7 and 6.6 ± 4.9 on the BIWI dataset. The generalization capability of the model and its cross-domain effectiveness were assessed on the publicly available UTKFace dataset for face detection and age estimation, resulting in an MAE of 5.3 ± 3.2. A comparison of the proposed model's performance across the domains it was tested on reveals that it compares favorably with state-of-the-art models, as demonstrated by their published results. We provide the source code of our model for public use at: https://github.com/kahroba2000/MTL_MRCNN
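The "MAE ± std" figures reported above are conventionally the mean and standard deviation of the per-sample absolute angular errors. A quick numpy sketch with made-up angles (not values from Prima or BIWI):

```python
import numpy as np

# Per-sample absolute angular errors, then their mean (MAE) and spread (std).
yaw_true = np.array([10.0, -25.0, 40.0, 0.0])   # ground-truth yaw in degrees
yaw_pred = np.array([14.0, -20.0, 31.0, 6.0])   # hypothetical predictions

abs_err = np.abs(yaw_pred - yaw_true)           # -> [4, 5, 9, 6]
mae, std = abs_err.mean(), abs_err.std()

assert mae == 6.0
assert np.isclose(std, np.sqrt(3.5))            # population std of [4, 5, 9, 6]
```

Reporting the std alongside the MAE, as the abstract does, exposes how uneven the errors are across samples rather than just their average size.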
Vision-based Driver State Monitoring Using Deep Learning
Road accidents cause thousands of injuries and loss of life every year, ranking among the leading causes of death. More than 90% of traffic accidents are caused by human errors [1], including sight obstruction, failure to spot danger through inattention, speeding, expectation errors, and other reasons. In recent years, driver monitoring systems (DMS) have been rapidly studied and developed for use in commercial vehicles to prevent car crashes caused by human error. A DMS is a vehicle safety system that monitors the driver's attention and warns the driver if necessary. Such a system may contain multiple modules that detect the human factors most related to accidents, such as drowsiness and distractions. Typical DMS approaches seek driver distraction cues either from vehicle acceleration and steering (vehicle-based approach), driver physiological signals (physiological approach), or driver behaviours (behavioural-based approach). Behavioural-based driver state monitoring has numerous advantages over its vehicle-based and physiological-based counterparts, including fast responsiveness and non-intrusiveness. In addition, the recent breakthrough in deep learning enables high-level action and face recognition, expanding driver monitoring coverage and improving model performance. This thesis presents CareDMS, a behavioural-approach-based driver monitoring system using deep learning methods. CareDMS consists of driver anomaly detection and classification, gaze estimation, and emotion recognition. Each approach is developed with state-of-the-art deep learning solutions to address the shortcomings of current DMS functionalities. Combined with a classic drowsiness detection method, CareDMS thoroughly covers three major types of distraction: physical (hands off the steering wheel), visual (eyes off the road ahead), and cognitive (mind off driving).
There are numerous challenges in behavioural-based driver state monitoring. Current driver distraction detection methods either lack detailed distraction classification or fail to generalize to unknown driver anomalies. This thesis introduces a novel two-phase proposal-and-classification network architecture. It can flag all forms of distracted driving and recognize driver actions simultaneously, providing the downstream DMS with important information for warning-level customization. Next, gaze estimation for driver monitoring is difficult because drivers tend to make severe head movements while driving. This thesis proposes a video-based neural network that jointly learns head pose and gaze dynamics together. The design significantly reduces per-head-pose gaze estimation performance variance compared to benchmarks. Furthermore, emotional driving such as road rage and sadness can seriously impact driving performance. However, individuals have varied emotional expressions, which makes vision-based emotion recognition a challenging task. This work proposes an efficient and versatile multimodal fusion module that effectively fuses facial expression and human voice for emotion recognition. Visible advantages are demonstrated compared to using a single modality. Finally, a driver state monitoring system, CareDMS, is presented; it converts the output of each functionality into a specific driver-status measurement and integrates the various measurements into the driver's level of alertness.
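The multimodal fusion idea above, combining facial-expression and voice features for emotion recognition, can be sketched as simple late fusion: concatenate the two embeddings and classify jointly. The thesis's fusion module is more elaborate; the dimensions, the seven-class output, and the linear classifier here are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

face_emb = rng.standard_normal(32)        # stand-in visual (facial) embedding
voice_emb = rng.standard_normal(16)       # stand-in audio (voice) embedding

# Hypothetical linear classifier over 7 emotion classes on the fused vector.
W = rng.standard_normal((7, 48))
logits = W @ np.concatenate([face_emb, voice_emb])
probs = np.exp(logits - logits.max())
probs /= probs.sum()                      # softmax over emotion classes

assert probs.shape == (7,)
assert np.isclose(probs.sum(), 1.0)
```

The advantage claimed in the abstract is that the fused representation lets the classifier draw on whichever modality is informative for a given sample, rather than committing to one.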
Predictive Model of Driver's Eye Fixation for Maneuver Prediction in the Design of Advanced Driving Assistance Systems
Over the last few years, Advanced Driver Assistance Systems (ADAS) have been shown to significantly reduce the number of vehicle accidents. According to the National Highway Traffic Safety Administration (NHTSA), driver errors contribute to 94% of road collisions. This research aims to develop a predictive model of driver eye fixation by analyzing the driver eye and head information (cephalo-ocular) for maneuver prediction in an Advanced Driving Assistance System (ADAS). Several ADASs have been developed to help drivers perform driving tasks in complex environments, and many studies were conducted on improving automated systems. Some research has relied on the fact that the driver plays a crucial role in most driving scenarios, recognizing the driver's role as the central element in ADASs. The way in which a driver monitors the surrounding environment is at least partially descriptive of the driver's situation awareness. This thesis's primary goal is the quantitative and qualitative analysis of driver behavior to determine the relationship between driver intent and actions. The RoadLab initiative provided an instrumented vehicle equipped with an on-board diagnostic system, an eye-gaze tracker, and a stereo vision system for the extraction of relevant features from the driver, the vehicle, and the environment. Several driver behavioral features are investigated to determine whether there is a relevant relation between the driver's eye fixations and the prediction of driving maneuvers.
FUZZY KERNEL REGRESSION FOR REGISTRATION AND OTHER IMAGE WARPING APPLICATIONS
In this dissertation a new approach to non-rigid medical image registration is presented. It relies on a probabilistic framework based on the novel concept of Fuzzy Kernel Regression. The theoretical framework, after a formal introduction, is applied to develop several complete registration systems; two of them are interactive and one is fully automatic. They all use the composition of local deformations to achieve the final alignment. The automatic one is based on the maximization of mutual information to produce local affine alignments which are merged into the global transformation. The mutual-information maximization procedure uses the gradient-descent method. Due to the huge amount of data associated with medical images, a multi-resolution topology is employed, reducing processing time. The distance-based interpolation scheme used facilitates the similarity-measure optimization by attenuating the presence of local maxima in the functional. System blocks are implemented on GPGPUs, allowing efficient parallel computation on large 3D datasets using SIMT execution. Due to the flexibility of mutual information, the method can be applied to multi-modality image scans (MRI, CT, PET, etc.).
Both quantitative and qualitative experiments show promising results and great potential for future extension.
Finally, the framework's flexibility is shown by means of its successful application to the image-retargeting problem; methods and results are presented.
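The mutual-information similarity measure that the registration systems above maximize can be computed from a joint intensity histogram. A minimal numpy sketch (histogram-based estimation with a fixed bin count, which is one common choice, not necessarily the dissertation's):

```python
import numpy as np

def mutual_information(a, b, bins=8):
    # Joint intensity histogram of the two images, normalized to a
    # joint probability table, then the standard MI formula in nats.
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    pxy = joint / joint.sum()
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)
    nz = pxy > 0                           # skip empty cells to avoid log(0)
    return np.sum(pxy[nz] * np.log(pxy[nz] / np.outer(px, py)[nz]))

rng = np.random.default_rng(3)
img = rng.random((32, 32))

# An image is maximally informative about itself; an independent image
# shares (almost) no information with it.
assert mutual_information(img, img) > mutual_information(img, rng.random((32, 32)))
```

During registration this scalar is evaluated at each candidate local affine alignment and pushed uphill by gradient descent; because MI depends only on the statistical relationship between intensities, it works across modalities (MRI, CT, PET) where direct intensity differences do not.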