508 research outputs found
Robust signatures for 3D face registration and recognition
PhDBiometric authentication through face recognition has been an active area of
research for the last few decades, motivated by its application-driven demand. The popularity
of face recognition, compared to other biometric methods, is largely due to its
minimum requirement of subject co-operation, relative ease of data capture and similarity
to the natural way humans distinguish each other.
3D face recognition has recently received particular interest since three-dimensional
face scans eliminate or reduce important limitations of 2D face images, such as illumination
changes and pose variations. In fact, three-dimensional face scans are usually captured
by scanners through the use of a constant structured-light source, making them invariant
to environmental changes in illumination. Moreover, a single 3D scan also captures the
entire face structure and allows for accurate pose normalisation.
However, one of the biggest challenges that still remain in three-dimensional face
scans is the sensitivity to large local deformations due to, for example, facial expressions.
Due to the nature of the data, deformations bring about large changes in the 3D geometry
of the scan. In addition to this, 3D scans are also characterised by noise and artefacts such
as spikes and holes, which are uncommon with 2D images and requires a pre-processing
stage that is speci c to the scanner used to capture the data.
The aim of this thesis is to devise a face signature that is compact in size and
overcomes the above mentioned limitations. We investigate the use of facial regions and
landmarks towards a robust and compact face signature, and we study, implement and
validate a region-based and a landmark-based face signature. Combinations of regions and
landmarks are evaluated for their robustness to pose and expressions, while the matching
scheme is evaluated for its robustness to noise and data artefacts
Adaptive 3D facial action intensity estimation and emotion recognition
Automatic recognition of facial emotion has been widely studied for various computer vision tasks (e.g. health monitoring, driver state surveillance and personalized learning). Most existing facial emotion recognition systems, however, either have not fully considered subject-independent dynamic features or were limited to 2D models, thus are not robust enough for real-life recognition tasks with subject variation, head movement and illumination change. Moreover, there is also lack of systematic research on effective newly arrived novel emotion class detection. To address these challenges, we present a real-time 3D facial Action Unit (AU) intensity estimation and emotion recognition system. It automatically selects 16 motion-based facial feature sets using minimal-redundancy–maximal-relevance criterion based optimization and estimates the intensities of 16 diagnostic AUs using feedforward Neural Networks and Support Vector Regressors. We also propose a set of six novel adaptive ensemble classifiers for robust classification of the six basic emotions and the detection of newly arrived unseen novel emotion classes (emotions that are not included in the training set). A distance-based clustering and uncertainty measures of the base classifiers within each ensemble model are used to inform the novel class detection. Evaluated with the Bosphorus 3D database, the system has achieved the best performance of 0.071 overall Mean Squared Error (MSE) for AU intensity estimation using Support Vector Regressors, and 92.2% average accuracy for the recognition of the six basic emotions using the proposed ensemble classifiers. In comparison with other related work, our research outperforms other state-of-the-art research on 3D facial emotion recognition for the Bosphorus database. Moreover, in on-line real-time evaluation with real human subjects, the proposed system also shows superior real-time performance with 84% recognition accuracy and great flexibility and adaptation for newly arrived novel (e.g. ‘contempt’ which is not included in the six basic emotions) emotion detection
Affective feedback: an investigation into the role of emotions in the information seeking process
User feedback is considered to be a critical element in the information seeking process, especially in relation to relevance assessment. Current feedback techniques determine content relevance with respect to the cognitive and situational levels of interaction that occurs between the user and the retrieval system. However, apart from real-life problems and information objects, users interact with intentions, motivations and feelings, which can be seen as critical aspects of cognition and decision-making. The study presented in this paper serves as a starting point to the exploration of the role of emotions in the information seeking process. Results show that the latter not only interweave with different physiological, psychological and cognitive processes, but also form distinctive patterns, according to specific task, and according to specific user
DAD-3DHeads: A Large-scale Dense, Accurate and Diverse Dataset for 3D Head Alignment from a Single Image
We present DAD-3DHeads, a dense and diverse large-scale dataset, and a robust
model for 3D Dense Head Alignment in the wild. It contains annotations of over
3.5K landmarks that accurately represent 3D head shape compared to the
ground-truth scans. The data-driven model, DAD-3DNet, trained on our dataset,
learns shape, expression, and pose parameters, and performs 3D reconstruction
of a FLAME mesh. The model also incorporates a landmark prediction branch to
take advantage of rich supervision and co-training of multiple related tasks.
Experimentally, DAD-3DNet outperforms or is comparable to the state-of-the-art
models in (i) 3D Head Pose Estimation on AFLW2000-3D and BIWI, (ii) 3D Face
Shape Reconstruction on NoW and Feng, and (iii) 3D Dense Head Alignment and 3D
Landmarks Estimation on DAD-3DHeads dataset. Finally, the diversity of
DAD-3DHeads in camera angles, facial expressions, and occlusions enables a
benchmark to study in-the-wild generalization and robustness to distribution
shifts. The dataset webpage is https://p.farm/research/dad-3dheads
A Survey of Multimedia Technologies and Robust Algorithms
Multimedia technologies are now more practical and deployable in real life,
and the algorithms are widely used in various researching areas such as deep
learning, signal processing, haptics, computer vision, robotics, and medical
multimedia processing. This survey provides an overview of multimedia
technologies and robust algorithms in multimedia data processing, medical
multimedia processing, human facial expression tracking and pose recognition,
and multimedia in education and training. This survey will also analyze and
propose a future research direction based on the overview of current robust
algorithms and multimedia technologies. We want to thank the research and
previous work done by the Multimedia Research Centre (MRC), the University of
Alberta, which is the inspiration and starting point for future research.Comment: arXiv admin note: text overlap with arXiv:2010.1296
Enhancing Expressiveness of Speech through Animated Avatars for Instant Messaging and Mobile Phones
This thesis aims to create a chat program that allows users to communicate via an animated avatar that provides believable lip-synchronization and expressive emotion. Currently many avatars do not attempt to do lip-synchronization. Those that do are not well synchronized and have little or no emotional expression. Most avatars with lip synch use realistic looking 3D models or stylized rendering of complex models. This work utilizes images rendered in a cartoon style and lip-synchronization rules based on traditional animation. The cartoon style, as opposed to a more realistic look, makes the mouth motion more believable and the characters more appealing. The cartoon look and image-based animation (as opposed to a graphic model animated through manipulation of a skeleton or wireframe) also allows for fewer key frames resulting in faster speed with more room for expressiveness. When text is entered into the program, the Festival Text-to-Speech engine creates a speech file and extracts phoneme and phoneme duration data. Believable and fluid lip-synchronization is then achieved by means of a number of phoneme-to-image rules. Alternatively, phoneme and phoneme duration data can be obtained for speech dictated into a microphone using Microsoft SAPI and the CSLU Toolkit. Once lip synchronization has been completed, rules for non-verbal animation are added. Emotions are appended to the animation of speech in two ways: automatically, by recognition of key words and punctuation, or deliberately, by user-defined tags. Additionally, rules are defined for idle-time animation. Preliminary results indicate that the animated avatar program offers an improvement over currently available software. It aids in the understandability of speech, combines easily recognizable and expressive emotions with speech, and successfully enhances overall enjoyment of the chat experience. Applications for the program include use in cell phones for the deaf or hearing impaired, instant messaging, video conferencing, instructional software, and speech and animation synthesis
- …