
13 research outputs found

3D Face Reconstruction And Emotion Analytics With Part-Based Morphable Models

Author: Jin Hai
Publication venue: DigitalCommons@WayneState
Publication date: 01/01/2018

3D face reconstruction and facial expression analytics using 3D facial data are new and hot research topics in computer graphics and computer vision. In this proposal, we first review the background knowledge for emotion analytics using 3D morphable face models, including geometry feature-based methods, statistical model-based methods and more advanced deep learning-based methods. Then, we introduce a novel 3D face modeling and reconstruction solution that robustly and accurately acquires 3D face models from a couple of images captured by a single smartphone camera. Two selfie photos of a subject taken from the front and side are used to guide our Non-Negative Matrix Factorization (NMF) induced part-based face model to iteratively reconstruct an initial 3D face of the subject. Then, an iterative detail updating method is applied to the initial generated 3D face to reconstruct facial details through optimizing lighting parameters and local depths. Our iterative 3D face reconstruction method permits fully automatic registration of a part-based face representation to the acquired face data and the detailed 2D/3D features to build a high-quality 3D face model. The NMF part-based face representation learned from a 3D face database facilitates alternating between effective global fitting and adaptive local detail fitting. Our system is flexible and allows users to conduct the capture in any uncontrolled environment. We demonstrate the capability of our method by allowing users to capture and reconstruct their 3D faces by themselves. Based on the 3D face model reconstruction, we can analyze the facial expression and the related emotion in 3D space. We present a novel approach to analyze the facial expressions from images and a quantitative information visualization scheme for exploring this type of visual data.
From the reconstructed result using the NMF part-based morphable 3D face model, basis parameters and a displacement map are extracted as features for facial emotion analysis and visualization. Based upon the features, two Support Vector Regressions (SVRs) are trained to determine the fuzzy Valence-Arousal (VA) values to quantify the emotions. The continuously changing emotion status can be intuitively analyzed by visualizing the VA values in VA-space. Our emotion analysis and visualization system, based on the 3D NMF morphable face model, detects expressions robustly across various head poses, face sizes and lighting conditions, and computes the VA values fully automatically from images or a video sequence with various facial expressions. To evaluate our novel method, we test our system on publicly available databases and evaluate the emotion analysis and visualization results. We also apply our method to quantifying emotion changes during motivational interviews. These experiments and applications demonstrate the effectiveness and accuracy of our method. In order to improve the expression recognition accuracy, we present a facial expression recognition approach with a 3D Mesh Convolutional Neural Network (3DMCNN) and a visual analytics guided 3DMCNN design and optimization scheme. The geometric properties of the surface are computed using the 3D face model of a subject with facial expressions. Instead of using a regular Convolutional Neural Network (CNN) to learn intensities of the facial images, we convolve the geometric properties on the surface of the 3D model using the 3DMCNN. We design a geodesic distance-based convolution method to overcome the difficulties arising from the irregular sampling of the face surface mesh. We further present an interactive visual analytics tool for designing and modifying the networks, analyzing the learned features and clustering similar nodes in the 3DMCNN.
By removing low-activity nodes in the network, the performance of the network is greatly improved. We compare our method with the regular CNN-based method by interactively visualizing each layer of the networks and analyze the effectiveness of our method by studying representative cases. Testing on public datasets, our method achieves a higher recognition accuracy than traditional image-based CNNs and other 3D CNNs. The presented framework, including the 3DMCNN and interactive visual analytics of the CNN, can be extended to other applications.
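
The abstract above relies on a part-based representation learned with Non-Negative Matrix Factorization. As a rough illustration of the general technique only, and not the author's implementation, the sketch below factorizes a small non-negative matrix `V` into `W` (basis parts) and `H` (coefficients) using the standard Lee-Seung multiplicative updates for the squared-error objective. The toy matrix, the rank, the iteration count and all function names are assumptions made for this example.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(row) for row in zip(*m)]


def frobenius_error(v, w, h):
    """Squared Frobenius norm of V - W H."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))


def nmf(v, rank, iterations=200, seed=0, eps=1e-9):
    """Factorize non-negative V (n x m) as W (n x rank) times H (rank x m)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    # Strictly positive random initialization (an arbitrary choice).
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    errors = [frobenius_error(v, w, h)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num = matmul(wt, v)
        den = matmul(matmul(wt, w), h)
        h = [[h[a][j] * num[a][j] / (den[a][j] + eps) for j in range(m)]
             for a in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][a] * num[i][a] / (den[i][a] + eps) for a in range(rank)]
             for i in range(n)]
        errors.append(frobenius_error(v, w, h))
    return w, h, errors


# Toy data (hypothetical): rows are features, columns are samples,
# built from two non-overlapping "parts".
V = [[1.0, 1.0, 0.0, 0.0],
     [2.0, 2.0, 0.0, 0.0],
     [0.0, 0.0, 3.0, 3.0],
     [0.0, 0.0, 1.0, 1.0]]
W, H, errors = nmf(V, rank=2)
print("initial error:", errors[0], "final error:", errors[-1])
```

Because the updates only multiply by non-negative ratios, `W` and `H` stay non-negative, which is what tends to give additive, part-like basis vectors. A real face model would use a 3D face database as `V` and a numerical library rather than nested lists.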

Digital Commons@Wayne State University

Deep Learning for Head Pose Estimation: A Survey

Author: Asperti Andrea; Filippini Daniele
Publication date: 01/01/2023

Head pose estimation (HPE) is an active and popular area of research. Over the years, many approaches have been developed, leading to a progressive improvement in accuracy; nevertheless, head pose estimation remains an open research topic, especially in unconstrained environments. In this paper, we review the growing number of available datasets and the modern methodologies used to estimate orientation, with special attention to deep learning techniques. We discuss the evolution of the field by proposing a classification of head pose estimation methods, explaining their advantages and disadvantages, and highlighting the different ways deep learning techniques have been used in the context of HPE. An in-depth performance comparison and discussion is presented at the end of the work. We also highlight the most promising research directions for future investigations on the topic.

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna

Face pose estimation in monocular images

Author: Muhammad Shafi
Publication date: 01/01/2010

People use the orientation of their faces to convey rich, inter-personal information. For example, a person will direct his face to indicate who the intended target of the conversation is. Similarly, in a conversation, face orientation is a non-verbal cue to the listener for when to switch roles and start speaking, and a nod indicates that a person has understood, or agrees with, what is being said. Furthermore, face pose estimation plays an important role in human-computer interaction, virtual reality applications, human behaviour analysis, pose-independent face recognition, driver's vigilance assessment, gaze estimation, etc. Robust face recognition has been a focus of research in the computer vision community for more than two decades. Although substantial research has been done and numerous methods have been proposed for face recognition, there remain challenges in this field. One of these is face recognition under varying poses, which is why face pose estimation is still an important research area. In computer vision, face pose estimation is the process of inferring the face orientation from digital imagery. It requires a series of image processing steps to transform a pixel-based representation of a human face into a high-level concept of direction. An ideal face pose estimator should be invariant to a variety of image-changing factors such as camera distortion, lighting condition, skin colour, projective geometry, facial hair, facial expressions, presence of accessories like glasses and hats, etc. Face pose estimation has been a focus of research for about two decades and numerous research contributions have been presented in this field.
Face pose estimation techniques in the literature still have some shortcomings and limitations in terms of accuracy, applicability to monocular images, being autonomous, identity and lighting variations, image resolution variations, range of face motion, computational expense, presence of facial hair, presence of accessories like glasses and hats, etc. These shortcomings of existing face pose estimation techniques motivated the research work presented in this thesis. The main focus of this research is to design and develop novel face pose estimation algorithms that improve automatic face pose estimation in terms of processing time, computational expense, and invariance to different conditions.

Loughborough University Institutional Repository

A dynamic key frames approach to object tracking

Author: Wilkens Christopher A
Publication venue: Massachusetts Institute of Technology
Publication date: 01/01/2008

Thesis (M. Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2008. This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. Includes bibliographical references (p. 95-96). In this thesis, I present a dynamic key frames algorithm for state estimation from observations. The algorithm uses KL-divergence as a metric to identify the frames that contribute the most information to estimation of the system's current state. The algorithm is first presented in a numerical optimization framework and then developed as an extension to the Condensation algorithm. Finally, I present results from a Matlab simulation of the algorithm. by Christopher A. Wilkens. M.Eng.
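
The abstract above says that KL-divergence is used to pick the frames that contribute the most information about the current state. The thesis's actual formulation is not given here, so the following is only a minimal sketch of one plausible reading: each frame is scored by the KL-divergence between a (hypothetical) per-frame posterior over a discrete state and a shared prior, and the top-scoring frames are kept. The function names, the discrete four-state setup and the example distributions are all assumptions for illustration.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats for two discrete distributions on the same support."""
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0)


def select_key_frames(prior, frame_posteriors, k):
    """Return the indices of the k frames whose posterior diverges most from
    the prior, together with every frame's score."""
    scores = [kl_divergence(post, prior) for post in frame_posteriors]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k]), scores


# Hypothetical example: a uniform prior over four states and the posterior
# that each of four frames would produce on its own.
prior = [0.25, 0.25, 0.25, 0.25]
frame_posteriors = [
    [0.25, 0.25, 0.25, 0.25],  # frame 0: no information gained
    [0.70, 0.10, 0.10, 0.10],  # frame 1: fairly informative
    [0.40, 0.30, 0.20, 0.10],  # frame 2: mildly informative
    [0.97, 0.01, 0.01, 0.01],  # frame 3: very informative
]
key_frames, scores = select_key_frames(prior, frame_posteriors, k=2)
print(key_frames)
```

In this toy setup the two frames with the most peaked posteriors are selected. A Condensation-style tracker would work with weighted particle sets rather than small discrete tables, so the divergence would have to be estimated from samples.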

DSpace@MIT

View-based 3D Objects Recognition with Expectation Propagation Learning

Author: Bertrand Adrien
Publication date: 01/03/2016

In this thesis, we present an improvement on the Expectation Propagation (EP) learning framework, specifically various enhancements to both speed and accuracy. We use this enhanced EP learning with the Inverted Dirichlet mixture model as well as the Dirichlet mixture model to implement an algorithm that recognizes 3D objects. In our case, those objects come from a view-based 3D models database that we have assembled. Following specific rules determined by analyzing the results of our tests, we have been able to get good recognition rates. Experimental results are presented for different object classes by comparing recognition rates and confidence levels according to different tuning parameters, which we are able to refine towards specific classes for better specialized accuracy.

Concordia University Research Repository

An implementation of face-to-face grounding in an embodied conversational agent

Author: Reinstein Gabriel A. (Gabriel Alexander), 1980-
Publication venue: Massachusetts Institute of Technology
Publication date: 01/01/2003

Thesis (M.Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2003. Includes bibliographical references (leaves 53-55). When people have a face-to-face conversation, they don't just spout information blindly; they work to make sure that both participants understand what has been said. This process of ensuring that what has been said is added to the common ground of the conversation is called grounding. This thesis explores recent research into the verbal and nonverbal means for grounding, and presents an implementation of a face-to-face grounding system in an embodied conversational agent that is based on a model of grounding extracted from the research. This is the first such agent that supports nonverbal grounding, and so this thesis represents both a proof of concept and a guide for future work in this area, showing that it is possible to build a dialogue system that implements face-to-face grounding between a human and an agent based on an empirically-derived model. Additionally, this thesis describes a vision system, based on a stereo-camera head-pose tracker and using a recently proposed method for head-nod detection, that can robustly and accurately identify head nods and gaze state. by Gabriel A. Reinstein. M.Eng.

DSpace@MIT

Development of a performance-based approach for collision avoidance and mitigation

Author: Yang Ji Hyun, 1978-
Publication venue: Massachusetts Institute of Technology
Publication date: 01/01/2003

Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Aeronautics and Astronautics, 2003. Includes bibliographical references (leaves 70-72). This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. Many threat assessment algorithms are based on a collection of threshold equations that predict when a collision is to occur. The fact that there are numerous algorithms suggests a need to understand the underlying principles behind the equation design and threshold settings. This thesis presents a methodology to develop appropriate alerting thresholds based on performance metrics. This also allows us to compare different alerting algorithms and evaluate alerting systems. The method is a performance-based approach in state-space. It can be used as a stand-alone system for real-time implementation or as a threshold design tool in conjunction with any chosen alerting algorithm or sensor system. Using carefully prescribed trajectory models (which may include uncertainties), the performance tradeoff with and without an alert can be predicted for different states along the course of an encounter situation. This information can then be used to set appropriate threshold values for the desired alerting logic. The development of the threshold criteria for a rear-end collision warning system is given as an example. Though the approach given is presented as a threshold design tool, the methodology is self-contained as a threat assessment logic. The possibility exists to compute the performance measures on-the-fly, from which alerting decisions can be made directly. We demonstrate the methodology on a Lincoln LS concept vehicle with a GPS-based system and a full-cab driving simulator as prototypes. Application examples, a collision mitigation by braking system and a face tracking warning system, are shown to demonstrate the universality of the performance-based approach.
For illustrative purposes, a vision-based system (post-processed off-line) is compared with the GPS-based system. by Ji Hyun Yang. S.M.

DSpace@MIT

Communication error detection using facial expressions

Author: Wang Sy Bor, 1976-
Publication venue: Massachusetts Institute of Technology
Publication date: 01/01/2008

Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2008. Includes bibliographical references (p. 129-135). Automatic detection of communication errors in conversational systems typically relies only on acoustic cues. However, perceptual studies have indicated that speakers do exhibit visual communication error cues passively during the system's conversational turn. In this thesis, we introduce novel algorithms for face and body gesture recognition and present the first automatic system for detecting communication errors using facial expressions during the system's turn. This is useful as it detects communication problems before the user speaks a reply. To detect communication problems accurately and efficiently, we develop novel extensions to hidden-state discriminative methods. We also present results showing that when human subjects become aware that the conversational system is capable of receiving visual input, they become more communicative visually yet naturally. by Sy Bor Wang. Ph.D.

DSpace@MIT