
    Objects extraction and recognition for camera-based interaction: heuristic and statistical approaches

    In this thesis, heuristic and probabilistic methods are applied to a number of problems in camera-based interaction. The goal is to provide solutions for a vision-based system that can extract and analyze objects of interest in camera images and use that information for various interactions in mobile usage. New methods, and novel combinations of existing methods, are developed for different applications, including text extraction from complex scene images, bar code reading on camera phones, and face/facial feature detection and facial expression manipulation. The application-driven problems of camera-based interaction cannot be captured by a single, straightforward model that strongly simplifies reality. The approach we found effective was to first apply heuristic but easy-to-implement techniques to reduce the complexity of the problems and narrow the search space, and then use statistical learning approaches to handle the remaining difficult but well-defined problems with much better accuracy. This process can be iterated in some or all of the stages, and the combination of approaches is problem-dependent. The contribution of this thesis resides in two aspects: first, new features and approaches are proposed, either as heuristics or as statistical methods, for concrete applications; second, engineering designs that combine several methods for system optimization are studied. Geometrical characteristics and the alignment of text, texture features of bar codes, and the structure of faces can all be exploited as heuristics for object extraction and further recognition. The boosting algorithm is a suitable choice for probabilistic learning and achieves the desired accuracy. New feature selection techniques are proposed for constructing the weak learners and for applying the boosting output in concrete applications. Subspace methods such as manifold learning algorithms are introduced and tailored for facial expression analysis and synthesis. A modified generalized learning vector quantization method is proposed to deal with the blurring of bar code images. Efficient implementations that combine the approaches at sensible junctions are presented and the results are illustrated.
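    As an illustration of the boosting stage described above, the following is a minimal AdaBoost sketch with decision stumps as weak learners, where each round effectively selects a single feature; this is the general mechanism the thesis builds on, not its exact feature-selection algorithm, and the data layout is hypothetical.

```python
# Minimal AdaBoost sketch with decision-stump weak learners.
# Illustrative only: not the thesis's exact feature-selection scheme.
import numpy as np

def train_adaboost(X, y, n_rounds=50):
    """X: (n_samples, n_features); y: labels in {-1, +1}."""
    n, d = X.shape
    w = np.full(n, 1.0 / n)        # sample weights
    ensemble = []                  # (alpha, feature, threshold, polarity)
    for _ in range(n_rounds):
        best = None
        # Each weak learner is a stump over one feature, so every boosting
        # round doubles as a feature-selection step.
        for j in range(d):
            for thr in np.unique(X[:, j]):
                for pol in (1, -1):
                    pred = pol * np.where(X[:, j] > thr, 1, -1)
                    err = w[pred != y].sum()
                    if best is None or err < best[0]:
                        best = (err, j, thr, pol)
        err, j, thr, pol = best
        err = np.clip(err, 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)
        pred = pol * np.where(X[:, j] > thr, 1, -1)
        w *= np.exp(-alpha * y * pred)   # up-weight misclassified samples
        w /= w.sum()
        ensemble.append((alpha, j, thr, pol))
    return ensemble

def predict(ensemble, X):
    score = sum(a * p * np.where(X[:, j] > t, 1, -1)
                for a, j, t, p in ensemble)
    return np.sign(score)
```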

    Image Data Augmentation from Small Training Datasets Using Generative Adversarial Networks (GANs)

    The scarcity of labelled data is a serious problem, since deep models generally require a large amount of training data to achieve the desired performance. Data augmentation is widely adopted to enhance the diversity of original datasets and further improve the performance of deep learning models. Learning-based methods, compared with traditional techniques, are specialized in feature extraction, which enhances the effectiveness of data augmentation. Generative adversarial networks (GANs), one family of learning-based generative models, have made remarkable advances in data synthesis. However, GANs still face many challenges in generating high-quality augmented images from small datasets, because learning-based generative methods struggle to produce reliable outcomes without sufficient training data. This difficulty limits data augmentation applications that rely on learning-based methods. In this thesis, to tackle the scarcity of labelled data and the difficulty of augmenting image data from small datasets, three novel GAN models suitable for training with a small number of samples are proposed, based on three different mapping relationships between the input and output images: one-to-many mapping, one-to-one mapping, and many-to-many mapping. The proposed GANs employ limited training data, such as a small number of images and limited conditional features, and are expected to generate images of not only high quality but also desirable diversity. To evaluate the effectiveness of the augmented images generated by the proposed models, inception distances and human perception studies are adopted. Additionally, different image classification tasks were carried out, and the accuracies obtained using the original datasets and the augmented datasets were compared. Experimental results show that image classification performance based on convolutional neural networks (AlexNet, GoogLeNet, ResNet and VGGNet) is comprehensively enhanced, and the scale of improvement is significant when only a small number of training samples is involved.
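    The evaluation protocol in the last two sentences can be sketched as follows: train the same classifier once on the original small dataset and once on the GAN-augmented dataset, then compare test accuracies. The generator interface (`G.latent_dim`, `G.sample`) and the training/evaluation helpers are hypothetical placeholders, not APIs from the thesis.

```python
# Sketch of the augmented-vs-original comparison protocol.
import numpy as np

def augment(images, labels, G, n_extra_per_class):
    """Append GAN-synthesized images to the original training set."""
    extra_x, extra_y = [], []
    for c in np.unique(labels):
        z = np.random.randn(n_extra_per_class, G.latent_dim)
        extra_x.append(G.sample(z, class_label=c))   # hypothetical API
        extra_y.append(np.full(n_extra_per_class, c))
    return (np.concatenate([images, *extra_x]),
            np.concatenate([labels, *extra_y]))

# Identical classifier, two training sets (train_classifier/evaluate are
# stand-ins for, e.g., an AlexNet or ResNet training loop):
# acc_base = evaluate(train_classifier(x_train, y_train), x_test, y_test)
# x_aug, y_aug = augment(x_train, y_train, G, n_extra_per_class=500)
# acc_aug  = evaluate(train_classifier(x_aug, y_aug), x_test, y_test)
```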

    Improving Facial Analysis and Performance Driven Animation through Disentangling Identity and Expression

    We present techniques for improving performance-driven facial animation, emotion recognition, and facial key-point (landmark) prediction using learned identity-invariant representations. Established approaches to these problems can work well if sufficient examples and labels for a particular identity are available and the factors of variation are highly controlled. However, labeled examples of facial expressions, emotions and key-points for new individuals are difficult and costly to obtain. In this paper we improve the ability of techniques to generalize to new and unseen individuals by explicitly modeling previously seen variations related to identity and expression. We use a weakly supervised approach in which identity labels are used to learn the factors of variation linked to identity separately from those related to expression. We show how probabilistic modeling of these sources of variation allows one to learn identity-invariant representations for expressions, which can then be used to identity-normalize various procedures for facial expression analysis and animation control. We also show how to extend the widely used techniques of active appearance models and constrained local models by replacing the underlying point distribution models, which are typically constructed using principal component analysis, with identity-expression factorized representations. We present a wide variety of experiments in which we consistently improve performance on emotion recognition, markerless performance-driven facial animation and facial key-point tracking.
    Comment: to appear in Image and Vision Computing Journal (IMAVIS).
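    One simple way to realize an identity-expression factorization of a point distribution model, using only identity labels as weak supervision, is to build an identity basis from per-identity mean shapes and an expression basis from the residuals around them. The sketch below illustrates this idea with plain PCA; the paper's probabilistic model is more elaborate, and the data layout here is hypothetical.

```python
# Identity/expression factorization of landmark shapes (illustrative).
import numpy as np

def pca_basis(X, k):
    """Top-k principal directions of the rows of X."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[:k]

def factorize_shapes(shapes, ids, k_id=10, k_expr=10):
    """shapes: (n, 2L) flattened landmarks; ids: (n,) identity labels."""
    mean = shapes.mean(axis=0)
    uniq, inv = np.unique(ids, return_inverse=True)
    # Per-identity mean shapes capture identity variation...
    id_means = np.stack([shapes[inv == k].mean(axis=0)
                         for k in range(len(uniq))])
    # ...while residuals around each subject's mean capture expression.
    resid = shapes - id_means[inv]
    B_id = pca_basis(id_means, k_id)     # identity basis
    B_expr = pca_basis(resid, k_expr)    # identity-invariant expression basis
    # An approximate identity-normalized expression code for a new shape s:
    #   e = B_expr @ (s - mean)
    return mean, B_id, B_expr
```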

    The integration of facial features over space and time

    Faces are unique social stimuli that can be recognized in an instant. We can pick up information about gender, ethnicity, feelings, attentional focus, or even attributes like attractiveness or trustworthiness remarkably quickly. How we achieve this has been studied in psychology, cognitive science and neuroscience for decades, but we still do not know the full picture. The key theme of this thesis is the integration, or binding, of facial features over space and over time. We investigated both behavioral measures, in healthy people and in a group of people with Autism Spectrum Disorder (ASD), and the neuronal mechanisms in core face-processing regions of the human brain.

    The first part of this thesis investigates the contribution of face-responsive brain areas to whole-face and part-based neural representations of facial expressions. This aspect has hardly been considered in the past, as most studies focused on the representation of identity instead. During an fMRI experiment, we presented whole faces and facial parts of happy and fearful expressions. We extracted the similarity of activity patterns in the core network of face processing - the occipital face area (OFA), fusiform face area (FFA) and superior temporal sulcus (STS) - across and within emotions, between whole faces and facial parts. Previous studies based on identity recognition have found holistic and part-based representations in the FFA, while the OFA seems to mainly represent part-based information. The STS has hardly been considered in those studies, as it is thought to be preferentially involved in expression coding. We find both part-based representations of facial expressions and an emotion-independent preference for whole faces in the FFA, in line with the previous findings for identity recognition. For the STS, we detect emotion-dependent representations of faces and facial parts, supporting its major role in expression processing. The OFA, in contrast, shows similar representations of the eye and mouth regions of both expressions without any further specific effects, adding evidence to its role as an entry point of facial information into the core network of face processing.

    The second part of the thesis explores the temporal information embodied in dynamic facial expressions. Using expressions of increasing and decreasing intensity that were presented in natural or reversed frame order, we manipulated the temporal information of expression unfolding in a well-controlled 2x2 design (factors "emotion direction" and "timeline"). This approach allowed us to control for low-level stimulus properties. In three consecutive studies, we explored, first, the brain activation elicited by our stimulus manipulation in healthy subjects; second, the perceptual effects caused by emotion direction and timeline reversal in healthy subjects; and third, the same perceptual effects in autistic participants and matched controls. Our results indicate a sensitivity of all areas of the neural core network of face processing to both emotion direction and timeline. Behaviorally, we found that both factors affected judgements of different stimulus properties, such as emotion intensity or how well emotions are performed, even when subjects were not informed of the timeline manipulation. Interestingly, autistic subjects did not differ from the control group regarding the effects of timeline reversal on their perceptual evaluation of the stimuli.

    In sum, our studies shed light on two key aspects of facial processing and perception - holistic versus part-based processing and facial dynamics - that have not previously been addressed in the way done here.
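    The pattern-similarity analysis of the first part can be sketched as a simple representational-similarity computation over region-of-interest voxel patterns; the ROI arrays named in the usage comment are hypothetical placeholders for the extracted fMRI data.

```python
# Minimal pattern-similarity sketch for whole-face vs. face-part conditions.
import numpy as np
from scipy.stats import pearsonr

def pattern_similarity(patterns_a, patterns_b):
    """Mean Pearson correlation between two sets of voxel patterns,
    each of shape (n_trials, n_voxels)."""
    return np.mean([pearsonr(a, b)[0]
                    for a in patterns_a for b in patterns_b])

# Comparing within-emotion similarity (whole fearful faces vs. fearful eyes)
# against across-emotion similarity (whole fearful faces vs. happy eyes)
# indicates whether an ROI such as FFA or STS carries an emotion-dependent
# part-based representation:
# within = pattern_similarity(ffa_whole_fear, ffa_eyes_fear)
# across = pattern_similarity(ffa_whole_fear, ffa_eyes_happy)
```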

    Advances in Emotion Recognition: Link to Depressive Disorder

    Emotion recognition enables real-time analysis, tagging, and inference of cognitive-affective states from human facial expressions, speech and tone, body posture and physiological signals, as well as social text on social network platforms. Emotion patterns can be decoded through computational modeling of explicit and implicit features extracted through wearable and other devices. Meanwhile, emotion recognition and computation are critical to the detection and diagnosis of potential mood disorder patients. This chapter aims to summarize the main findings in the area of affective recognition and its applications in major depressive disorder (MDD), an area that has made rapid progress in the last decade.
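    A generic version of the recognition pipeline surveyed here fuses features from several modalities and decodes them into an affective state; all names and data in the following sketch are hypothetical, and real systems use far richer models.

```python
# Early-fusion sketch: concatenate per-modality features, then classify.
import numpy as np
from sklearn.linear_model import LogisticRegression

def fuse(face_feats, speech_feats, physio_feats):
    """Concatenate per-sample feature vectors from each modality."""
    return np.concatenate([face_feats, speech_feats, physio_feats], axis=1)

# X = fuse(face_feats, speech_feats, physio_feats)   # (n_samples, d)
# clf = LogisticRegression(max_iter=1000).fit(X, y)  # y: affective labels,
# e.g. depressed vs. control groups in an MDD screening setting.
```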

    Enhanced facial expression using oxygenation absorption of facial skin

    Facial skin appearance is affected by the physical and physiological state of the skin. Facial expression, and especially skin appearance, changes constantly and dynamically as humans behave, talk, and experience stress. Skin color is considered one of the key indicators of these states; it is largely determined by the scattering and absorption of light within the skin layers, in which the concentration of chromophores, melanin and hemoglobin oxygenation in the blood, plays a pivotal role. This thesis proposes an improvement on a prior model to create a realistic textured three-dimensional (3D) facial model for animation. The model considers both surface and subsurface scattering and is capable of simulating the interaction of light with human skin. Six parameters are used in this research: the amounts of oxygenation, deoxygenation, hemoglobin, melanin and oil, and a blend factor for different types of melanin in the skin, to generate a close match to specific skin types. The proposed model is combined with Blend Shape Interpolation and the Facial Action Coding System to create five basic facial emotional expressions, namely anger, happiness, neutrality, sadness and fear. Meanwhile, the correlation between blood oxygenation and changing facial skin color for basic natural emotional expressions is measured using pulse oximetry and a 3D skin analyzer. Data from male and female subjects performing a number of partially extreme facial expressions are fed into the model for simulation. The multipole method for layered materials is used to calculate the spectral diffusion profiles of two-layered skin, which are further utilized to simulate the subsurface scattering of light within the skin. The subsurface scattering is then combined with the Torrance-Sparrow bidirectional reflectance distribution function (BRDF) model to simulate the interaction of light with an oily layer at the skin surface. The result is validated by an evaluation procedure that measures the fidelity of the facial model, comparing the expressions and skin color of the proposed model to those of a real human. The facial expressions are verified by calculating the Euclidean distance between facial markers on the real human and on the avatar. A second assessment validates the skin color of facial expressions for the proposed avatar via the extraction of histogram color features and color coherence vectors of each image, compared with the real human and with previous work. The experimental results show an improvement of around 5.12 percent over previous work. In achieving realistic facial expressions for a virtual human based on facial skin color, texture and hemoglobin oxygenation, the results demonstrate that the proposed model is beneficial to the development of virtual reality and game environments in computer-aided graphics animation systems.
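    As an illustration of the surface term mentioned above, the following is a minimal Torrance-Sparrow microfacet BRDF sketch for the oily layer, to be added to the diffuse term obtained from the multipole diffusion profiles. The roughness and Fresnel values are illustrative defaults (an f0 of about 0.028 corresponds to a refractive index of roughly 1.4), not the thesis's fitted skin parameters.

```python
# Torrance-Sparrow specular BRDF sketch (Beckmann distribution,
# Schlick Fresnel); illustrative parameters, not fitted skin values.
import numpy as np

def normalize(v):
    return v / np.linalg.norm(v)

def torrance_sparrow(n, l, v, roughness=0.3, f0=0.028):
    """Specular BRDF value for unit normal n, light dir l, view dir v."""
    h = normalize(l + v)                      # half vector
    nl, nv, nh, vh = n @ l, n @ v, n @ h, v @ h
    if nl <= 0 or nv <= 0 or nh <= 0:
        return 0.0
    m2 = roughness ** 2
    # Beckmann microfacet distribution
    d = np.exp((nh**2 - 1) / (m2 * nh**2)) / (np.pi * m2 * nh**4)
    # Geometric attenuation (shadowing/masking)
    g = min(1.0, 2 * nh * nv / vh, 2 * nh * nl / vh)
    # Schlick approximation to the Fresnel term
    f = f0 + (1 - f0) * (1 - vh) ** 5
    return d * g * f / (4 * nl * nv)

# Total skin reflectance = subsurface diffuse term (from the multipole
# spectral diffusion profiles) + torrance_sparrow(...) specular term.
```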