597 research outputs found

    Every Smile is Unique: Landmark-Guided Diverse Smile Generation

    Full text link
    Each smile is unique: one person surely smiles in different ways (e.g., closing/opening the eyes or mouth). Given one input image of a neutral face, can we generate multiple smile videos with distinctive characteristics? To tackle this one-to-many video generation problem, we propose a novel deep learning architecture named Conditional Multi-Mode Network (CMM-Net). To better encode the dynamics of facial expressions, CMM-Net explicitly exploits facial landmarks for generating smile sequences. Specifically, a variational auto-encoder is used to learn a facial landmark embedding. This single embedding is then exploited by a conditional recurrent network which generates a landmark embedding sequence conditioned on a specific expression (e.g., spontaneous smile). Next, the generated landmark embeddings are fed into a multi-mode recurrent landmark generator, producing a set of landmark sequences still associated to the given smile class but clearly distinct from each other. Finally, these landmark sequences are translated into face videos. Our experimental results demonstrate the effectiveness of our CMM-Net in generating realistic videos of multiple smile expressions.Comment: Accepted as a poster in Conference on Computer Vision and Pattern Recognition (CVPR), 201

    A Survey of Multimedia Technologies and Robust Algorithms

    Full text link
    Multimedia technologies are now more practical and deployable in real life, and the algorithms are widely used in various researching areas such as deep learning, signal processing, haptics, computer vision, robotics, and medical multimedia processing. This survey provides an overview of multimedia technologies and robust algorithms in multimedia data processing, medical multimedia processing, human facial expression tracking and pose recognition, and multimedia in education and training. This survey will also analyze and propose a future research direction based on the overview of current robust algorithms and multimedia technologies. We want to thank the research and previous work done by the Multimedia Research Centre (MRC), the University of Alberta, which is the inspiration and starting point for future research.Comment: arXiv admin note: text overlap with arXiv:2010.1296

    Distributional Feature Mapping in Data Classification

    Get PDF
    Performance of a machine learning algorithm depends on the representation of the input data. In computer vision problems, histogram based feature representation has significantly improved the classification tasks. L1 normalized histograms can be modelled by Dirichlet and related distributions to transform input space to feature space. We propose a mapping technique that contains prior knowledge about the distribution of the data and increases the discriminative power of the classifiers in supervised learning such as Support Vector Machine (SVM). The mapping technique for proportional data which is based on Dirichlet, Generalized Dirichlet, Beta Liouville, scaled Dirichlet and shifted scaled Dirichlet distributions can be incorporated with traditional kernels to improve the base kernels accuracy. Experimental results show that the proposed technique for proportional data increases accuracy for machine vision tasks such as natural scene recognition, satellite image classification, gender classification, facial expression recognition and human action recognition in videos. In addition, in object tracking, learning parametric features of the target object using Dirichlet and related distributions may help to capture representations invariant to noise. This further motivated our study of such distributions in object tracking. We propose a framework for feature representation on probability simplex for proportional data utilizing the histogram representation of the target object at initial frame. A set of parameter vectors determine the appearance features of the target object in the subsequent frames. Motivated by the success of distribution based feature mapping for proportional data, we extend this technique for semi-bounded data utilizing inverted Dirichlet, generalized inverted Dirichlet and inverted Beta Liouville distributions. Similar approach is taken into account for count data where Dirichlet multinomial and generalized Dirichlet multinomial distributions are used to map density features with input features

    Facial expression recognition in the wild : from individual to group

    Get PDF
    The progress in computing technology has increased the demand for smart systems capable of understanding human affect and emotional manifestations. One of the crucial factors in designing systems equipped with such intelligence is to have accurate automatic Facial Expression Recognition (FER) methods. In computer vision, automatic facial expression analysis is an active field of research for over two decades now. However, there are still a lot of questions unanswered. The research presented in this thesis attempts to address some of the key issues of FER in challenging conditions mentioned as follows: 1) creating a facial expressions database representing real-world conditions; 2) devising Head Pose Normalisation (HPN) methods which are independent of facial parts location; 3) creating automatic methods for the analysis of mood of group of people. The central hypothesis of the thesis is that extracting close to real-world data from movies and performing facial expression analysis on movies is a stepping stone in the direction of moving the analysis of faces towards real-world, unconstrained condition. A temporal facial expressions database, Acted Facial Expressions in the Wild (AFEW) is proposed. The database is constructed and labelled using a semi-automatic process based on closed caption subtitle based keyword search. Currently, AFEW is the largest facial expressions database representing challenging conditions available to the research community. For providing a common platform to researchers in order to evaluate and extend their state-of-the-art FER methods, the first Emotion Recognition in the Wild (EmotiW) challenge based on AFEW is proposed. An image-only based facial expressions database Static Facial Expressions In The Wild (SFEW) extracted from AFEW is proposed. Furthermore, the thesis focuses on HPN for real-world images. Earlier methods were based on fiducial points. However, as fiducial points detection is an open problem for real-world images, HPN can be error-prone. A HPN method based on response maps generated from part-detectors is proposed. The proposed shape-constrained method does not require fiducial points and head pose information, which makes it suitable for real-world images. Data from movies and the internet, representing real-world conditions poses another major challenge of the presence of multiple subjects to the research community. This defines another focus of this thesis where a novel approach for modeling the perception of mood of a group of people in an image is presented. A new database is constructed from Flickr based on keywords related to social events. Three models are proposed: averaging based Group Expression Model (GEM), Weighted Group Expression Model (GEM_w) and Augmented Group Expression Model (GEM_LDA). GEM_w is based on social contextual attributes, which are used as weights on each person's contribution towards the overall group's mood. Further, GEM_LDA is based on topic model and feature augmentation. The proposed framework is applied to applications of group candid shot selection and event summarisation. The application of Structural SIMilarity (SSIM) index metric is explored for finding similar facial expressions. The proposed framework is applied to the problem of creating image albums based on facial expressions, finding corresponding expressions for training facial performance transfer algorithms

    Graphical models for visual object recognition and tracking

    Get PDF
    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2006.This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.Includes bibliographical references (p. 277-301).We develop statistical methods which allow effective visual detection, categorization, and tracking of objects in complex scenes. Such computer vision systems must be robust to wide variations in object appearance, the often small size of training databases, and ambiguities induced by articulated or partially occluded objects. Graphical models provide a powerful framework for encoding the statistical structure of visual scenes, and developing corresponding learning and inference algorithms. In this thesis, we describe several models which integrate graphical representations with nonparametric statistical methods. This approach leads to inference algorithms which tractably recover high-dimensional, continuous object pose variations, and learning procedures which transfer knowledge among related recognition tasks. Motivated by visual tracking problems, we first develop a nonparametric extension of the belief propagation (BP) algorithm. Using Monte Carlo methods, we provide general procedures for recursively updating particle-based approximations of continuous sufficient statistics. Efficient multiscale sampling methods then allow this nonparametric BP algorithm to be flexibly adapted to many different applications.(cont.) As a particular example, we consider a graphical model describing the hand's three-dimensional (3D) structure, kinematics, and dynamics. This graph encodes global hand pose via the 3D position and orientation of several rigid components, and thus exposes local structure in a high-dimensional articulated model. Applying nonparametric BP, we recover a hand tracking algorithm which is robust to outliers and local visual ambiguities. Via a set of latent occupancy masks, we also extend our approach to consistently infer occlusion events in a distributed fashion. In the second half of this thesis, we develop methods for learning hierarchical models of objects, the parts composing them, and the scenes surrounding them. Our approach couples topic models originally developed for text analysis with spatial transformations, and thus consistently accounts for geometric constraints. By building integrated scene models, we may discover contextual relationships, and better exploit partially labeled training images. We first consider images of isolated objects, and show that sharing parts among object categories improves accuracy when learning from few examples.(cont.) Turning to multiple object scenes, we propose nonparametric models which use Dirichlet processes to automatically learn the number of parts underlying each object category, and objects composing each scene. Adapting these transformed Dirichlet processes to images taken with a binocular stereo camera, we learn integrated, 3D models of object geometry and appearance. This leads to a Monte Carlo algorithm which automatically infers 3D scene structure from the predictable geometry of known object categories.by Erik B. Sudderth.Ph.D

    Learning Extremal Representations with Deep Archetypal Analysis

    Full text link
    Archetypes are typical population representatives in an extremal sense, where typicality is understood as the most extreme manifestation of a trait or feature. In linear feature space, archetypes approximate the data convex hull allowing all data points to be expressed as convex mixtures of archetypes. However, it might not always be possible to identify meaningful archetypes in a given feature space. Learning an appropriate feature space and identifying suitable archetypes simultaneously addresses this problem. This paper introduces a generative formulation of the linear archetype model, parameterized by neural networks. By introducing the distance-dependent archetype loss, the linear archetype model can be integrated into the latent space of a variational autoencoder, and an optimal representation with respect to the unknown archetypes can be learned end-to-end. The reformulation of linear Archetypal Analysis as deep variational information bottleneck, allows the incorporation of arbitrarily complex side information during training. Furthermore, an alternative prior, based on a modified Dirichlet distribution, is proposed. The real-world applicability of the proposed method is demonstrated by exploring archetypes of female facial expressions while using multi-rater based emotion scores of these expressions as side information. A second application illustrates the exploration of the chemical space of small organic molecules. In this experiment, it is demonstrated that exchanging the side information but keeping the same set of molecules, e. g. using as side information the heat capacity of each molecule instead of the band gap energy, will result in the identification of different archetypes. As an application, these learned representations of chemical space might reveal distinct starting points for de novo molecular design.Comment: Under review for publication at the International Journal of Computer Vision (IJCV). Extended version of our GCPR2019 paper "Deep Archetypal Analysis

    A Comparative Emotions-detection Review for Non-intrusive Vision-Based Facial Expression Recognition

    Get PDF
    Affective computing advocates for the development of systems and devices that can recognize, interpret, process, and simulate human emotion. In computing, the field seeks to enhance the user experience by finding less intrusive automated solutions. However, initiatives in this area focus on solitary emotions that limit the scalability of the approaches. Further reviews conducted in this area have also focused on solitary emotions, presenting challenges to future researchers when adopting these recommendations. This review aims at highlighting gaps in the application areas of Facial Expression Recognition Techniques by conducting a comparative analysis of various emotion detection datasets, algorithms, and results provided in existing studies. The systematic review adopted the PRISMA model and analyzed eighty-three publications. Findings from the review show that different emotions call for different Facial Expression Recognition techniques, which should be analyzed when conducting Facial Expression Recognition. Keywords: Facial Expression Recognition, Emotion Detection, Image Processing, Computer Visio
    • …
    corecore