80 research outputs found

    Facial Component Detection in Thermal Imagery

    This paper studies the problem of detecting facial components (specifically the eyes, nostrils and mouth) in thermal imagery. One of the immediate goals is to enable the automatic registration of facial thermal images. The eyes and nostrils are detected using Haar features and the GentleBoost algorithm, which are shown to provide superior detection rates. The mouth is detected from the eye and nostril detections, using measures of entropy and self-similarity. The results show that reliable facial component detection is feasible with this methodology, which achieves a correct detection rate of 0.8 for both eyes and nostrils. Correct eye and nostril detection enables correct detection of the mouth in 65% of closed-mouth test images and in 73% of open-mouth test images.
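The abstract above does not say how its entropy measure is computed. As a minimal sketch, assuming Shannon entropy over the intensity histogram of a candidate image patch, the quantity could look like the following; the function name `patch_entropy` and the toy patches are hypothetical and not taken from the paper.

```python
import math
from collections import Counter


def patch_entropy(patch):
    """Shannon entropy (in bits) of the intensity histogram of a 2D patch.

    `patch` is a list of rows of integer intensities. This is an assumed
    definition for illustration, not the measure used in the paper.
    """
    pixels = [value for row in patch for value in row]
    counts = Counter(pixels)
    total = len(pixels)
    # Sum p * log2(1/p) over the observed intensity values.
    return sum((c / total) * math.log2(total / c) for c in counts.values())


# A uniform patch carries no information; a varied patch carries more.
flat_patch = [[10, 10], [10, 10]]
varied_patch = [[10, 200], [90, 30]]
print(patch_entropy(flat_patch), patch_entropy(varied_patch))
```

A textured region such as an open mouth would be expected to score higher than a smooth region under this definition, which is one plausible reason an entropy measure is useful here.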

    Facial Point Detection using Boosted Regression and Graph Models

    Finding fiducial facial points in any frame of a video showing rich naturalistic facial behaviour is an unsolved problem. Yet this is a crucial step for geometric-feature-based facial expression analysis and for methods that use appearance-based features extracted at fiducial facial point locations. In this paper we present a method based on a combination of Support Vector Regression and Markov Random Fields that drastically reduces the time needed to search for a point's location and increases the accuracy and robustness of the algorithm. Using Markov Random Fields allows us to constrain the search space by exploiting the constellations that facial points can form. The regressors, on the other hand, learn a mapping between the appearance of the area surrounding a point and the position of that point, which makes detection of the points very fast and can make the algorithm robust to variations in appearance due to facial expression and moderate changes in head pose. The proposed point detection algorithm was tested on 1855 images, and the results show that it outperforms current state-of-the-art point detectors.

    Learning Disentangled Representations with Reference-Based Variational Autoencoders

    Learning disentangled representations from visual data, where different high-level generative factors are independently encoded, is important for many computer vision tasks. Solving this problem, however, typically requires explicitly labelling all the factors of interest in the training images. To alleviate the annotation cost, we introduce a learning setting which we refer to as "reference-based disentangling". Given a pool of unlabeled images, the goal is to learn a representation where a set of target factors is disentangled from the others. The only supervision comes from an auxiliary "reference set" containing images where the factors of interest are constant. To address this problem, we propose reference-based variational autoencoders, a novel deep generative model designed to exploit the weak supervision provided by the reference set. By addressing tasks such as feature learning, conditional image generation and attribute transfer, we validate the ability of the proposed model to learn disentangled representations from this minimal form of supervision.

    An Appearance-Based Method for Parametric Video Registration

    In this paper we address the problem of multi-frame video registration by combining an appearance-based technique with a parametric model of the transformations. The technique selects one image as the reference frame and estimates the transformation of each frame in the sequence with respect to that absolute reference. Both global and local information are used to estimate the registered images. Global information is applied through linear appearance subspace constraints, under the subspace constancy assumption [4], which encode the variability of each frame with respect to the reference frame. Local information is used through a polynomial parametric model that estimates the evolution of the velocity field in each frame. The objective function to be minimized considers both aspects at once, i.e., the appearance representation and the time evolution across the sequence. This function connects the global coordinates in the subspace representation with the time evolution and the parametric optical flow estimates. Thus, the appearance constraints take into account all the images in a sequence when estimating the transformation parameters.

    Leveraging feature uncertainty in the PnP problem

    Paper presented at the 25th British Machine Vision Conference (BMVC), held in Nottingham (UK), 1-5 September 2014. This item (except texts and images not created by the author) is subject to a Creative Commons license: Attribution-NonCommercial-NoDerivs 3.0 Spain. We propose a real-time and accurate solution to the Perspective-n-Point (PnP) problem (estimating the pose of a calibrated camera from n 3D-to-2D point correspondences) that exploits the fact that, in practice, not all 2D features are localized with the same accuracy. Assuming a model of such feature uncertainties is known in advance, we reformulate the PnP problem as a maximum likelihood minimization approximated by an unconstrained Sampson error function, which naturally penalizes the noisiest correspondences. The advantages of this approach are clearly demonstrated in synthetic experiments where feature uncertainties are exactly known. Pre-estimating the feature uncertainties in real experiments is, though, not easy. In this paper we model feature uncertainty as 2D Gaussian distributions representing the sensitivity of the 2D feature detectors to different camera viewpoints. When using these noise models with our PnP formulation we still obtain promising pose estimation results that outperform the most recent approaches. This work has been partially funded by the Spanish government under projects DPI2011-27510, IPT-2012-0630-020000, IPT-2011-1015-430000 and CICYT grant TIN2012-39203; by the EU project ARCAS FP7-ICT-2011-28761; and by the ERA-Net Chistera project ViSen PCIN-2013-047. Peer reviewed.
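To illustrate why a 2D Gaussian uncertainty model penalizes noisy correspondences, the sketch below weights a 2D reprojection residual by the inverse of its feature covariance (a squared Mahalanobis norm). This is a generic illustration only: it is not the Sampson error function described in the abstract, and the function name and example covariances are hypothetical.

```python
def mahalanobis_sq(residual, cov):
    """Squared Mahalanobis norm r^T C^{-1} r of a 2D residual.

    `residual` is (rx, ry) in pixels; `cov` is a 2x2 covariance given as
    ((a, b), (c, d)). Illustrative only, not the paper's cost function.
    """
    (a, b), (c, d) = cov
    det = a * d - b * c
    if a <= 0 or det <= 0:
        raise ValueError("covariance must be positive definite")
    rx, ry = residual
    # Uses the closed-form inverse of a 2x2 matrix: (1/det) * [[d, -b], [-c, a]].
    return (d * rx * rx - (b + c) * rx * ry + a * ry * ry) / det


# The same 2-pixel error counts less when the feature is known to be noisy.
precise_feature = ((1.0, 0.0), (0.0, 1.0))  # variance of 1 px^2 per axis
noisy_feature = ((4.0, 0.0), (0.0, 4.0))    # variance of 4 px^2 per axis
print(mahalanobis_sq((2.0, 0.0), precise_feature))
print(mahalanobis_sq((2.0, 0.0), noisy_feature))
```

Summing such terms over all correspondences gives a cost in which uncertain features pull less on the estimated pose than well-localized ones.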

    Very fast solution to the PnP problem with algebraic outlier rejection

    Presented at CVPR 2014, held in Columbus, Ohio (US), 23-28 June. We propose a real-time, accurate and outlier-robust solution to the Perspective-n-Point (PnP) problem. The main advantages of our solution are twofold: first, it integrates outlier rejection within the pose estimation pipeline with negligible computational overhead; and second, it scales to an arbitrarily large number of correspondences. Given a set of 3D-to-2D matches, we formulate the pose estimation problem as a low-rank homogeneous system whose solution lies in its 1D null space. Outlier correspondences are those rows of the linear system which perturb the null space, and they are progressively detected by projecting them onto an iteratively estimated solution of the null space. Since our outlier removal process is based on an algebraic criterion which does not require computing the full pose and reprojecting all 3D points back onto the image plane at each step, we achieve speed gains of more than 100× compared to RANSAC strategies. An extensive experimental evaluation shows that our solution yields accurate results in situations with up to 50% outliers, and can process more than 1000 correspondences in less than 5 ms. This work has been partially funded by the Spanish government under projects DPI2011-27510, IPT-2012-0630-020000, IPT-2011-1015-430000 and CICYT grant TIN2012-39203; by the EU project ARCAS FP7-ICT-2011-28761; and by the ERA-Net Chistera project ViSen PCIN-2013-047. Peer reviewed.
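The idea of estimating a 1D null space while discarding rows with a large algebraic residual can be sketched on a toy problem. The example below fits a line through the origin (a two-unknown homogeneous system) rather than a camera pose, and the rejection rule (residual above `factor` times the median) is an assumption made for illustration, not the criterion used in the paper. It relies on NumPy's `numpy.linalg.svd`.

```python
import numpy as np


def null_vector(M):
    """Unit vector minimizing ||M x||: the last right singular vector of M."""
    _, _, vt = np.linalg.svd(M)
    return vt[-1]


def robust_null_vector(M, factor=3.0, max_iters=10):
    """Alternate null-space estimation with algebraic outlier rejection.

    Hypothetical rule: a row is kept when its algebraic residual |M_i x|
    is at most `factor` times the median residual over all rows.
    """
    keep = np.ones(len(M), dtype=bool)
    for _ in range(max_iters):
        x = null_vector(M[keep])
        residuals = np.abs(M @ x)  # no reprojection, only a matrix product
        threshold = factor * np.median(residuals) + 1e-12
        new_keep = residuals <= threshold
        if np.array_equal(new_keep, keep):
            break
        keep = new_keep
    return x, keep


# Toy system: rows [x, y] with y = 2x satisfy 2x - y = 0, so the null
# vector is proportional to (2, -1). The last two rows are outliers.
inliers = [[x, 2 * x] for x in range(1, 9)]
outliers = [[5, -5], [-4, 6]]
M = np.array(inliers + outliers, dtype=float)
x, keep = robust_null_vector(M)
print(x, keep)
```

Each iteration costs one small SVD and one matrix-vector product, which reflects the abstract's point that an algebraic criterion avoids recomputing and reprojecting a full pose at every step.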

    Latent-based adversarial neural networks for facial affect estimations

    Paper presented at the 15th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2020), held 16-20 November 2020 in Buenos Aires, Argentina. There is growing interest in affective computing research, given its crucial role in bridging humans and computers. Progress has recently been accelerated by the emergence of larger datasets. One recent advance in this field is the use of adversarial learning to improve model learning through augmented samples. However, the use of latent features, which adversarial learning makes feasible, has not yet been widely explored. This technique may also improve the performance of affective models, as has been demonstrated analogously in related fields such as computer vision. To expand this analysis, in this work we explore the use of latent features through our proposed adversarial-based networks for valence and arousal recognition in the wild. Specifically, our models operate by aggregating several modalities in our discriminator, which is further conditioned on the latent features extracted by the generator. Our experiments on the recently released SEWA dataset show progressive improvements in our results. Finally, we show competitive results on the Affective Behavior Analysis in-the-Wild (ABAW) challenge dataset. This work is partly supported by the Spanish Ministry of Economy and Competitiveness under project grant TIN2017-90124-P, the Maria de Maeztu Units of Excellence Programme (MDM-2015-0502), and the donation bahi2018-19 to the CMTech at UPF. Further funding has been received from the European Union's Horizon 2020 research and innovation programme under grant agreement No. 826506 (sustAGE).

    Machine Learning-based Lie Detector applied to a Novel Annotated Game Dataset

    Lie detection is a concern for everyone in day-to-day life, given its impact on human interactions. People therefore normally pay attention both to what their interlocutors are saying and to their visual appearance, including their faces, to look for signs that indicate whether the person is telling the truth. While automatic lie detection may help us understand the characteristics of lying, current systems are still fairly limited, partly due to the lack of adequate datasets for evaluating their performance in realistic scenarios. In this work, we have collected an annotated dataset of facial images, comprising both 2D and 3D information, from several participants during a card game that encourages players to lie. Using our collected dataset, we evaluated several types of machine learning-based lie detectors in generalization, person-specific and cross-domain experiments. Our results show that models based on deep learning achieve the best accuracy, reaching up to 57% for the generalization task and 63% when dealing with a single participant. Finally, we also highlight the limitations of the deep learning-based lie detector when dealing with cross-domain lie detection tasks.
