13 research outputs found

    A Comprehensive Performance Evaluation of Deformable Face Tracking "In-the-Wild"

    Recently, technologies such as face detection, facial landmark localisation, and face recognition and verification have matured enough to provide effective and efficient solutions for imagery captured under arbitrary conditions (referred to as "in-the-wild"). This is partially attributed to the fact that comprehensive "in-the-wild" benchmarks have been developed for face detection, landmark localisation and recognition/verification. A very important technology that has not been thoroughly evaluated yet is deformable face tracking "in-the-wild". Until now, performance has mainly been assessed qualitatively, by visually inspecting the result of a deformable face tracking technology on short videos. In this paper, we perform the first, to the best of our knowledge, thorough evaluation of state-of-the-art deformable face tracking pipelines using the recently introduced 300VW benchmark. We evaluate many different architectures, focusing mainly on the task of on-line deformable face tracking. In particular, we compare the following general strategies: (a) generic face detection plus generic facial landmark localisation, (b) generic model-free tracking plus generic facial landmark localisation, and (c) hybrid approaches using state-of-the-art face detection, model-free tracking and facial landmark localisation technologies. Our evaluation reveals future avenues for further research on the topic. Comment: E. Antonakos and P. Snape contributed equally and have joint second authorship.
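
    A minimal sketch of the hybrid strategy (c), with trivial stand-in functions for the detector and the landmark localiser (all names below are hypothetical stubs, not the systems evaluated in the paper). Strategy (a) corresponds to re-detecting on every frame (redetect_every=1), and strategy (b) replaces the re-detection step with a model-free tracker update.

        # Illustrative pipeline sketch; detect_face and localise_landmarks are
        # toy stubs so that the example runs, not actual detection/fitting code.
        import numpy as np

        def detect_face(frame):
            # stub detector: pretend the face occupies the central half of the frame
            h, w = frame.shape[:2]
            return (w // 4, h // 4, w // 2, h // 2)          # (x, y, width, height)

        def localise_landmarks(frame, box):
            # stub localiser: scatter 68 points uniformly inside the given box
            x, y, w, h = box
            return np.column_stack((np.random.uniform(x, x + w, 68),
                                     np.random.uniform(y, y + h, 68)))

        def landmarks_to_box(pts):
            (x0, y0), (x1, y1) = pts.min(axis=0), pts.max(axis=0)
            return (int(x0), int(y0), int(x1 - x0), int(y1 - y0))

        def hybrid_tracking(frames, redetect_every=30):
            # (c) hybrid: re-detect periodically, otherwise propagate the previous fit
            box, results = detect_face(frames[0]), []
            for i, frame in enumerate(frames):
                if i % redetect_every == 0:
                    box = detect_face(frame)
                pts = localise_landmarks(frame, box)
                box = landmarks_to_box(pts)                  # previous fit seeds the next frame
                results.append(pts)
            return results

        frames = [np.zeros((480, 640, 3), dtype=np.uint8) for _ in range(5)]
        print(len(hybrid_tracking(frames)), "frames processed")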

    300 faces in-the-wild challenge: database and results

    Computer Vision has recently witnessed great research advances towards automatic facial point detection. Numerous methodologies have been proposed during the last few years that achieve accurate and efficient performance. However, fair comparison between these methodologies is infeasible, mainly due to two issues. (a) Most existing databases, captured under both constrained and unconstrained (in-the-wild) conditions, have been annotated using different mark-ups and, in most cases, the accuracy of the annotations is low. (b) Most published works report experimental results using different training/testing sets, different error metrics and, of course, landmark points with semantically different locations. In this paper, we aim to overcome the aforementioned problems by (a) proposing a semi-automatic annotation technique that was employed to re-annotate most existing facial databases under a unified protocol, and (b) presenting the 300 Faces In-The-Wild Challenge (300-W), the first facial landmark localization challenge, which was organized twice, in 2013 and 2015. To the best of our knowledge, this is the first effort towards a unified annotation scheme for massive databases and a fair experimental comparison of existing facial landmark localization systems. The images and annotations of the new testing database that was used in the 300-W challenge are available from http://ibug.doc.ic.ac.uk/resources/facial-point-annotations
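
    Submissions to such challenges are typically compared with a normalised landmark error. Below is a hedged sketch of one such metric (mean point-to-point error divided by the inter-ocular distance); the exact 300-W evaluation protocol is not reproduced here, and the outer eye-corner indices (36, 45) of the 68-point mark-up are an assumption of this example.

        # Sketch of a normalised point-to-point landmark error; indices are assumed.
        import numpy as np

        def normalised_point_to_point_error(pred, gt, left_eye=36, right_eye=45):
            """Mean Euclidean landmark error divided by the inter-ocular distance."""
            per_point = np.linalg.norm(pred - gt, axis=1)
            inter_ocular = np.linalg.norm(gt[left_eye] - gt[right_eye])
            return per_point.mean() / inter_ocular

        gt = np.random.rand(68, 2) * 200               # toy ground-truth annotation
        pred = gt + np.random.randn(68, 2)             # noisy prediction
        print(normalised_point_to_point_error(pred, gt))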

    HOG active appearance models

    We propose the combination of dense Histogram of Oriented Gradients (HOG) features with Active Appearance Models (AAMs). We employ the efficient Inverse Compositional optimization technique and show results for the task of face fitting. By taking advantage of the descriptive characteristics of HOG features, we build robust and accurate AAMs that generalize well to unseen faces with illumination, identity, pose and occlusion variations. Our experiments on challenging in-the-wild databases show that HOG AAMs significantly outperform the current state-of-the-art results of discriminative methods trained on larger databases.
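
    A minimal sketch of building a dense, multi-channel HOG representation of a face image using scikit-image; this only illustrates the kind of feature image an appearance model could be built on, not the authors' implementation, and the cell size and orientation count are arbitrary choices.

        # Dense HOG feature image sketch using scikit-image's hog().
        import numpy as np
        from skimage.feature import hog

        def dense_hog_image(gray, cell=8, orientations=9):
            """Return a (H/cell, W/cell, channels) HOG feature image."""
            f = hog(gray, orientations=orientations,
                    pixels_per_cell=(cell, cell),
                    cells_per_block=(1, 1),
                    feature_vector=False)        # keep the spatial cell layout
            # f has shape (cells_y, cells_x, 1, 1, orientations); squeeze the block dims
            return f.reshape(f.shape[0], f.shape[1], -1)

        gray = np.random.rand(128, 128)           # stand-in for a warped face image
        features = dense_hog_image(gray)
        print(features.shape)                     # (16, 16, 9)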

    Feature-based Lucas-Kanade and Active Appearance Models

    Lucas-Kanade and Active Appearance Models are among the most commonly used methods for image alignment and facial fitting, respectively. They both utilize non-linear gradient descent, which is usually applied to intensity values. In this paper, we propose the employment of highly descriptive, densely sampled image features for both problems. We show that the strategy of warping the multi-channel dense feature image at each iteration is more beneficial than extracting features after warping the intensity image at each iteration. Motivated by this observation, we demonstrate robust and accurate alignment and fitting performance using a variety of powerful feature descriptors. Especially with the employment of HOG and SIFT features, our method significantly outperforms the current state-of-the-art results on in-the-wild databases.
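
    The central observation above, warping the feature channels rather than recomputing features from a warped intensity image, can be sketched with a toy translation warp; the gradient "features" and the scipy-based warp below are illustrative stand-ins, not the descriptors or warps used in the paper.

        # Contrast of the two feature strategies on a toy translation warp.
        import numpy as np
        from scipy.ndimage import map_coordinates

        def warp(image2d, dy, dx):
            """Backward-warp a single channel by a translation (dy, dx)."""
            ys, xs = np.indices(image2d.shape, dtype=float)
            return map_coordinates(image2d, [ys + dy, xs + dx], order=1, mode='nearest')

        def features(image2d):
            """Toy 2-channel 'dense feature' image: y- and x-gradients."""
            gy, gx = np.gradient(image2d)
            return np.stack([gy, gx], axis=-1)

        img = np.random.rand(64, 64)

        # Strategy favoured in the paper: compute features once, warp every channel.
        feat = features(img)
        warped_feat = np.stack([warp(feat[..., c], 2.5, -1.0)
                                for c in range(feat.shape[-1])], axis=-1)

        # Alternative: warp the intensity image, then recompute features each iteration.
        feat_of_warped = features(warp(img, 2.5, -1.0))

        print(warped_feat.shape, feat_of_warped.shape)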

    3D face morphable models "In-The-Wild"

    3D Morphable Models (3DMMs) are powerful statistical models of 3D facial shape and texture, and are among the state-of-the-art methods for reconstructing facial shape from single images. With the advent of new 3D sensors, many 3D facial datasets have been collected, containing both neutral and expressive faces. However, all of these datasets are captured under controlled conditions. Thus, even though powerful 3D facial shape models can be learnt from such data, it is difficult to build statistical texture models that are sufficient to reconstruct faces captured in unconstrained conditions (in-the-wild). In this paper, we propose the first, to the best of our knowledge, in-the-wild 3DMM by combining a powerful statistical model of facial shape, which describes both identity and expression, with an in-the-wild texture model. We show that the employment of such an in-the-wild texture model greatly simplifies the fitting procedure, because there is no need to optimise with respect to the illumination parameters. Furthermore, we propose a new fast algorithm for fitting the 3DMM to arbitrary images. Finally, we have captured the first 3D facial database under relatively unconstrained conditions and report quantitative evaluations with state-of-the-art performance. Complementary qualitative reconstruction results are demonstrated on standard in-the-wild facial databases.
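
    A minimal sketch of the linear shape model that such a 3DMM combines with a texture model: a mean shape plus identity and expression bases. The bases below are random stand-ins and the dimensions are arbitrary; real bases are learnt from registered 3D scans.

        # Linear 3DMM shape model sketch: mean + identity variation + expression variation.
        import numpy as np

        n_vertices, n_id, n_expr = 5000, 80, 30
        mean_shape = np.random.rand(3 * n_vertices)        # flattened (x, y, z) per vertex
        U_id = np.random.randn(3 * n_vertices, n_id)       # identity basis (stand-in)
        U_expr = np.random.randn(3 * n_vertices, n_expr)   # expression basis (stand-in)

        def synthesise_shape(alpha_id, alpha_expr):
            """Instance shape = mean + identity variation + expression variation."""
            s = mean_shape + U_id @ alpha_id + U_expr @ alpha_expr
            return s.reshape(n_vertices, 3)

        shape = synthesise_shape(np.random.randn(n_id) * 0.1,
                                 np.random.randn(n_expr) * 0.1)
        print(shape.shape)                                  # (5000, 3)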

    Robust statistical deformable models

    During the last few years, we have witnessed tremendous advances in the field of 2D Deformable Models for the problem of landmark localization. These advances, which are mainly reported on the task of face alignment, have created two major and opposing families of methodologies. On the one hand, there are the generative Deformable Models that utilize a Newton-type optimization. This family of techniques has attracted extensive research effort during the last two decades, but has lately been criticized for achieving inaccurate performance. On the other hand, there is the currently predominant family of discriminative Deformable Models that treat the problem of landmark localization as a regression problem. These techniques commonly employ cascaded linear regression and have proved to be very accurate. In this thesis, we argue that even though generative Deformable Models are less accurate than discriminative ones, they are still very valuable for several tasks. In the first part of the thesis, we propose two novel generative Deformable Models. In the second part of the thesis, we show that the combination of generative and discriminative Deformable Models achieves state-of-the-art results on the tasks of (i) landmark localization and (ii) semi-supervised annotation of large visual data.
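
    The cascaded linear regression mentioned above can be illustrated with a single toy stage: learn a linear map from shape-indexed features to a shape update, then apply it at test time. The data, the 5-landmark shapes and the intensity-sampling "features" below are synthetic stand-ins, not any method from the thesis.

        # One toy stage of cascaded linear regression on synthetic data.
        import numpy as np

        def sample_features(image, shape):
            """Shape-indexed features: intensities at the current landmark estimates."""
            ys = np.clip(shape[:, 1].astype(int), 0, image.shape[0] - 1)
            xs = np.clip(shape[:, 0].astype(int), 0, image.shape[1] - 1)
            return image[ys, xs]

        rng = np.random.default_rng(0)
        images = rng.random((200, 64, 64))                  # synthetic training images
        gt = rng.uniform(10.0, 54.0, (200, 5, 2))           # 5 ground-truth landmarks each
        init = gt + rng.normal(0.0, 3.0, gt.shape)          # perturbed initialisations

        # learn one linear regressor from features to shape updates
        X = np.stack([sample_features(im, s) for im, s in zip(images, init)])
        Y = (gt - init).reshape(len(images), -1)
        A = np.c_[X, np.ones(len(X))]                       # append a bias term
        R, *_ = np.linalg.lstsq(A, Y, rcond=None)

        # test-time update of a single initial shape
        feats = np.r_[sample_features(images[0], init[0]), 1.0]
        refined = init[0] + (feats @ R).reshape(5, 2)
        print(refined.shape)                                # (5, 2)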

    Visual modeling of human face in real time with applications in recognition

    Face recognition from static images or video sequences aims at automatically deciding whether people are present in a scene, locating them, identifying them, and flagging events such as speech, dialogue, actions, gestures and narrative incidents. A particular subcategory of the general problem is the modelling and automatic recognition of facial expressions, with applications in speech recognition, behaviour analysis, action recognition, human-robot interaction, computer graphics and affective computing (the sensing, detection and interpretation of human emotional states). Facial expressions are the visual manifestation of emotional state, cognitive activity, intention, personality or psychological state. For automatic recognition, most of the current literature has been inspired by the Facial Action Coding System (FACS), introduced into behavioural science by Ekman and Friesen. It is based on a prototype of the basic human expressions and allows their study through an anatomical analysis of facial movements. The goal of this diploma thesis is the development of techniques and algorithms for the automatic analysis and recognition of facial actions using Computer Vision techniques. A starting point is the improvement of the performance of techniques based on specially designed, object-oriented models such as Active Appearance Models (AAMs). Emphasis is placed on the development of adaptation mechanisms and the extraction of descriptors that are independent of the identity of the face, in order to highlight characteristics of the universality of human facial expressions. Possible applications include sign language recognition, speech synthesis with emotional colouring, and the extraction of models of emotional attention and saliency from movie data.

    DenseReg: fully convolutional dense shape regression in-the-wild

    In this paper we propose to learn a mapping from image pixels into a dense template grid through a fully convolutional network. We formulate this task as a regression problem and train our network by leveraging manually annotated facial landmarks “in-the-wild”. We use such landmarks to establish a dense correspondence field between a three-dimensional object template and the input image, which then serves as the ground truth for training our regression system. We show that we can combine ideas from semantic segmentation with regression networks, yielding a highly accurate ‘quantized regression’ architecture. Our system, called DenseReg, allows us to estimate dense image-to-template correspondences in a fully convolutional manner. As such, our network can provide useful correspondence information as a stand-alone system, while when used as an initialization for Statistical Deformable Models we obtain landmark localization results that largely outperform the current state-of-the-art on the challenging 300W benchmark. We thoroughly evaluate our method on a host of facial analysis tasks, and demonstrate its use for other correspondence estimation tasks, such as the human body and the human ear. DenseReg code is made available at http://alpguler.com/DenseReg.html along with supplementary materials.
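
    A hedged sketch of the quantized regression idea: a continuous correspondence value is split into a discrete bin label (handled like per-pixel classification, as in semantic segmentation) and a small within-bin residual (handled by regression), and the two are recombined at test time. The bin count and the [0, 1) target range below are arbitrary choices of this example, not the paper's settings.

        # Encode/decode for quantized regression of a continuous target in [0, 1).
        import numpy as np

        K = 10                                     # number of quantisation bins

        def encode(u):
            """Split continuous coordinates u in [0, 1) into (bin label, residual)."""
            q = np.minimum((u * K).astype(int), K - 1)    # classification target
            r = u * K - q                                  # regression target in [0, 1)
            return q, r

        def decode(q, r):
            """Recombine predicted bin and residual into a continuous coordinate."""
            return (q + r) / K

        u = np.random.rand(4, 4)                   # toy dense correspondence map
        q, r = encode(u)
        print(np.allclose(decode(q, r), u))        # True: the encoding is invertible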

    Adaptive cascaded regression

    The two predominant families of deformable models for the task of face alignment are: (i) discriminative cascaded regression models, and (ii) generative models optimised with Gauss-Newton. Although these approaches have been found to work well in practice, they each suffer from convergence issues. Cascaded regression has no theoretical guarantee of convergence to a local minimum and thus may fail to recover the fine details of the object. Gauss-Newton optimisation is not robust to initialisations that are far from the optimal solution. In this paper, we propose the first, to the best of our knowledge, attempt to combine the best of these two worlds under a unified model, and report state-of-the-art performance on the most recent facial benchmark challenge.
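
    A toy 1-D analogy of combining the two families: a learnt, cascaded-regression-style linear update pulls a far-off initialisation close to the solution, and Gauss-Newton then refines it. The curve-fitting problem and all quantities below are synthetic and purely illustrative, not the paper's unified model.

        # Learnt linear update for coarse estimation, Gauss-Newton for refinement.
        import numpy as np

        rng = np.random.default_rng(1)
        x = np.linspace(0, 1, 50)
        p_true = 4.0
        y = np.sin(p_true * x)                               # observed signal

        def residual(p):
            return np.sin(p * x) - y

        # discriminative step: learn a map from residuals of perturbed guesses to updates
        p_samples = p_true + rng.uniform(-2, 2, 500)
        feats = np.stack([residual(p) for p in p_samples])   # (500, 50)
        R, *_ = np.linalg.lstsq(feats, p_true - p_samples, rcond=None)

        p = 6.0                                               # far-off initialisation
        p = p + residual(p) @ R                               # one learnt update

        # generative step: Gauss-Newton refinement from the improved estimate
        for _ in range(5):
            r = residual(p)
            J = x * np.cos(p * x)                             # d residual / d p
            p = p - (J @ r) / (J @ J)

        print(round(p, 4))                                    # close to 4.0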