135 research outputs found

    Synthesis and Control of High Resolution Facial Expressions for Visual Interactions

    Full text link
    The synthesis of facial expression with control of intensity and personal styles is important in intelligent and affective human-computer interaction, especially in face-to-face inter-action between human and intelligent agent. We present a facial expression animation system that facilitates control of expressiveness and style. We learn a decomposable genera-tive model for the nonlinear deformation of facial expressions by analyzing the mapping space between low dimensional embedded representation and high resolution tracking data. Bilinear analysis of the mapping space provides a compact representation of the nonlinear generative model for facial expressions. The decomposition allows synthesis of new fa-cial expressions by control of geometry and expression style. The generative model provides control of expressiveness pre-serving nonlinear deformation in the expressions with simple parameters and allows synthesis of stylized facial geometry. In addition, we can directly extract the MPEG-4 Facial Ani-mation Parameters (FAPs) from the synthesized data, which allows using any animation engine that supports FAPs to ani-mate new synthesized expressions. 1

    06241 Abstracts Collection -- Human Motion - Understanding, Modeling, Capture and Animation. 13th Workshop

    Get PDF
    From 11.06.06 to 16.06.06, the Dagstuhl Seminar 06241 ``Human Motion - Understanding, Modeling, Capture and Animation. 13th Workshop "Theoretical Foundations of Computer Vision"\u27\u27 was held in the International Conference and Research Center (IBFI), Schloss Dagstuhl. During the seminar, several participants presented their current research, and ongoing work and open problems were discussed. Abstracts of the presentations given during the seminar as well as abstracts of seminar results and ideas are put together in this paper. The first section describes the seminar topics and goals in general

    Going Deeper into Action Recognition: A Survey

    Full text link
    Understanding human actions in visual data is tied to advances in complementary research areas including object recognition, human dynamics, domain adaptation and semantic segmentation. Over the last decade, human action analysis evolved from earlier schemes that are often limited to controlled environments to nowadays advanced solutions that can learn from millions of videos and apply to almost all daily activities. Given the broad range of applications from video surveillance to human-computer interaction, scientific milestones in action recognition are achieved more rapidly, eventually leading to the demise of what used to be good in a short time. This motivated us to provide a comprehensive review of the notable steps taken towards recognizing human actions. To this end, we start our discussion with the pioneering methods that use handcrafted representations, and then, navigate into the realm of deep learning based approaches. We aim to remain objective throughout this survey, touching upon encouraging improvements as well as inevitable fallbacks, in the hope of raising fresh questions and motivating new research directions for the reader

    An Efficient Boosted Classifier Tree-Based Feature Point Tracking System for Facial Expression Analysis

    Get PDF
    The study of facial movement and expression has been a prominent area of research since the early work of Charles Darwin. The Facial Action Coding System (FACS), developed by Paul Ekman, introduced the first universal method of coding and measuring facial movement. Human-Computer Interaction seeks to make human interaction with computer systems more effective, easier, safer, and more seamless. Facial expression recognition can be broken down into three distinctive subsections: Facial Feature Localization, Facial Action Recognition, and Facial Expression Classification. The first and most important stage in any facial expression analysis system is the localization of key facial features. Localization must be accurate and efficient to ensure reliable tracking and leave time for computation and comparisons to learned facial models while maintaining real-time performance. Two possible methods for localizing facial features are discussed in this dissertation. The Active Appearance Model is a statistical model describing an object\u27s parameters through the use of both shape and texture models, resulting in appearance. Statistical model-based training for object recognition takes multiple instances of the object class of interest, or positive samples, and multiple negative samples, i.e., images that do not contain objects of interest. Viola and Jones present a highly robust real-time face detection system, and a statistically boosted attentional detection cascade composed of many weak feature detectors. A basic algorithm for the elimination of unnecessary sub-frames while using Viola-Jones face detection is presented to further reduce image search time. A real-time emotion detection system is presented which is capable of identifying seven affective states (agreeing, concentrating, disagreeing, interested, thinking, unsure, and angry) from a near-infrared video stream. The Active Appearance Model is used to place 23 landmark points around key areas of the eyes, brows, and mouth. A prioritized binary decision tree then detects, based on the actions of these key points, if one of the seven emotional states occurs as frames pass. The completed system runs accurately and achieves a real-time frame rate of approximately 36 frames per second. A novel facial feature localization technique utilizing a nested cascade classifier tree is proposed. A coarse-to-fine search is performed in which the regions of interest are defined by the response of Haar-like features comprising the cascade classifiers. The individual responses of the Haar-like features are also used to activate finer-level searches. A specially cropped training set derived from the Cohn-Kanade AU-Coded database is also developed and tested. Extensions of this research include further testing to verify the novel facial feature localization technique presented for a full 26-point face model, and implementation of a real-time intensity sensitive automated Facial Action Coding System

    Towards perceptual intelligence : statistical modeling of human individual and interactive behaviors

    Get PDF
    Thesis (Ph.D.)--Massachusetts Institute of Technology, Dept. of Architecture, 2000.Includes bibliographical references (p. 279-297).This thesis presents a computational framework for the automatic recognition and prediction of different kinds of human behaviors from video cameras and other sensors, via perceptually intelligent systems that automatically sense and correctly classify human behaviors, by means of Machine Perception and Machine Learning techniques. In the thesis I develop the statistical machine learning algorithms (dynamic graphical models) necessary for detecting and recognizing individual and interactive behaviors. In the case of the interactions two Hidden Markov Models (HMMs) are coupled in a novel architecture called Coupled Hidden Markov Models (CHMMs) that explicitly captures the interactions between them. The algorithms for learning the parameters from data as well as for doing inference with those models are developed and described. Four systems that experimentally evaluate the proposed paradigm are presented: (1) LAFTER, an automatic face detection and tracking system with facial expression recognition; (2) a Tai-Chi gesture recognition system; (3) a pedestrian surveillance system that recognizes typical human to human interactions; (4) and a SmartCar for driver maneuver recognition. These systems capture human behaviors of different nature and increasing complexity: first, isolated, single-user facial expressions, then, two-hand gestures and human-to-human interactions, and finally complex behaviors where human performance is mediated by a machine, more specifically, a car. The metric that is used for quantifying the quality of the behavior models is their accuracy: how well they are able to recognize the behaviors on testing data. Statistical machine learning usually suffers from lack of data for estimating all the parameters in the models. In order to alleviate this problem, synthetically generated data are used to bootstrap the models creating 'prior models' that are further trained using much less real data than otherwise it would be required. The Bayesian nature of the approach let us do so. The predictive power of these models lets us categorize human actions very soon after the beginning of the action. Because of the generic nature of the typical behaviors of each of the implemented systems there is a reason to believe that this approach to modeling human behavior would generalize to other dynamic human-machine systems. This would allow us to recognize automatically people's intended action, and thus build control systems that dynamically adapt to suit the human's purposes better.by Nuria M. Oliver.Ph.D

    On the Design, Implementation and Application of Novel Multi-disciplinary Techniques for explaining Artificial Intelligence Models

    Get PDF
    284 p.Artificial Intelligence is a non-stopping field of research that has experienced some incredible growth lastdecades. Some of the reasons for this apparently exponential growth are the improvements incomputational power, sensing capabilities and data storage which results in a huge increment on dataavailability. However, this growth has been mostly led by a performance-based mindset that has pushedmodels towards a black-box nature. The performance prowess of these methods along with the risingdemand for their implementation has triggered the birth of a new research field. Explainable ArtificialIntelligence. As any new field, XAI falls short in cohesiveness. Added the consequences of dealing withconcepts that are not from natural sciences (explanations) the tumultuous scene is palpable. This thesiscontributes to the field from two different perspectives. A theoretical one and a practical one. The formeris based on a profound literature review that resulted in two main contributions: 1) the proposition of anew definition for Explainable Artificial Intelligence and 2) the creation of a new taxonomy for the field.The latter is composed of two XAI frameworks that accommodate in some of the raging gaps found field,namely: 1) XAI framework for Echo State Networks and 2) XAI framework for the generation ofcounterfactual. The first accounts for the gap concerning Randomized neural networks since they havenever been considered within the field of XAI. Unfortunately, choosing the right parameters to initializethese reservoirs falls a bit on the side of luck and past experience of the scientist and less on that of soundreasoning. The current approach for assessing whether a reservoir is suited for a particular task is toobserve if it yields accurate results, either by handcrafting the values of the reservoir parameters or byautomating their configuration via an external optimizer. All in all, this poses tough questions to addresswhen developing an ESN for a certain application, since knowing whether the created structure is optimalfor the problem at hand is not possible without actually training it. However, some of the main concernsfor not pursuing their application is related to the mistrust generated by their black-box" nature. Thesecond presents a new paradigm to treat counterfactual generation. Among the alternatives to reach auniversal understanding of model explanations, counterfactual examples is arguably the one that bestconforms to human understanding principles when faced with unknown phenomena. Indeed, discerningwhat would happen should the initial conditions differ in a plausible fashion is a mechanism oftenadopted by human when attempting at understanding any unknown. The search for counterfactualsproposed in this thesis is governed by three different objectives. Opposed to the classical approach inwhich counterfactuals are just generated following a minimum distance approach of some type, thisframework allows for an in-depth analysis of a target model by means of counterfactuals responding to:Adversarial Power, Plausibility and Change Intensity

    Machine Learning Methods for Social Signal Processing

    Get PDF

    Bridging the gap between reconstruction and synthesis

    Get PDF
    Aplicat embargament des de la data de defensa fins el 15 de gener de 20223D reconstruction and image synthesis are two of the main pillars in computer vision. Early works focused on simple tasks such as multi-view reconstruction and texture synthesis. With the spur of Deep Learning, the field has rapidly progressed, making it possible to achieve more complex and high level tasks. For example, the 3D reconstruction results of traditional multi-view approaches are currently obtained with single view methods. Similarly, early pattern based texture synthesis works have resulted in techniques that allow generating novel high-resolution images. In this thesis we have developed a hierarchy of tools that cover all these range of problems, lying at the intersection of computer vision, graphics and machine learning. We tackle the problem of 3D reconstruction and synthesis in the wild. Importantly, we advocate for a paradigm in which not everything should be learned. Instead of applying Deep Learning naively we propose novel representations, layers and architectures that directly embed prior 3D geometric knowledge for the task of 3D reconstruction and synthesis. We apply these techniques to problems including scene/person reconstruction and photo-realistic rendering. We first address methods to reconstruct a scene and the clothed people in it while estimating the camera position. Then, we tackle image and video synthesis for clothed people in the wild. Finally, we bridge the gap between reconstruction and synthesis under the umbrella of a unique novel formulation. Extensive experiments conducted along this thesis show that the proposed techniques improve the performance of Deep Learning models in terms of the quality of the reconstructed 3D shapes / synthesised images, while reducing the amount of supervision and training data required to train them. In summary, we provide a variety of low, mid and high level algorithms that can be used to incorporate prior knowledge into different stages of the Deep Learning pipeline and improve performance in tasks of 3D reconstruction and image synthesis.La reconstrucció 3D i la síntesi d'imatges són dos dels pilars fonamentals en visió per computador. Els estudis previs es centren en tasques senzilles com la reconstrucció amb informació multi-càmera i la síntesi de textures. Amb l'aparició del "Deep Learning", aquest camp ha progressat ràpidament, fent possible assolir tasques molt més complexes. Per exemple, per obtenir una reconstrucció 3D, tradicionalment s'utilitzaven mètodes multi-càmera, en canvi ara, es poden obtenir a partir d'una sola imatge. De la mateixa manera, els primers treballs de síntesi de textures basats en patrons han donat lloc a tècniques que permeten generar noves imatges completes en alta resolució. En aquesta tesi, hem desenvolupat una sèrie d'eines que cobreixen tot aquest ventall de problemes, situats en la intersecció entre la visió per computador, els gràfics i l'aprenentatge automàtic. Abordem el problema de la reconstrucció i la síntesi 3D en el món real. És important destacar que defensem un paradigma on no tot s'ha d'aprendre. Enlloc d'aplicar el "Deep Learning" de forma naïve, proposem representacions novedoses i arquitectures que incorporen directament els coneixements geomètrics ja existents per a aconseguir la reconstrucció 3D i la síntesi d'imatges. Nosaltres apliquem aquestes tècniques a problemes com ara la reconstrucció d'escenes/persones i a la renderització d'imatges fotorealistes. Primer abordem els mètodes per reconstruir una escena, les persones vestides que hi ha i la posició de la càmera. A continuació, abordem la síntesi d'imatges i vídeos de persones vestides en situacions quotidianes. I finalment, aconseguim, a través d'una nova formulació única, connectar la reconstrucció amb la síntesi. Els experiments realitzats al llarg d'aquesta tesi demostren que les tècniques proposades milloren el rendiment dels models de "Deepp Learning" pel que fa a la qualitat de les reconstruccions i les imatges sintetitzades alhora que redueixen la quantitat de dades necessàries per entrenar-los. En resum, proporcionem una varietat d'algoritmes de baix, mitjà i alt nivell que es poden utilitzar per incorporar els coneixements previs a les diferents etapes del "Deep Learning" i millorar el rendiment en tasques de reconstrucció 3D i síntesi d'imatges.Postprint (published version
    corecore