
    Towards spatial and temporal analysis of facial expressions in 3D data

    Facial expressions are one of the most important means of communicating emotions and meaning. They are used to clarify and give emphasis, to express intentions, and form a crucial part of any human interaction. The ability to automatically recognise and analyse expressions could therefore prove vital to human behaviour understanding, which has applications in a number of areas such as psychology, medicine and security. 3D and 4D (3D+time) facial expression analysis is an expanding field, providing the ability to deal with problems inherent to 2D images, such as out-of-plane motion, head pose, and illumination issues. Analysis of data of this kind requires extending successful approaches applied to the 2D problem, as well as the development of new techniques. The recent introduction of new databases containing appropriate expression data, recorded in 3D or 4D, has allowed research into this area for the first time. This thesis develops a number of techniques, both in 2D and 3D, that build towards a complete system for analysis of 4D expressions. Suitable feature types, designed by employing binary pattern methods, are developed for analysis of 3D facial geometry data. The full dynamics of 4D expressions are modelled, through a system reliant on motion-based features, to demonstrate how the different components of the expression (neutral-onset-apex-offset) can be distinguished and harnessed. Further, the spatial structure of expressions is exploited to improve expression component intensity estimation in 2D videos. Finally, it is discussed how this latter step could be extended to 3D facial expression analysis and combined with temporal analysis. Thus, it is demonstrated that spatial and temporal information, when combined with appropriate 3D features, is critical in the analysis of 4D expression data.
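
    The abstract refers to feature types built with binary pattern methods for 3D facial geometry without specifying them; as a rough illustration of that family of descriptors, the sketch below computes a plain 8-neighbour local binary pattern histogram over a rasterised depth map. The function name, parameters, and the choice of a basic LBP are illustrative assumptions, not the thesis's actual features.

```python
import numpy as np

def lbp_histogram(depth_map, n_bins=256):
    """Basic 8-neighbour local binary pattern histogram over a 2D depth map.

    A generic stand-in for the binary-pattern features mentioned in the
    abstract; the thesis's actual descriptors may differ.
    """
    d = np.asarray(depth_map, dtype=np.float64)
    centre = d[1:-1, 1:-1]
    # Offsets of the 8 neighbours, ordered clockwise from the top-left.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros_like(centre, dtype=np.uint8)
    for bit, (dy, dx) in enumerate(offsets):
        neighbour = d[1 + dy:d.shape[0] - 1 + dy, 1 + dx:d.shape[1] - 1 + dx]
        codes |= ((neighbour >= centre).astype(np.uint8) << bit)
    hist, _ = np.histogram(codes, bins=n_bins, range=(0, n_bins))
    return hist / hist.sum()  # normalised descriptor

# Example: a random 64x64 "depth map" stands in for a rasterised face scan.
descriptor = lbp_histogram(np.random.rand(64, 64))
```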

    3D Face Modelling, Analysis and Synthesis

    Human faces have always been of special interest to researchers in computer vision and graphics. There has been an explosion in the number of studies on accurately modelling, analysing and synthesising realistic faces for various applications. The importance of human faces stems from the fact that they are an invaluable means of effective communication, recognition, behaviour analysis and conveying emotions. Therefore, addressing the automatic visual perception of human faces efficiently could open up many influential applications in various domains, e.g. virtual/augmented reality, computer-aided surgery, security and surveillance, and entertainment. However, the vast variability associated with the geometry and appearance of human faces captured in unconstrained videos and images renders their automatic analysis and understanding very challenging even today. The primary objective of this thesis is to develop novel 3D computer vision methodologies for human faces that go beyond the state of the art and achieve unprecedented quality and robustness. In more detail, this thesis advances the state of the art in 3D facial shape reconstruction and tracking, fine-grained 3D facial motion estimation, expression recognition, and facial synthesis with the aid of 3D face modelling. We give special attention to the case where the input comes from monocular imagery captured under uncontrolled settings, a.k.a. in-the-wild data. Data of this kind are available in abundance on the internet nowadays. Analysing these data pushes the boundaries of currently available computer vision algorithms and opens up many new crucial applications in industry. We define the four targeted vision problems (3D facial reconstruction and tracking, fine-grained 3D facial motion estimation, expression recognition, facial synthesis) as the four essential 3D-based systems for automatic facial behaviour understanding and show how they rely on each other. Finally, to aid the research conducted in this thesis, we collect and annotate a large-scale video dataset of monocular facial performances. All of our proposed methods demonstrate very promising quantitative and qualitative results when compared to state-of-the-art methods.
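
    One of the problems listed above, 3D facial shape reconstruction and tracking from monocular in-the-wild imagery, is commonly posed as fitting a 3D face model so that its projection agrees with detected 2D landmarks. The sketch below shows a minimal weak-perspective reprojection error under that assumption; the helper names are hypothetical and this is not the thesis's actual formulation.

```python
import numpy as np

def weak_perspective(points_3d, scale, rotation, translation_2d):
    """Project Nx3 model points to 2D with a weak-perspective camera.

    Hypothetical helper illustrating the kind of projection used when
    fitting a 3D face model to 2D landmarks in monocular images.
    """
    return scale * (points_3d @ rotation.T)[:, :2] + translation_2d

def landmark_reprojection_error(points_3d, landmarks_2d, scale, rotation, translation_2d):
    """Mean Euclidean distance between projected model points and detected landmarks."""
    proj = weak_perspective(points_3d, scale, rotation, translation_2d)
    return np.linalg.norm(proj - landmarks_2d, axis=1).mean()

# Toy example with 68 random "landmarks" and an identity pose.
pts = np.random.rand(68, 3)
lm = pts[:, :2] * 100.0
err = landmark_reprojection_error(pts, lm, scale=100.0,
                                  rotation=np.eye(3), translation_2d=np.zeros(2))
```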

    Spatio-temporal centroid based sign language facial expressions for animation synthesis in virtual environment

    Advisor: Eduardo Todt. Doctoral thesis, Universidade Federal do Paraná, Setor de Ciências Exatas, Programa de Pós-Graduação em Informática. Defense: Curitiba, 20/02/2019. Includes references: p. 86-97. Area of concentration: Computer Science.
    Formally recognized as the second official Brazilian language, BSL, or Libras, today has many computational applications that integrate the deaf community into daily activities, offering virtual interpreters represented by 3D avatars built using formal models that parameterize the specific characteristics of sign languages. These applications, however, still treat facial expressions as a background feature in a primarily gestural language, ignoring the importance that facial expressions and emotions imprint on the context of the transmitted message. In this work, in order to define a parameterized facial model for use in sign languages, a system for the synthesis of facial expressions through a 3D avatar is proposed and a prototype implemented. To this end, a model of facial landmarks separated by regions is defined, along with a modeling of base expressions using the AKDEF and JAFFE facial databases as a reference. With this system it is possible to represent complex expressions by interpolating intensity values in the geometric animation, in a simplified way, using centroid-based control and the displacement of independent regions in the 3D model. A spatio-temporal model of the facial landmarks is also proposed, with the objective of observing the behavior and relations of the centroids in the synthesis of the base expressions and defining which geometric landmarks are relevant in the interpolation and animation of the expressions. A system for exporting the facial data following the hierarchical format used by most 3D sign language interpreter avatars is developed, encouraging integration into formal computational models already existing in the literature and allowing the adaptation and alteration of values and intensities in the representation of emotions. Thus, the models and concepts presented propose the integration of a facial model for representing expressions in sign synthesis, offering a simplified and optimized approach for applying these resources in 3D avatars. Keywords: 3D Avatar, Spatio-Temporal Data, BSL, Sign Language, Facial Expressions.
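
    As a rough illustration of the centroid-based interpolation described above, the sketch below blends a landmark region from neutral towards a base expression by an intensity value, separating the region's centroid shift from the residual per-landmark motion. The blending scheme and all names are assumptions rather than the thesis's exact model.

```python
import numpy as np

def region_centroid(landmarks):
    """Centroid of a set of 2D/3D landmark coordinates (rows are points)."""
    return np.asarray(landmarks).mean(axis=0)

def interpolate_expression(neutral, target, intensity):
    """Linearly blend a facial region from neutral to a target base expression.

    Simplified centroid-controlled interpolation: the region is displaced by
    the interpolated centroid offset, then each landmark moves toward its
    target position by the same intensity.
    """
    neutral = np.asarray(neutral, dtype=float)
    target = np.asarray(target, dtype=float)
    centroid_shift = intensity * (region_centroid(target) - region_centroid(neutral))
    # Residual per-landmark motion after removing the centroid shift.
    residual = (target - region_centroid(target)) - (neutral - region_centroid(neutral))
    return neutral + centroid_shift + intensity * residual

# Example: half-intensity smile applied to a 4-point mouth region (3D).
neutral_mouth = np.array([[0, 0, 0], [1, 0, 0], [2, 0, 0], [3, 0, 0]], float)
smile_mouth = np.array([[0, 0.5, 0], [1, 0.2, 0], [2, 0.2, 0], [3, 0.5, 0]], float)
half_smile = interpolate_expression(neutral_mouth, smile_mouth, intensity=0.5)
```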

    3D Morphable Face Models - Past, Present and Future

    In this paper, we provide a detailed survey of 3D Morphable Face Models over the 20 years since they were first proposed. The challenges in building and applying these models, namely capture, modeling, image formation, and image analysis, are still active research topics, and we review the state of the art in each of these areas. We also look ahead, identifying unsolved challenges, proposing directions for future research, and highlighting the broad range of current and future applications.
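
    The models surveyed here follow the standard 3D Morphable Model formulation, in which a face shape is the mean shape plus a linear combination of basis vectors weighted by coefficients. The sketch below shows that linear combination on toy dimensions; the array layouts are assumptions.

```python
import numpy as np

def morphable_model_sample(mean_shape, shape_basis, coefficients):
    """Generate a 3D face as a linear combination of shape basis vectors.

    Standard 3D Morphable Model form: shape = mean + sum_i alpha_i * basis_i.
    Assumed layouts: mean_shape is (3N,), shape_basis is (3N, K),
    coefficients is (K,).
    """
    return mean_shape + shape_basis @ coefficients

# Toy model: N = 4 vertices, K = 2 principal components.
mean_shape = np.zeros(12)
shape_basis = np.random.rand(12, 2)
alpha = np.array([1.5, -0.3])
face = morphable_model_sample(mean_shape, shape_basis, alpha).reshape(-1, 3)
```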

    Multi-Modality Human Action Recognition

    Human action recognition is useful in many applications across various areas, e.g. video surveillance, HCI (human-computer interaction), video retrieval, gaming, and security. Recently, human action recognition has become an active research topic in computer vision and pattern recognition, and a number of action recognition approaches have been proposed. However, most of these approaches are designed for RGB image sequences, where the action data are collected by an RGB/intensity camera, so recognition performance is usually affected by the occlusion, background, and lighting conditions of the image sequences. If more information can be provided along with the image sequences, so that data sources other than RGB video can be utilized, human actions could be better represented and recognized by the designed computer vision system. In this dissertation, multi-modality human action recognition is studied. On one hand, we introduce the study of multi-spectral action recognition, which involves information from spectra beyond the visible, e.g. infrared and near infrared. Action recognition in individual spectra is explored and new methods are proposed; cross-spectral action recognition is also investigated and novel approaches are proposed in our work. On the other hand, since depth imaging technology has made significant progress recently, depth information can be captured simultaneously with RGB video, and depth-based human action recognition is also investigated. I first propose a method combining different types of depth data to recognize human actions. Then a thorough evaluation is conducted on spatiotemporal interest point (STIP) based features for depth-based action recognition. Finally, I advocate the study of fusing different features for depth-based action analysis. Moreover, human depression recognition is studied by combining a facial appearance model with a facial dynamic model.
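
    As a generic illustration of feature-level fusion across modalities (not the dissertation's specific STIP-based descriptors or fusion strategy), the sketch below L2-normalises per-modality feature vectors, concatenates them, and trains a linear SVM on toy data; all names and data are illustrative.

```python
import numpy as np
from sklearn.preprocessing import normalize
from sklearn.svm import SVC

def fuse_features(rgb_features, depth_features):
    """Feature-level fusion: L2-normalise each modality, then concatenate."""
    return np.hstack([normalize(rgb_features), normalize(depth_features)])

# Toy data: 20 clips, 100-D RGB descriptors and 60-D depth descriptors.
rng = np.random.default_rng(0)
rgb = rng.random((20, 100))
depth = rng.random((20, 60))
labels = rng.integers(0, 3, size=20)

clf = SVC(kernel="linear").fit(fuse_features(rgb, depth), labels)
predictions = clf.predict(fuse_features(rgb, depth))
```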

    4D (3D Dynamic) statistical models of conversational expressions and the synthesis of highly-realistic 4D facial expression sequences

    In this thesis, a novel approach for modelling 4D (3D Dynamic) conversational interactions and synthesising highly-realistic expression sequences is described. To achieve these goals, a fully-automatic, fast, and robust pre-processing pipeline was developed, along with an approach for tracking and inter-subject registering of 3D sequences (4D data). A method for modelling and representing sequences as single entities is also introduced. These sequences can be manipulated and used for synthesising new expression sequences. Classification experiments and perceptual studies were performed to validate the methods and models developed in this work. To achieve the goals described above, a 4D database of natural, synced, dyadic conversations was captured. This database is the first of its kind in the world. Another contribution of this thesis is the development of a novel method for modelling conversational interactions. Our approach takes into account the time-sequential nature of the interactions, and encompasses the characteristics of each expression in an interaction, as well as information about the interaction itself. Classification experiments were performed to evaluate the quality of our tracking, inter-subject registration, and modelling methods. To evaluate our ability to model, manipulate, and synthesise new expression sequences, we conducted perceptual experiments. For these perceptual studies, we manipulated modelled sequences by modifying their amplitudes, and had human observers evaluate the level of expression realism and image quality. To evaluate our coupled modelling approach for conversational facial expression interactions, we performed a classification experiment that differentiated predicted frontchannel and backchannel sequences, using the original sequences in the training set. We also used the predicted backchannel sequences in a perceptual study in which human observers rated the level of similarity of the predicted and original sequences. The results of these experiments support our methods and our claim that we can produce highly-realistic 4D expression sequences that compete with state-of-the-art methods.
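
    The thesis manipulates modelled sequences by modifying their amplitudes; a minimal stand-in for that idea, assuming each sequence is flattened into a vector and modelled with ordinary PCA rather than the thesis's own representation, is sketched below.

```python
import numpy as np
from sklearn.decomposition import PCA

def scale_sequence_amplitude(sequences, index, factor, n_components=10):
    """Amplify or attenuate one expression sequence in a learned linear model.

    Minimal sketch: fit PCA over the set of flattened sequences, scale the
    chosen sequence's coefficients, and reconstruct it. Names and the
    representation are illustrative only.
    """
    pca = PCA(n_components=n_components).fit(sequences)
    coeffs = pca.transform(sequences[index:index + 1])
    return pca.inverse_transform(factor * coeffs)

# Toy data: 30 sequences of 50 frames x 60 landmark coordinates, flattened.
rng = np.random.default_rng(1)
data = rng.random((30, 50 * 60))
exaggerated = scale_sequence_amplitude(data, index=0, factor=1.5)
```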