82 research outputs found

    Audio-to-Visual Speech Conversion using Deep Neural Networks

    We study the problem of mapping from acoustic to visual speech with the goal of generating accurate, perceptually natural speech animation automatically from an audio speech signal. We present a sliding window deep neural network that learns a mapping from a window of acoustic features to a window of visual features from a large audio-visual speech dataset. Overlapping visual predictions are averaged to generate continuous, smoothly varying speech animation. We outperform a baseline HMM inversion approach in both objective and subjective evaluations and perform a thorough analysis of our results.
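The averaging of overlapping window predictions described in this abstract can be illustrated with a minimal sketch. It assumes one scalar visual feature per frame, equal-width windows, and a fixed hop between window starts; the function name `overlap_average` and the `hop` parameter are hypothetical and do not come from the paper, whose features are vectors produced by a neural network.

```python
def overlap_average(windows, hop=1):
    """Average overlapping window predictions into one per-frame sequence.

    windows[i] holds the predictions for frames
    i * hop .. i * hop + width - 1 (assumed layout, not the paper's code).
    """
    if not windows:
        return []
    width = len(windows[0])
    if hop < 1 or hop > width:
        # A hop larger than the window would leave frames with no prediction.
        raise ValueError("hop must be between 1 and the window width")
    n_frames = (len(windows) - 1) * hop + width
    totals = [0.0] * n_frames  # running sum of predictions per frame
    counts = [0] * n_frames    # number of windows covering each frame
    for i, window in enumerate(windows):
        start = i * hop
        for offset, value in enumerate(window):
            totals[start + offset] += value
            counts[start + offset] += 1
    return [total / count for total, count in zip(totals, counts)]


# Three windows of width 3, each shifted by one frame.
predictions = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [6.0, 4.0, 5.0]]
print(overlap_average(predictions))  # [1.0, 2.0, 4.0, 4.0, 5.0]
```

Frames near the start and end are covered by fewer windows, so dividing by a per-frame count rather than a constant keeps the edges unbiased.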

    Audio/visual mapping with cross-modal hidden Markov models


    Animation of generic 3D Head models driven by speech

    In this paper, a system for speech-driven animation of generic 3D head models is presented. The system is based on the inversion of a joint Audio-Visual Hidden Markov Model to estimate the visual information from speech data. Estimated visual speech features are used to animate a simple face model. The animation of a more complex head model is then obtained by automatically mapping the deformation of the simple model to it. The proposed algorithm allows the animation of 3D head models of arbitrary complexity through a simple setup procedure. The resulting animation is evaluated in terms of intelligibility of visual speech through subjective tests, showing promising performance.

    Face Active Appearance Modeling and Speech Acoustic Information to Recover Articulation


    Speech-driven animation using multi-modal hidden Markov models

    The main objective of this thesis was the synthesis of speech-synchronised motion, in particular head motion. The hypothesis that head motion can be estimated from the speech signal was confirmed. In order to achieve satisfactory results, a motion capture database was recorded, a definition of head motion in terms of articulation was discovered, a continuous stream mapping procedure was developed, and finally the synthesis was evaluated. Based on previous research into non-verbal behaviour, basic types of head motion were invented that could function as modelling units. The stream mapping method investigated in this thesis is based on Hidden Markov Models (HMMs), which employ modelling units to map between continuous signals. The objective evaluation of the modelling parameters confirmed that head motion types could be predicted from the speech signal with an accuracy above chance, close to 70%. Furthermore, a special type of HMM called the trajectory HMM was used because it enables synthesis of continuous output. However, head motion is a stochastic process, so the trajectory HMM was further extended to allow for non-deterministic output. Finally, the resulting head motion synthesis was perceptually evaluated. The effects of the “uncanny valley” were also considered in the evaluation, confirming that rendering quality has an influence on our judgement of the movement of virtual characters. In conclusion, a general method for synthesising speech-synchronised behaviour was invented that can be applied to a whole range of behaviours.

    Affective Computing

    This book provides an overview of state-of-the-art research in Affective Computing. It presents new ideas, original results and practical experiences in this increasingly important research field. The book consists of 23 chapters categorized into four sections. Since one of the most important means of human communication is facial expression, the first section of this book (Chapters 1 to 7) presents research on the synthesis and recognition of facial expressions. Given that we use not only the face but also body movements to express ourselves, the second section (Chapters 8 to 11) presents research on the perception and generation of emotional expressions using full-body motions. The third section of the book (Chapters 12 to 16) presents computational models of emotion, as well as findings from neuroscience research. The last section of the book (Chapters 17 to 22) presents applications related to affective computing.

    Realistic and expressive talking head : implementation and evaluation

    [no abstract]

    Adaptive threshold optimisation for colour-based lip segmentation in automatic lip-reading systems

    A thesis submitted to the Faculty of Engineering and the Built Environment, University of the Witwatersrand, Johannesburg, in fulfilment of the requirements for the degree of Doctor of Philosophy. Johannesburg, September 2016. Having survived the ordeal of a laryngectomy, the patient must come to terms with the resulting loss of speech. With recent advances in portable computing power, automatic lip-reading (ALR) may become a viable approach to voice restoration. This thesis addresses the image processing aspect of ALR, and focuses on three contributions to colour-based lip segmentation. The first contribution concerns the colour transform used to enhance the contrast between the lips and skin. This thesis presents the most comprehensive study to date by measuring the overlap between lip and skin histograms for 33 different colour transforms. The hue component of HSV obtains the lowest overlap of 6.15%, and results show that selecting the correct transform can increase the segmentation accuracy by up to three times. The second contribution is the development of a new lip segmentation algorithm that utilises the best colour transforms from the comparative study. The algorithm is tested on 895 images and achieves a percentage overlap (OL) of 92.23% and a segmentation error (SE) of 7.39%. The third contribution focuses on the impact of the histogram threshold on the segmentation accuracy, and introduces a novel technique called Adaptive Threshold Optimisation (ATO) to select a better threshold value. The first stage of ATO incorporates -SVR to train the lip shape model. ATO then uses feedback of shape information to validate and optimise the threshold. After applying ATO, the SE decreases from 7.65% to 6.50%, corresponding to an absolute improvement of 1.15 pp or a relative improvement of 15.1%. While this thesis concerns lip segmentation in particular, ATO is a threshold selection technique that can be used in various segmentation applications.
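As a rough illustration of comparing colour transforms by the overlap between lip and skin histograms, the sketch below scores two histograms defined over the same bins. The overlap definition used here (sum of bin-wise minima after normalising each histogram to unit mass) is an assumption chosen for illustration; the thesis abstract does not state its exact formula, and the function name `histogram_overlap` is hypothetical.

```python
def histogram_overlap(hist_a, hist_b):
    """Percentage overlap of two histograms over identical bins.

    Assumed definition: normalise each histogram to sum to 1, then add up
    the bin-wise minima. 0 means fully separable, 100 means identical.
    """
    if len(hist_a) != len(hist_b):
        raise ValueError("histograms must use the same bins")
    total_a = sum(hist_a)
    total_b = sum(hist_b)
    if total_a <= 0 or total_b <= 0:
        raise ValueError("histograms must contain at least one sample")
    shared = sum(min(a / total_a, b / total_b) for a, b in zip(hist_a, hist_b))
    return 100.0 * shared


# Toy pixel counts per bin for one colour channel (made-up numbers).
lip_counts = [0, 2, 6, 2]
skin_counts = [5, 4, 1, 0]
print(histogram_overlap(lip_counts, skin_counts))
```

Under this assumed measure, a transform with lower overlap separates lip pixels from skin pixels more cleanly, so a single threshold on that channel misclassifies fewer pixels.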