
    Investigating 3D Visual Speech Animation Using 2D Videos

    Lip motion accuracy is of paramount importance for speech intelligibility, especially for users who are hard of hearing or foreign language learners. Furthermore, a high level of realism in lip movements is required by the game and film production industries. This thesis focuses on mapping the tracked lip motions of front-view 2D videos of a real speaker to a synthetic 3D head. A data-driven approach is used, based on a 3D morphable model (3DMM) built from 3D synthetic head poses. 3DMMs have been widely used for tasks such as face recognition and the detection of facial expressions and lip motions in 2D videos. However, factors such as the facial landmarks required for the mapping process, the amount of data used to construct the 3DMM, and differences in facial features between real and 3D faces that may influence the resulting animation have not yet been investigated. This research therefore centers on the impact of these factors on the final 3D lip motions. The thesis explores how different sets of facial features used in the mapping process influence the resulting 3D motions. Five sets of facial features are used for mapping the real faces to the corresponding 3D faces. The results show that the inclusion of eyebrows, eyes, nose, and lips improves the 3D lip motions, while face contour features (i.e. the outside boundary of the front view of the face) restrict the face’s mesh and distort the resulting animation. The thesis then investigates how the amount of data used to construct the 3DMM affects the 3D lip motions. The results show that using a wider range of synthetic head poses covering different phoneme intensities to create the 3DMM, together with a combination of front- and side-view photographs of real speakers to produce the initial neutral 3D synthetic head poses, gives better animation results when compared against ground truth data consisting of front- and side-view 2D videos of real speakers. The thesis also investigates the impact of differences and similarities in facial features between real speakers and the 3DMMs on the resulting 3D lip motions, by mapping between non-similar faces according to differences in vertical mouth height and mouth width. The objective and user test results show that mapping 2D videos of real speakers with low vertical mouth heights to 3D heads corresponding to real speakers with high vertical mouth heights, or vice versa, produces poorer 3D lip motions. This should therefore be taken into account when a 2D recording of a real actor’s lip movements is used to control a 3D synthetic character.
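
    As an illustration of the kind of landmark-driven fitting described above, the sketch below solves for 3DMM shape coefficients so that the model's landmarks match landmarks tracked in a front-view video frame. It is a minimal sketch, assuming a PCA-style 3DMM and a roughly orthographic frontal camera; the function name, the crude similarity alignment, and the ridge term are all illustrative assumptions, not the thesis's actual pipeline.

    import numpy as np

    def fit_shape_coeffs(mean_shape, basis, lmk2d, lmk_idx, lam=1e-2):
        # mean_shape: (N, 3) neutral 3D head vertices
        # basis:      (K, N, 3) PCA deformation modes of the 3DMM
        # lmk2d:      (L, 2) landmark positions tracked in the video frame
        # lmk_idx:    (L,) mesh vertex index paired with each tracked landmark
        mu = mean_shape[lmk_idx, :2]          # model landmarks, x/y only
        B = basis[:, lmk_idx, :2]             # (K, L, 2)

        # Crude similarity alignment: centre both point sets, match RMS extent.
        obs = lmk2d - lmk2d.mean(axis=0)
        mdl = mu - mu.mean(axis=0)
        obs = obs * np.sqrt((mdl ** 2).sum() / (obs ** 2).sum())

        # Ridge-regularised least squares: obs ~ mdl + sum_k c_k * B_k
        A = B.reshape(len(B), -1).T           # (2L, K)
        r = (obs - mdl).ravel()               # (2L,)
        return np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ r)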

    Development and Evaluation of Tongue Operated Robotic Rehabilitation Paradigm for Stroke Survivors with Upper Limb Paralysis

    Stroke is a devastating condition that may cause upper limb paralysis. Robotic rehabilitation with self-initiated and assisted movements is a promising technology that could help restore upper limb function. The objective of this research is to develop and evaluate a tongue-operated exoskeleton that harnesses the intention of stroke survivors with upper limb paralysis, via tongue motion, to control a robotic exoskeleton during rehabilitation, with the goals of functional restoration and improved quality of life. Specifically, a tongue-operated assistive technology called the Tongue Drive System (TDS) is used to convert tongue gestures into commands, and the generated commands are used to control rehabilitation robots such as the wrist-based exoskeleton Hand Mentor Pro™ (HM) and the upper limb exoskeleton KINARM™. Through a pilot experiment with 3 healthy participants, we demonstrated the functionality of an enhanced TDS-HM with pressure-sensing capability. The system can add a programmable load force to increase the exercise intensity in isotonic mode. Through experiments with healthy and stroke subjects, we demonstrated that the TDS-KINARM system could accurately translate tongue commands into exoskeleton arm movements, quantify upper limb function, and perform rehabilitation training. Specifically, all healthy subjects and stroke survivors successfully performed target reaching and tracking tasks in all control modes, and one of the stroke patients showed clinically significant improvement. We also analyzed the arm reaching kinematics of healthy subjects in 4 modes of TDS-KINARM operation (active, active viscous, discrete tongue, and proportional tongue). The results indicated that the proportional tongue mode was a better candidate than the discrete tongue mode for tongue-assisted rehabilitation. This study also provided initial insights into possible kinematic similarities between tongue-operated and voluntary arm movements. Furthermore, the results showed that viscous resistance to arm motion did not significantly affect the kinematics of arm reaching movements. Finally, through an experiment with 6 healthy subjects, we observed a tendency toward a facilitatory effect of adding tongue movement to limb movement on event-related desynchronization in EEG, implying enhanced brain excitability. This effect may contribute to enhanced rehabilitation outcomes in stroke survivors using TDS with motor rehabilitation.
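
    The distinction between the discrete and proportional tongue modes can be made concrete with a small sketch. The mapping below is purely illustrative, since the abstract gives neither the actual control law, gains, nor gesture vocabulary: discrete mode turns each decoded gesture into a fixed velocity step, while proportional mode scales velocity continuously with tongue displacement.

    def arm_velocity(cmd, intensity, v_step=0.05, gain=0.12, mode="proportional"):
        # cmd:       decoded tongue gesture, e.g. "left", "right", "neutral"
        # intensity: normalised tongue displacement in [0, 1]
        # Returns a signed joint velocity command (rad/s) for the exoskeleton.
        direction = {"left": -1.0, "right": 1.0, "neutral": 0.0}.get(cmd, 0.0)
        if mode == "discrete":
            return direction * v_step          # fixed step per gesture
        return direction * gain * intensity    # scales with displacement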

    : How compensation mechanisms can inform us about phonemic targets

    The present study describes the results of a 2-week perturbation experiment in which speakers' vocal tract shape was modified by the presence of an artificial palate. The aim of the work is to investigate whether speakers adapt towards acoustic or articulatory targets. Speakers were recorded regularly over the adaptation period via electromagnetic articulography and acoustics. Immediately after perturbation onset, speakers' auditory feedback was masked with white noise in order to investigate their compensatory behaviour when auditory feedback was absent. The acoustic measurements show that in vowel production speakers compensate very quickly. Compensation in fricatives takes longer and is in some cases not completed within the two weeks. Within a session and for each speaker, the sounds can be distinguished solely by acoustic parameters. The difference between the session without auditory feedback and the session with auditory feedback was greater for vowels with little palatal contact than for vowels with much palatal contact. In consonant production, auditory feedback is used primarily to adapt sibilant productions. In general, adaptation aims to maintain or enlarge the articulatory and acoustic space between the sounds. Across sessions, speakers show motor equivalent strategies (lip protrusion vs. tongue back raising) in the production of /u/. Measurements of tangential jerk suggest that after perturbation onset there is an increase in articulatory effort, followed by a decrease towards the end of the adaptation period. The compensatory abilities of speakers when no auditory feedback is available suggest that speakers have an articulatory representation at their disposal. The fact that speakers use motor equivalent strategies, however, supports acoustic representations of speech. It is therefore concluded that articulatory representations are part of the speech production task; however, since they are modified as soon as the acoustic output is no longer the desired one, they function rather in the domain of movement organisation, and the acoustic representations dominate.
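
    For readers unfamiliar with the effort measure mentioned above, the sketch below computes one common reading of tangential jerk from an electromagnetic articulography trajectory: the Euclidean norm of the third time derivative of sensor position. Definitions vary across studies, so this is an assumption about the measure rather than the paper's exact computation, and in practice the raw positions would be low-pass filtered before differentiation.

    import numpy as np

    def tangential_jerk(pos, fs):
        # pos: (T, 3) articulograph sensor positions in mm, sampled at fs Hz
        dt = 1.0 / fs
        vel = np.gradient(pos, dt, axis=0)    # mm/s
        acc = np.gradient(vel, dt, axis=0)    # mm/s^2
        jerk = np.gradient(acc, dt, axis=0)   # mm/s^3
        return np.linalg.norm(jerk, axis=1)   # jerk magnitude per sample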

    Origins of Human Language

    This book proposes a detailed picture of the continuities and ruptures between communication in primates and language in humans. It explores a diversity of perspectives on the origins of language, including a fine-grained description of vocal communication in animals, mainly monkeys and apes but also birds; the study of vocal tract anatomy and the cortical control of vocal production in monkeys and apes; the description of combinatory structures and their social and communicative value; and the exploration of the cognitive environment in which language may have emerged from nonhuman primate vocal or gestural communication.

    QUANTITATIVE ANALYSIS OF SPEECH AND LIP MOVEMENTS THROUGH OPTOELECTRONIC MOTION ANALYSIS AND SURFACE ELECTROMYOGRAPHY

    Functional impairments of facial movements alter quality of life, and their quantitative analysis is a key step in the description and grading of facial function and dysfunction. In this investigation we assessed the symmetry of lip movements in verbal and non-verbal tasks in healthy subjects. A non-invasive recording protocol, integrating an electromyographic system and an optoelectronic 3D motion analyzer, was developed and used to detect lip movements in verbal and non-verbal tasks. Two separate investigations were performed. In the first study, functional symmetries of the lip movements were assessed in a control group of clinically healthy subjects; data were evaluated separately for men and women, and a gender-related effect was tested. The aim of the second study was to assess the onset of the EMG activity of the zygomaticus and depressor labii inferioris muscles, which play a role in speech pronunciation and smiling movements. The outcomes suggest that the proposed method could be a useful tool for evaluating the asymmetry of the lips and of the facial muscles during smiling, lip purse and speech pronunciation, and for detecting functionally altered facial conditions.
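
    A simple way to turn marker trajectories into a single symmetry figure is sketched below: compare the 3D path length travelled by homologous left and right lip markers over a movement. This is one plausible formulation shown for illustration only; the study's actual asymmetry measures are not specified in the abstract.

    import numpy as np

    def path_asymmetry(left, right):
        # left, right: (T, 3) trajectories of homologous lip markers
        path_l = np.linalg.norm(np.diff(left, axis=0), axis=1).sum()
        path_r = np.linalg.norm(np.diff(right, axis=0), axis=1).sum()
        # 0 % = perfectly symmetric path lengths
        return 100.0 * abs(path_l - path_r) / ((path_l + path_r) / 2.0)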

    Phonetics of Maltese: some areas relevant to the deaf


    Eye and mouth openness estimation in sign language and news broadcast videos

    There is currently an increasing need for automatic video analysis tools to support sign language studies and the evaluation of facial activity in sign language and other videos. Consequently, research on the automatic estimation and annotation of videos and facial gestures is continuously developing. In this work, techniques for the estimation of eye and mouth openness and eyebrow position are studied. Such estimation could prove beneficial for the automatic annotation and quantitative evaluation of sign language videos, as well as for larger-scale production of sign language material. The proposed method for estimating eyebrow position, eye openness, and mouth state is based on the construction of a set of facial landmarks, using different detection techniques designed for each facial element. Furthermore, we compare the presented landmark detection algorithm with a recently published third-party face alignment algorithm. The landmarks are used to compute features that describe the geometric information of the elements of the face. These features constitute the input to classifiers that produce quantized openness estimates for the studied facial elements. Finally, the performance of the estimation methods is evaluated in quantitative and qualitative experiments with sign language and news broadcast videos.
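
    The kind of geometric feature involved can be illustrated with two standard landmark ratios. These are well-known formulations (the first is the familiar eye aspect ratio) chosen for illustration; they are assumptions, not necessarily the exact features used in this work.

    import numpy as np

    def eye_openness(eye):
        # eye: (6, 2) landmarks ordered [outer corner, upper 1, upper 2,
        # inner corner, lower 2, lower 1]. Near 0 when the eye is closed;
        # typically ~0.2-0.35 when open.
        v1 = np.linalg.norm(eye[1] - eye[5])
        v2 = np.linalg.norm(eye[2] - eye[4])
        h = np.linalg.norm(eye[0] - eye[3])
        return (v1 + v2) / (2.0 * h)

    def mouth_openness(top_inner, bottom_inner, left_corner, right_corner):
        # Inner-lip gap normalised by mouth width; a classifier (or fixed
        # thresholds) can quantise this into closed / half-open / open states.
        gap = np.linalg.norm(np.asarray(top_inner) - np.asarray(bottom_inner))
        width = np.linalg.norm(np.asarray(left_corner) - np.asarray(right_corner))
        return gap / width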

    Feature-based pronunciation modeling for automatic speech recognition

    Thesis (Ph.D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2005. Includes bibliographical references (p. 131-140). Spoken language, especially conversational speech, is characterized by great variability in word pronunciation, including many variants that differ grossly from dictionary prototypes. This is one factor in the poor performance of automatic speech recognizers on conversational speech. One approach to handling this variation consists of expanding the dictionary with phonetic substitution, insertion, and deletion rules. Common rule sets, however, typically leave many pronunciation variants unaccounted for and increase word confusability due to the coarse granularity of phone units. We present an alternative approach, in which many types of variation are explained by representing a pronunciation as multiple streams of linguistic features rather than a single stream of phones. Features may correspond to the positions of the speech articulators, such as the lips and tongue, or to acoustic or perceptual categories. By allowing for asynchrony between features and per-feature substitutions, many pronunciation changes that are difficult to account for with phone-based models become quite natural. Although it is well known that many phenomena can be attributed to this "semi-independent evolution" of features, previous models of pronunciation variation have typically not taken advantage of it. In particular, we propose a class of feature-based pronunciation models represented as dynamic Bayesian networks (DBNs). The DBN framework allows us to naturally represent the factorization of the state space of feature combinations into feature-specific factors, as well as providing standard algorithms for inference and parameter learning. We investigate the behavior of such a model in isolation using manually transcribed words. Compared to a phone-based baseline, the feature-based model has both higher coverage of observed pronunciations and a higher recognition rate for isolated words. We also discuss the ways in which such a model can be incorporated into various types of end-to-end speech recognizers and present several examples of implemented systems, for both acoustic speech recognition and lipreading tasks. by Karen Livescu.
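
    The asynchrony idea can be made concrete with a toy enumeration. In the sketch below, each articulatory feature stream keeps its own index into a word's target sequence, and a bound limits how far any stream may run ahead of another. This is a simplified illustration of how the joint state space factorizes under an asynchrony constraint, not the thesis's actual DBN implementation.

    from itertools import product

    def joint_states(n_targets, n_streams, max_async=1):
        # Each feature stream keeps its own index into the word's target
        # sequence; asynchrony is bounded so no stream runs more than
        # max_async targets ahead of (or behind) any other stream.
        return [idx for idx in product(range(n_targets), repeat=n_streams)
                if max(idx) - min(idx) <= max_async]

    # A 3-target word with 2 streams (say lips and tongue), bound 1:
    # joint_states(3, 2) -> [(0,0), (0,1), (1,0), (1,1), (1,2), (2,1), (2,2)]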

    Intersubjectivity, Empathy and Nonverbal Interaction

    Empathy is thought to involve cognitive processes that depend on the simulation of another's experiences. Embodiment has a key role for empathy as a vehicle for recreating the experience of another. This thesis explores the validity of this claim by investigating what people do when communicating about their experiences. In particular, what is the contribution of our embodied resources, such as gestures, postures and expressions, to empathy and intersubjectivity? These questions are explored against two corpora of dyadic interactions. One features conversations of people describing recalled embodied experiences to each other, such as painful or pleasant bodily experiences like a headache or laughing. The other features a series of interactions designed to emulate informal conversations. The analysis uses hand-coded gestures, feedback and clarification questions, body movement data and a new approach to quantifying posture congruence. The analysis shows that the embodied responses observed within these interactions are intentionally placed and formulated to facilitate the incremental process of a conversation as a joint activity. This is inconsistent with accounts that propose an automatic and non-conscious propensity for people to mimic each other in social interactions. Quantitative analyses show that patterns of gesture type and use, feedback form and posture differ systematically between interlocutors. Additionally, the results show that the resources provided by embodiment are allocated strategically: nonverbal contributions increase in frequency and adjust their form in response to problems in conversation, such as during clarification questions and repair. Detailed qualitative analysis shows that the instances that appear to display mimicry within the interaction function rather as embodied adaptations or paraphrases; in their contrast with the original contribution, they demonstrate a specific understanding of the type of experience being conveyed. This work shows that embodiment is an important resource for intersubjectivity and that embodied communication is specifically constructed to aid the collaborative, sequential and intersubjective progression of dialogue. Media and Arts Technology programme, EPSRC Doctoral Training Centre EP/G03723X/