
    Perceptual Evaluation of Video-Realistic Speech

    With many visual speech animation techniques now available, there is a clear need for systematic perceptual evaluation schemes. We describe here our scheme and its application to a new video-realistic (potentially indistinguishable from real recorded video) visual-speech animation system called Mary 101. Two types of experiments were performed: (a) distinguishing visually between real and synthetic image sequences of the same utterances ("Turing tests"), and (b) gauging visual speech recognition by comparing lip-reading performance on the real and synthetic image sequences of the same utterances ("intelligibility tests"). Subjects who were presented randomly with either real or synthetic image sequences could not tell the synthetic from the real sequences above chance level. The same subjects, when asked to lip-read the utterances from the same image sequences, recognized speech from real image sequences significantly better than from synthetic ones. However, performance for both real and synthetic sequences was at levels suggested in the literature on lip-reading. We conclude from the two experiments that the animation of Mary 101 is adequate for providing the percept of a talking head. However, additional effort is required to improve the animation for lip-reading purposes such as rehabilitation and language learning. In addition, these two tasks can be considered explicit and implicit perceptual discrimination tasks. In the explicit task (a), each stimulus is classified directly as a synthetic or real image sequence by detecting a possible difference between the synthetic and the real image sequences. The implicit perceptual discrimination task (b) consists of a comparison between visual recognition of speech in real and synthetic image sequences. Our results suggest that implicit perceptual discrimination is a more sensitive method for discriminating between synthetic and real image sequences than explicit perceptual discrimination.
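
    The explicit task (a) reduces to a forced-choice classification whose accuracy can be tested against chance. As a hedged illustration (not from the paper; the trial and success counts below are hypothetical), a one-sided binomial test checks whether subjects' real-vs-synthetic judgments exceed the 50% chance level:

        from scipy.stats import binomtest

        # Hypothetical counts: 200 real/synthetic judgments, 112 correct.
        n_trials = 200
        n_correct = 112

        # One-sided test: is accuracy above the 50% chance level?
        result = binomtest(n_correct, n_trials, p=0.5, alternative="greater")
        print(f"accuracy = {n_correct / n_trials:.2f}, p = {result.pvalue:.3f}")
        # A non-significant p-value is consistent with the paper's finding that
        # subjects could not distinguish synthetic from real above chance.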

    Advances in Emotion Recognition: Link to Depressive Disorder

    Emotion recognition enables real-time analysis, tagging, and inference of cognitive affective states from human facial expressions, speech and tone, body posture, and physiological signals, as well as text on social network platforms. Emotion patterns, based on explicit and implicit features extracted through wearable and other devices, can be decoded through computational modeling. Meanwhile, emotion recognition and computation are critical to the detection and diagnosis of potential mood disorder patients. The chapter aims to summarize the main findings in the area of affective recognition and its applications to major depressive disorder (MDD), which have made rapid progress in the last decade.

    Beat gestures influence which speech sounds you hear

    Beat gestures, spontaneously produced biphasic movements of the hand, are among the most frequently encountered co-speech gestures in human communication. They are closely aligned in time with the prosodic characteristics of the speech signal, typically occurring on lexically stressed syllables. Despite their prevalence across speakers of the world's languages, how beat gestures impact spoken word recognition is unclear. Can these simple 'flicks of the hand' influence speech perception? Across six experiments, we demonstrate that beat gestures influence the explicit and implicit perception of lexical stress (e.g., distinguishing OBject from obJECT) and, in turn, can influence which vowels listeners hear. Thus, we provide converging evidence for a manual McGurk effect: even the simplest 'flicks of the hands' influence which speech sounds we hear.

    Sector-Based Detection for Hands-Free Speech Enhancement in Cars

    Speech-based command interfaces are becoming more and more common in cars. Applications include automatic dialog systems for hands-free phone calls as well as more advanced features such as navigation systems. However, interference, such as speech from the codriver, can severely hamper the performance of the speech recognition component, which is crucial for those applications. This issue can be addressed with adaptive interference cancellation techniques such as the Generalized Sidelobe Canceller (GSC). In order to cancel the interference (codriver) while not cancelling the target (driver), adaptation must happen only when the interference is active and dominant. To that purpose, this paper proposes two efficient adaptation control methods, called "implicit" and "explicit". While the "implicit" method is fully automatic, the "explicit" method relies on pre-estimation of the target and interference energies. A major contribution of this paper is a direct, robust method for such pre-estimation, derived from sector-based detection and localization techniques. Experiments on real in-car data validate both adaptation methods, including a case with 100 km/h background road noise.
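
    As a hedged sketch of the general idea (not the paper's implementation; the sector-based pre-estimation is paper-specific, and the sketch simply assumes per-sample target and interference energy estimates are already available), an NLMS-based canceller can be gated so that its filter adapts only when interference dominates:

        import numpy as np

        def explicit_gate(e_target, e_interf, ratio=2.0):
            # "Explicit" control (assumed form): allow adaptation only where
            # the pre-estimated interference energy clearly dominates the
            # pre-estimated target energy.
            return e_interf > ratio * e_target

        def gated_nlms_canceller(reference, primary, gate,
                                 n_taps=32, mu=0.5, eps=1e-8):
            # reference: interference-dominated signal (e.g. a blocking-matrix
            #            output with a null steered toward the driver)
            # primary:   target-plus-interference signal
            # gate:      boolean array, True where adaptation is allowed
            w = np.zeros(n_taps)
            out = np.zeros(len(primary))
            for n in range(n_taps, len(primary)):
                x = reference[n - n_taps:n][::-1]    # most recent sample first
                y = w @ x                            # interference estimate
                e = primary[n] - y                   # enhanced (target) output
                out[n] = e
                if gate[n]:                          # freeze the filter when the
                    w += mu * e * x / (x @ x + eps)  # target may be active, so the
            return out                               # driver is not cancelled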

    Problem spotting in human-machine interaction

    In human-human communication, dialogue participants are continuously sending and receiving signals on the status of the information being exchanged. We claim that if spoken dialogue systems were able to detect such cues and change their strategy accordingly, the interaction between user and system would improve. Therefore, the goals of the present study are as follows: (i) to find out which positive and negative cues people actually use in human-machine interaction in response to explicit and implicit verification questions, and (ii) to see which (combinations of) cues have the best predictive potential for spotting the presence or absence of problems. It was found that subjects systematically use negative/marked cues (more words, marked word order, more repetitions and corrections, less new information, etc.) when there are communication problems. Using precision and recall matrices, it was found that various combinations of cues are accurate problem spotters. This kind of information may turn out to be highly relevant for spoken dialogue systems, e.g., by providing quantitative criteria for changing the dialogue strategy or speech recognition engine.
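
    As a hedged illustration of the evaluation step (a generic sketch, not the study's code; the cue names and per-turn data below are hypothetical), precision and recall can be computed for each combination of binary cue detectors, predicting a problem whenever any cue in the combination fires:

        from itertools import combinations

        def precision_recall(predicted, actual):
            tp = sum(p and a for p, a in zip(predicted, actual))
            fp = sum(p and not a for p, a in zip(predicted, actual))
            fn = sum(a and not p for p, a in zip(predicted, actual))
            precision = tp / (tp + fp) if tp + fp else 0.0
            recall = tp / (tp + fn) if tp + fn else 0.0
            return precision, recall

        # Hypothetical per-turn cue detections (True = cue present) and labels.
        cues = {
            "many_words":   [True, False, True, True, False],
            "marked_order": [False, False, True, False, False],
            "repetition":   [True, False, False, True, False],
        }
        problem = [True, False, True, True, False]  # gold: problem present?

        # Score every cue combination; predict a problem if any cue fires.
        for r in range(1, len(cues) + 1):
            for combo in combinations(cues, r):
                predicted = [any(cues[c][i] for c in combo)
                             for i in range(len(problem))]
                p, rec = precision_recall(predicted, problem)
                print(f"{'+'.join(combo):40s} P={p:.2f} R={rec:.2f}")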

    The management of context-sensitive features: A review of strategies

    In this paper, we review five heuristic strategies for handling context-sensitive features in supervised machine learning from examples. We discuss two methods for recovering lost (implicit) contextual information. We mention some evidence that hybrid strategies can have a synergetic effect. We then show how the work of several machine learning researchers fits into this framework. While we do not claim that these strategies exhaust the possibilities, it appears that the framework includes all of the techniques that can be found in the published literature on context-sensitive learning.
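
    One frequently cited strategy of this kind is contextual normalization: scaling each feature by statistics estimated within its context so that context-dependent shifts do not mislead the learner. The sketch below is a minimal illustration of that idea only; it is not taken from the review itself, and the feature and context arrays are hypothetical:

        import numpy as np

        def contextual_normalization(X, contexts):
            # X:        (n_samples, n_features) primary feature matrix
            # contexts: (n_samples,) label of the context of each sample
            # Normalize each feature to zero mean / unit variance *within*
            # its context.
            Xn = np.empty_like(X, dtype=float)
            for c in np.unique(contexts):
                idx = contexts == c
                mu = X[idx].mean(axis=0)
                sigma = X[idx].std(axis=0)
                Xn[idx] = (X[idx] - mu) / np.where(sigma > 0, sigma, 1.0)
            return Xn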