658 research outputs found

    Sound-Action Symbolism

    Get PDF
    Recent evidence has shown linkages between actions and segmental elements of speech. For instance, close-front vowels are sound symbolically associated with the precision grip, and front vowels are associated with forward-directed limb movements. The current review article presents a variety of such sound-action effects and proposes that they compose a category of sound symbolism that is based on grounding a conceptual knowledge of a referent in articulatory and manual action representations. In addition, the article proposes that even some widely known sound symbolism phenomena such as the sound-magnitude symbolism can be partially based on similar sensorimotor grounding. It is also discussed that meaning of suprasegmental speech elements in many instances is similarly grounded in body actions. Sound symbolism, prosody, and body gestures might originate from the same embodied mechanisms that enable a vivid and iconic expression of a meaning of a referent to the recipient.Peer reviewe

    An Overview of Affective Speech Synthesis and Conversion in the Deep Learning Era

    Get PDF
    Speech is the fundamental mode of human communication, and its synthesis has long been a core priority in human-computer interaction research. In recent years, machines have managed to master the art of generating speech that is understandable by humans. But the linguistic content of an utterance encompasses only a part of its meaning. Affect, or expressivity, has the capacity to turn speech into a medium capable of conveying intimate thoughts, feelings, and emotions -- aspects that are essential for engaging and naturalistic interpersonal communication. While the goal of imparting expressivity to synthesised utterances has so far remained elusive, following recent advances in text-to-speech synthesis, a paradigm shift is well under way in the fields of affective speech synthesis and conversion as well. Deep learning, as the technology which underlies most of the recent advances in artificial intelligence, is spearheading these efforts. In the present overview, we outline ongoing trends and summarise state-of-the-art approaches in an attempt to provide a comprehensive overview of this exciting field.Comment: Submitted to the Proceedings of IEE

    Cognitive behavioural systems

    Get PDF
    This book constitutes refereed proceedings of the COST 2102 International Training School on Cognitive Behavioural Systems held in Dresden, Germany, in February 2011. The 39 revised full papers presented were carefully reviewed and selected from various submissions. The volume presents new and original research results in the field of human-machine interaction inspired by cognitive behavioural human-human interaction features. The themes covered are on cognitive and computational social information processing, emotional and social believable Human-Computer Interaction (HCI) systems, behavioural and contextual analysis of interaction, embodiment, perception, linguistics, semantics and sentiment analysis in dialogues and interactions, algorithmic and computational issues for the automatic recognition and synthesis of emotional states

    I Probe, Therefore I Am: Designing a Virtual Journalist with Human Emotions

    Get PDF
    By utilizing different communication channels, such as verbal language, gestures or facial expressions, virtually embodied interactive humans hold a unique potential to bridge the gap between human-computer interaction and actual interhuman communication. The use of virtual humans is consequently becoming increasingly popular in a wide range of areas where such a natural communication might be beneficial, including entertainment, education, mental health research and beyond. Behind this development lies a series of technological advances in a multitude of disciplines, most notably natural language processing, computer vision, and speech synthesis. In this paper we discuss a Virtual Human Journalist, a project employing a number of novel solutions from these disciplines with the goal to demonstrate their viability by producing a humanoid conversational agent capable of naturally eliciting and reacting to information from a human user. A set of qualitative and quantitative evaluation sessions demonstrated the technical feasibility of the system whilst uncovering a number of deficits in its capacity to engage users in a way that would be perceived as natural and emotionally engaging. We argue that naturalness should not always be seen as a desirable goal and suggest that deliberately suppressing the naturalness of virtual human interactions, such as by altering its personality cues, might in some cases yield more desirable results.Comment: eNTERFACE16 proceeding

    Rhythm from the linguistic and neurobiological perspectives

    Get PDF
    This article is about rhythm from the linguistic and neurobiological perspectives. Comparing data from both domains could enable clarification of (long–term) confusion in acoustic and conceptual ideas of rhythm as well as providing a multidimensional definition of this category with solid neurobiological grounding and taking aspects of modeling speech production into consideration.This article is about rhythm from the linguistic and neurobiological perspectives. Comparing data from both domains could enable clarification of (long–term) confusion in acoustic and conceptual ideas of rhythm as well as providing a multidimensional definition of this category with solid neurobiological grounding and taking aspects of modeling speech production into consideration

    Identification of persons via voice imprint

    Get PDF
    Tato práce se zabývá textově závislým rozpoznáváním řečníků v systémech, kde existuje pouze omezené množství trénovacích vzorků. Pro účel rozpoznávání je navržen otisk hlasu založený na různých příznacích (např. MFCC, PLP, ACW atd.). Na začátku práce je zmíněn způsob vytváření řečového signálu. Některé charakteristiky řeči, důležité pro rozpoznávání řečníků, jsou rovněž zmíněny. Další část práce se zabývá analýzou řečového signálu. Je zde zmíněno předzpracování a také metody extrakce příznaků. Následující část popisuje proces rozpoznávání řečníků a zmiňuje způsoby ohodnocení používaných metod: identifikace a verifikace řečníků. Poslední teoreticky založená část práce se zabývá klasifikátory vhodnými pro textově závislé rozpoznávání. Jsou zmíněny klasifikátory založené na zlomkových vzdálenostech, dynamickém borcení časové osy, vyrovnávání rozptylu a vektorové kvantizaci. Tato práce pokračuje návrhem a realizací systému, který hodnotí všechny zmíněné klasifikátory pro otisk hlasu založený na různých příznacích.This work deals with the text-dependent speaker recognition in systems, where just a few training samples exist. For the purpose of this recognition, the voice imprint based on different features (e.g. MFCC, PLP, ACW etc.) is proposed. At the beginning, there is described the way, how the speech signal is produced. Some speech characteristics important for speaker recognition are also mentioned. The next part of work deals with the speech signal analysis. There is mentioned the preprocessing and also the feature extraction methods. The following part describes the process of speaker recognition and mentions the evaluation of the used methods: speaker identification and verification. Last theoretically based part of work deals with the classifiers which are suitable for the text-dependent recognition. The classifiers based on fractional distances, dynamic time warping, dispersion matching and vector quantization are mentioned. This work continues by design and realization of system, which evaluates all described classifiers for voice imprint based on different features.

    A survey on perceived speaker traits: personality, likability, pathology, and the first challenge

    Get PDF
    The INTERSPEECH 2012 Speaker Trait Challenge aimed at a unified test-bed for perceived speaker traits – the first challenge of this kind: personality in the five OCEAN personality dimensions, likability of speakers, and intelligibility of pathologic speakers. In the present article, we give a brief overview of the state-of-the-art in these three fields of research and describe the three sub-challenges in terms of the challenge conditions, the baseline results provided by the organisers, and a new openSMILE feature set, which has been used for computing the baselines and which has been provided to the participants. Furthermore, we summarise the approaches and the results presented by the participants to show the various techniques that are currently applied to solve these classification tasks

    Models and analysis of vocal emissions for biomedical applications

    Get PDF
    This book of Proceedings collects the papers presented at the 3rd International Workshop on Models and Analysis of Vocal Emissions for Biomedical Applications, MAVEBA 2003, held 10-12 December 2003, Firenze, Italy. The workshop is organised every two years, and aims to stimulate contacts between specialists active in research and industrial developments, in the area of voice analysis for biomedical applications. The scope of the Workshop includes all aspects of voice modelling and analysis, ranging from fundamental research to all kinds of biomedical applications and related established and advanced technologies
    corecore