658 research outputs found
Sound-Action Symbolism
Recent evidence has shown linkages between actions and segmental elements of speech. For instance, close-front vowels are sound-symbolically associated with the precision grip, and front vowels are associated with forward-directed limb movements. The current review article presents a variety of such sound-action effects and proposes that they compose a category of sound symbolism that is based on grounding conceptual knowledge of a referent in articulatory and manual action representations. In addition, the article proposes that even some widely known sound symbolism phenomena, such as sound-magnitude symbolism, can be partially based on similar sensorimotor grounding. The article also discusses how the meaning of suprasegmental speech elements is, in many instances, similarly grounded in body actions. Sound symbolism, prosody, and body gestures might originate from the same embodied mechanisms that enable a vivid and iconic expression of the meaning of a referent to the recipient. Peer reviewed
An Overview of Affective Speech Synthesis and Conversion in the Deep Learning Era
Speech is the fundamental mode of human communication, and its synthesis has
long been a core priority in human-computer interaction research. In recent
years, machines have managed to master the art of generating speech that is
understandable by humans. But the linguistic content of an utterance
encompasses only a part of its meaning. Affect, or expressivity, has the
capacity to turn speech into a medium capable of conveying intimate thoughts,
feelings, and emotions -- aspects that are essential for engaging and
naturalistic interpersonal communication. While the goal of imparting
expressivity to synthesised utterances has so far remained elusive, following
recent advances in text-to-speech synthesis, a paradigm shift is well under way
in the fields of affective speech synthesis and conversion as well. Deep
learning, as the technology which underlies most of the recent advances in
artificial intelligence, is spearheading these efforts. In the present
overview, we outline ongoing trends and summarise state-of-the-art approaches
in an attempt to provide a comprehensive overview of this exciting field. Comment: Submitted to the Proceedings of IEE
Cognitive behavioural systems
This book constitutes refereed proceedings of the COST 2102 International Training School on Cognitive Behavioural Systems held in Dresden, Germany, in February 2011. The 39 revised full papers presented were carefully reviewed and selected from various submissions. The volume presents new and original research results in the field of human-machine interaction inspired by cognitive behavioural human-human interaction features. The themes covered are on cognitive and computational social information processing, emotional and social believable Human-Computer Interaction (HCI) systems, behavioural and contextual analysis of interaction, embodiment, perception, linguistics, semantics and sentiment analysis in dialogues and interactions, algorithmic and computational issues for the automatic recognition and synthesis of emotional states
I Probe, Therefore I Am: Designing a Virtual Journalist with Human Emotions
By utilizing different communication channels, such as verbal language,
gestures or facial expressions, virtually embodied interactive humans hold a
unique potential to bridge the gap between human-computer interaction and
actual interhuman communication. The use of virtual humans is consequently
becoming increasingly popular in a wide range of areas where such a natural
communication might be beneficial, including entertainment, education, mental
health research and beyond. Behind this development lies a series of
technological advances in a multitude of disciplines, most notably natural
language processing, computer vision, and speech synthesis. In this paper we
discuss a Virtual Human Journalist, a project employing a number of novel
solutions from these disciplines with the goal to demonstrate their viability
by producing a humanoid conversational agent capable of naturally eliciting and
reacting to information from a human user. A set of qualitative and
quantitative evaluation sessions demonstrated the technical feasibility of the
system whilst uncovering a number of deficits in its capacity to engage users
in a way that would be perceived as natural and emotionally engaging. We argue
that naturalness should not always be seen as a desirable goal and suggest that
deliberately suppressing the naturalness of virtual human interactions, such as
by altering its personality cues, might in some cases yield more desirable
results. Comment: eNTERFACE16 proceeding
Rhythm from the linguistic and neurobiological perspectives
This article is about rhythm from the linguistic and neurobiological perspectives. Comparing data from both domains could enable clarification of (long–term) confusion in acoustic and conceptual ideas of rhythm as well as providing a multidimensional definition of this category with solid neurobiological grounding and taking aspects of modeling speech production into consideration.
Identification of persons via voice imprint
This work deals with text-dependent speaker recognition in systems where only a few training samples exist. For this purpose, a voice imprint based on different features (e.g. MFCC, PLP, ACW) is proposed. The work first describes how the speech signal is produced and mentions some speech characteristics that are important for speaker recognition. The next part deals with speech signal analysis, covering preprocessing and feature extraction methods. The following part describes the speaker recognition process and the ways of evaluating the methods used: speaker identification and verification. The last theoretical part deals with classifiers suitable for text-dependent recognition: those based on fractional distances, dynamic time warping, dispersion matching, and vector quantization. The work then continues with the design and implementation of a system that evaluates all the described classifiers for voice imprints based on different features.
A survey on perceived speaker traits: personality, likability, pathology, and the first challenge
The INTERSPEECH 2012 Speaker Trait Challenge aimed at a unified test-bed for perceived speaker traits – the first challenge of this kind: personality in the five OCEAN personality dimensions, likability of speakers, and intelligibility of pathologic speakers. In the present article, we give a brief overview of the state-of-the-art in these three fields of research and describe the three sub-challenges in terms of the challenge conditions, the baseline results provided by the organisers, and a new openSMILE feature set, which has been used for computing the baselines and which has been provided to the participants. Furthermore, we summarise the approaches and the results presented by the participants to show the various techniques that are currently applied to solve these classification tasks
Models and analysis of vocal emissions for biomedical applications
This book of Proceedings collects the papers presented at the 3rd International Workshop on Models and Analysis of Vocal Emissions for Biomedical Applications, MAVEBA 2003, held 10-12 December 2003, Firenze, Italy. The workshop is organised every two years, and aims to stimulate contacts between specialists active in research and industrial developments, in the area of voice analysis for biomedical applications. The scope of the Workshop includes all aspects of voice modelling and analysis, ranging from fundamental research to all kinds of biomedical applications and related established and advanced technologies