15 research outputs found
Modelling personality features by changing prosody in synthetic speech
This study explores how features of brand personalities can be modelled with the prosodic parameters pitch level, pitch range, articulation rate and loudness. Experiments with parametrical diphone synthesis showed that listeners rated the prosodically changed versions better than a baseline version for the dimension
Fully generated scripted dialogue for embodied agents
This paper presents the NECA approach to the generation of dialogues between Embodied Conversational Agents (ECAs). This approach consists of the automated construction of an abstract script for an entire dialogue (cast in terms of dialogue acts), which is incrementally enhanced by a series of modules and finally "performed" by means of text, speech and body language by a cast of ECAs. The approach makes it possible to automatically produce a large variety of highly expressive dialogues, some of whose essential properties are under the control of a user. The paper discusses the advantages and disadvantages of NECA's approach to Fully Generated Scripted Dialogue (FGSD), and explains the main techniques used in the two demonstrators that were built. The paper can be read as a survey of issues and techniques in the construction of ECAs, focusing on the generation of behaviour (i.e., focusing on information presentation) rather than on interpretation
An Overview of Affective Speech Synthesis and Conversion in the Deep Learning Era
Speech is the fundamental mode of human communication, and its synthesis has long been a core priority in human-computer interaction research. In recent years, machines have managed to master the art of generating speech that is understandable by humans. But the linguistic content of an utterance encompasses only a part of its meaning. Affect, or expressivity, has the capacity to turn speech into a medium capable of conveying intimate thoughts, feelings, and emotions -- aspects that are essential for engaging and naturalistic interpersonal communication. While the goal of imparting expressivity to synthesised utterances has so far remained elusive, following recent advances in text-to-speech synthesis, a paradigm shift is well under way in the fields of affective speech synthesis and conversion as well. Deep learning, as the technology which underlies most of the recent advances in artificial intelligence, is spearheading these efforts. In the present overview, we outline ongoing trends and summarise state-of-the-art approaches in an attempt to provide a comprehensive overview of this exciting field
Modeling Reader's Emotional State Response on Document's Typographic Elements
We present the results of an experimental study towards modeling the reader's emotional state variations induced by the typographic elements in electronic documents. Based on the dimensional theory of emotions, we investigate how typographic elements, such as font style (bold, italics, bold-italics) and font (type, size, color and background color), affect the reader's emotional states, namely Pleasure, Arousal, and Dominance (PAD). An experimental procedure was implemented conforming to International Affective Picture System guidelines and incorporating the Self-Assessment Manikin test. Thirty students participated in the experiment. The stimulus was a short paragraph of text from which any content-, emotion-, and/or domain-dependent information was excluded. The Analysis of Variance revealed the dependency of (a) all three emotional dimensions on font size and font/background color combinations and (b) the Pleasure dimension on font type and font style. We introduce a set of mapping rules showing how the PAD dimensions vary with the discrete values of the font style and font type elements. Moreover, we introduce a set of equations describing the PAD dimensions' dependency on font size. This novel model can contribute to the automated extraction of the reader's emotional state in order, for example, to enhance the acoustic rendition of documents using text-to-speech synthesis