15 research outputs found

    Modelling personality features by changing prosody in synthetic speech

    This study explores how features of brand personalities can be modelled with the prosodic parameters pitch level, pitch range, articulation rate and loudness. Experiments with parametrical diphone synthesis showed that listeners rated the prosodically changed versions better than a baseline version for the dimension
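    For context, the four parameters named in this abstract correspond one-to-one to attributes of the standard SSML <prosody> element (pitch, range, rate, volume). The sketch below only illustrates varying them against a baseline; it is not the study's parametric diphone setup, and the specific values are invented.

```python
# Minimal sketch: expressing the study's four prosodic parameters
# (pitch level, pitch range, articulation rate, loudness) via the
# standard SSML <prosody> attributes. Shown as a modern stand-in for
# the study's diphone synthesis; all values are invented placeholders.

def prosody_ssml(text, pitch="+0%", pitch_range="+0%",
                 rate="100%", volume="+0dB"):
    """Wrap text in an SSML <prosody> tag carrying the four parameters."""
    return (
        f'<prosody pitch="{pitch}" range="{pitch_range}" '
        f'rate="{rate}" volume="{volume}">{text}</prosody>'
    )

# Neutral baseline vs. a hypothetical "lively brand" variant.
baseline = prosody_ssml("Welcome to our store.")
variant = prosody_ssml("Welcome to our store.",
                       pitch="+15%", pitch_range="+20%",
                       rate="110%", volume="+3dB")
print(baseline)
print(variant)
```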

    Emotion Recognition via Continuous Mandarin Speech


    Multimodal Accessibility of Documents


    Fully generated scripted dialogue for embodied agents

    This paper presents the NECA approach to the generation of dialogues between Embodied Conversational Agents (ECAs). The approach consists of the automated construction of an abstract script for an entire dialogue (cast in terms of dialogue acts), which is incrementally enhanced by a series of modules and finally "performed" by means of text, speech and body language by a cast of ECAs. The approach makes it possible to automatically produce a large variety of highly expressive dialogues, some of whose essential properties are under the control of a user. The paper discusses the advantages and disadvantages of NECA's approach to Fully Generated Scripted Dialogue (FGSD) and explains the main techniques used in the two demonstrators that were built. The paper can be read as a survey of issues and techniques in the construction of ECAs, focusing on the generation of behaviour (i.e., on information presentation) rather than on interpretation.
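    As a rough illustration of the pipeline shape the abstract describes (an abstract script of dialogue acts enriched incrementally by modules, then "performed"), here is a hypothetical Python sketch; all module names, dialogue-act fields, and outputs are invented, not NECA's actual interfaces:

```python
# Hypothetical sketch of an FGSD-style pipeline: a script of dialogue
# acts is enhanced module by module (text, speech markup, gesture),
# then handed to the ECAs for performance. Names are illustrative only.

from dataclasses import dataclass, field

@dataclass
class DialogueAct:
    speaker: str
    act: str                      # e.g. "greet", "inform", "request"
    content: str
    annotations: dict = field(default_factory=dict)

def text_generation(script):
    for da in script:
        da.annotations["text"] = f"{da.speaker}: {da.content}"
    return script

def speech_markup(script):
    for da in script:
        da.annotations["speech"] = f'<speak>{da.annotations["text"]}</speak>'
    return script

def gesture_assignment(script):
    for da in script:
        da.annotations["gesture"] = "nod" if da.act == "inform" else "wave"
    return script

# The abstract script is enriched incrementally by each module in turn.
script = [DialogueAct("Agent1", "greet", "Hello!"),
          DialogueAct("Agent2", "inform", "This car has airbags.")]
for module in (text_generation, speech_markup, gesture_assignment):
    script = module(script)
```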

    An Overview of Affective Speech Synthesis and Conversion in the Deep Learning Era

    Speech is the fundamental mode of human communication, and its synthesis has long been a core priority in human-computer interaction research. In recent years, machines have mastered the art of generating speech that is understandable by humans. But the linguistic content of an utterance encompasses only a part of its meaning. Affect, or expressivity, has the capacity to turn speech into a medium capable of conveying intimate thoughts, feelings, and emotions: aspects that are essential for engaging and naturalistic interpersonal communication. While the goal of imparting expressivity to synthesised utterances has so far remained elusive, following recent advances in text-to-speech synthesis, a paradigm shift is well under way in the fields of affective speech synthesis and conversion as well. Deep learning, the technology that underlies most of the recent advances in artificial intelligence, is spearheading these efforts. In the present overview, we outline ongoing trends and summarise state-of-the-art approaches in an attempt to provide a comprehensive overview of this exciting field.
    Comment: Submitted to the Proceedings of the IEEE
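    As one concrete example of the kind of deep-learning approach such a survey covers, a common recipe is to condition an acoustic model on a learned global emotion embedding. The sketch below illustrates that general idea only; it is not taken from the paper, and all shapes and names are illustrative.

```python
# Minimal sketch (not from the survey) of a common affective-TTS recipe:
# condition the text encoder's output on a learned emotion embedding.
# Vocabulary size, emotion inventory, and dimensions are placeholders.

import torch
import torch.nn as nn

class EmotionConditionedEncoder(nn.Module):
    def __init__(self, vocab_size=100, n_emotions=5, dim=128):
        super().__init__()
        self.text_emb = nn.Embedding(vocab_size, dim)
        self.emotion_emb = nn.Embedding(n_emotions, dim)  # e.g. neutral, happy, ...
        self.encoder = nn.GRU(dim, dim, batch_first=True)

    def forward(self, phoneme_ids, emotion_id):
        x = self.text_emb(phoneme_ids)                    # (B, T, dim)
        h, _ = self.encoder(x)
        # Broadcast the global emotion embedding over every time step.
        e = self.emotion_emb(emotion_id).unsqueeze(1)     # (B, 1, dim)
        return h + e                                      # decoder input

enc = EmotionConditionedEncoder()
out = enc(torch.randint(0, 100, (2, 10)), torch.tensor([1, 3]))
```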

    Modeling Reader's Emotional State Response on Document's Typographic Elements

    We present the results of an experimental study toward modeling the variations in a reader's emotional state induced by typographic elements in electronic documents. Based on the dimensional theory of emotions, we investigate how typographic elements, such as font style (bold, italics, bold-italics) and font (type, size, color, and background color), affect the reader's emotional states, namely Pleasure, Arousal, and Dominance (PAD). An experimental procedure was implemented conforming to the International Affective Picture System guidelines and incorporating the Self-Assessment Manikin test. Thirty students participated in the experiment. The stimulus was a short paragraph of text from which any content-, emotion-, and/or domain-dependent information was excluded. Analysis of Variance revealed the dependency of (a) all three emotional dimensions on font size and font/background color combinations and (b) the Pleasure dimension on font type and font style. We introduce a set of mapping rules showing how PAD varies with the discrete values of the font style and font type elements, and a set of equations describing the PAD dimensions' dependency on font size. This model can contribute to automated extraction of a reader's emotional state, for example to enhance the acoustic rendition of documents using text-to-speech synthesis.
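    Since the abstract reports two model forms, discrete mapping rules and font-size equations, without giving either, the following sketch shows only their general shape; every rule and coefficient below is an invented placeholder, not the study's fitted values.

```python
# Hypothetical illustration of the two model forms the abstract mentions:
# (a) mapping rules from discrete typographic choices to PAD shifts, and
# (b) equations giving PAD as a function of font size. All numbers are
# invented placeholders; the study's actual rules are not in the abstract.

# (a) Discrete rules: font style -> (Pleasure, Arousal, Dominance) shift.
STYLE_RULES = {
    "regular":     (0.0, 0.0, 0.0),
    "bold":        (0.1, 0.3, 0.2),
    "italic":      (0.2, -0.1, 0.0),
    "bold-italic": (0.1, 0.2, 0.1),
}

def pad_from_size(size_pt, base=(5.0, 5.0, 5.0), slopes=(0.05, 0.08, 0.06)):
    """(b) Linear placeholder for PAD as a function of font size (points)."""
    return tuple(b + s * (size_pt - 12) for b, s in zip(base, slopes))

def predict_pad(style, size_pt):
    p, a, d = pad_from_size(size_pt)
    dp, da, dd = STYLE_RULES[style]
    return (p + dp, a + da, d + dd)

print(predict_pad("bold", 16))   # PAD estimate for 16 pt bold text
```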