
Emotion Recognition from Speech with Acoustic, Non-Linear and Wavelet-based Features Extracted in Different Acoustic Conditions

Author: Vásquez Correa Juan Camilo
Publication venue: Medellín, Colombia
Publication date: 01/01/2016

ABSTRACT: In recent years, there has been great progress in automatic speech recognition. The challenge now is not only to recognize the semantic content of speech but also its so-called "paralinguistic" aspects, including the emotions and the personality of the speaker. This research work aims to develop a methodology for automatic emotion recognition from speech signals in non-controlled noise conditions. For that purpose, different sets of acoustic, non-linear, and wavelet-based features are used to characterize emotions in different databases created for such purpose

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Biblioteca Digital del Sistema de Bibliotecas de la Universidad de Antioquia


Emotional recognition in computing

Author: Axelrod Lesley Ann
Publication venue: Brunel University, School of Information Systems, Computing and Mathematics
Publication date: 01/01/2010

This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University 8/4/2010. Emotions are fundamental to human lives and decision-making. Understanding and expression of emotional feeling between people forms an intricate web. This complex interactional phenomenon is a hot topic for research, as new techniques such as brain imaging give us insights about how emotions are tied to human functions. Communication of emotions is mixed with communication of other types of information (such as factual details) and emotions can be consciously or unconsciously displayed. Affective computer systems, using sensors for emotion recognition and able to make emotive responses, are under development. The increased potential for emotional interaction with products and services, in many domains, is generating much interest. Emotionally enhanced systems have potential to improve human-computer interaction and so to improve how systems are used and what they can deliver. They may also have adverse implications such as creating systems capable of emotional manipulation of users. Affective systems are in their infancy and lack human complexity and capability. This makes it difficult to assess whether human interaction with such systems will actually prove beneficial or desirable to users. By using experimental design, a Wizard of Oz methodology and a game that appeared to respond to the user’s emotional signals with human-like capability, I tested user experience and reactions to a system that appeared affective. To assess users’ behaviour, I developed a novel affective behaviour coding system called ‘affectemes’. I found significant gains in user satisfaction and performance when using an affective system. Those believing the system responded to emotional signals blinked more frequently. If the machine failed to respond to their emotional signals, they increased their efforts to convey emotion, which might be an attempt to ‘repair’ the interaction.
This work highlights how very complex and difficult it is to design and evaluate affective systems. I identify many issues for future work, including the unconscious nature of emotions and how they are recognised and displayed with affective systems; issues about the power of emotionally interactive systems and their evaluation; and critical ethical issues. These are important considerations for future design of systems that use emotion recognition in computing. EPSRC project grant (R81374/01

Brunel University Research Archive

Models and Analysis of Vocal Emissions for Biomedical Applications

Publication venue: 'Firenze University Press'
Publication date: 31/05/2022

The MAVEBA Workshop proceedings, held on a biannual basis, collect the scientific papers presented both as oral and poster contributions, during the conference. The main subjects are: development of theoretical and mechanical models as an aid to the study of main phonatory dysfunctions, as well as the biomedical engineering methods for the analysis of voice signals and images, as a support to clinical diagnosis and classification of vocal pathologies

Directory of Open Access Books (DOAB)

Proceedings of the Sixteenth Australasian International Conference on Speech Science and Technology

Publication venue: ASSTA
Publication date: 31/12/2016

UCL Discovery

Individual and environment-related acoustic-phonetic strategies for communicating in adverse conditions

Author: Saigusa Julie Sachie
Publication venue: UCL (University College London)
Publication date: 28/09/2021

In many situations it is necessary to produce speech in ‘adverse conditions’: that is, conditions that make speech communication difficult. Research has demonstrated that speaker strategies, as described by a range of acoustic-phonetic measures, can vary both at the individual level and according to the environment, and are argued to facilitate communication. There has been debate as to the environmental specificity of these adaptations, and their effectiveness in overcoming communication difficulty. Furthermore, the manner and extent to which adaptation strategies differ between individuals is not yet well understood. This thesis presents three studies that explore the acoustic-phonetic adaptations of speakers in noisy and degraded communication conditions and their relationship with intelligibility. Study 1 investigated the effects of temporally fluctuating maskers on global acoustic-phonetic measures associated with speech in noise (Lombard speech). The results replicated findings of increased power in the modulation spectrum in Lombard speech, but showed little evidence of adaptation to masker fluctuations via the temporal envelope. Study 2 collected a larger corpus of semi-spontaneous communicative speech in noise and other degradations perturbing specific acoustic dimensions. Speakers showed different adaptations across the environments that were likely suited to overcome noise (steady and temporally fluctuating), restricted spectral and pitch information by a noise-excited vocoder, and a sensorineural hearing loss simulation. Analyses of inter-speaker variation in both studies 1 and 2 showed behaviour was highly variable and some strategy combinations were identified. 
Study 3 investigated the intelligibility of strategies ‘tailored’ to specific environments and the relationship between intelligibility and speaker acoustics, finding a benefit of tailored speech adaptations and discussing the potential roles of speaker flexibility, adaptation level, and intrinsic intelligibility. The overall results are discussed in relation to models of communication in adverse conditions and a model accounting for individual variability in these conditions is proposed

UCL Discovery

EDITORIAL BOARD

Author: Angela Schorr
Anna Spagnolli
Castellón Spain
Clínica Y Psicobiología
Cristian Berrío
Cristina Botella
Luciano Gamberini
Medienpsychologischen Labor
Tratamientos Psicológicos Valencia
Universitat Jaume I
Universität Siegen
Valencia Spain
Zapata Pontificia

CiteSeerX

Building and Designing Expressive Speech Synthesis

Author: Leigh Clark
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2021

We know there is something special about speech. Our voices are not just a means of communicating. They also give a deep impression of who we are and what we might know. They can betray our upbringing, our emotional state, our state of health. They can be used to persuade and convince, to calm and to excite. As speech systems enter the social domain they are required to interact, support and mediate our social relationships with 1) each other, 2) with digital information, and, increasingly, 3) with AI-based algorithms and processes. Socially Interactive Agents (SIAs) are at the forefront of research and innovation in this area. There is an assumption that in the future “spoken language will provide a natural conversational interface between human beings and so-called intelligent systems.” [Moore 2017, p. 283]. A considerable amount of previous research work has tested this assumption with mixed results. However, as has been pointed out, “voice interfaces have become notorious for fostering frustration and failure” [Nass and Brave 2005, p.6]. It is within this context, between our exceptional and intelligent human use of speech to communicate and interact with other humans, and our desire to leverage this means of communication for artificial systems, that the technology, often termed expressive speech synthesis, uncomfortably falls. Uncomfortably, because it is often overshadowed by issues in interactivity and the underlying intelligence of the system, which is something that emerges from the interaction of many of the components in a SIA. This is especially true of what we might term conversational speech, where decoupling how things are spoken, from when and to whom they are spoken, can seem an impossible task. This is an even greater challenge in evaluation and in characterising full systems which have made use of expressive speech.
Furthermore when designing an interaction with a SIA, we must not only consider how SIAs should speak but how much, and whether they should even speak at all. These considerations cannot be ignored. Any speech synthesis that is used in the context of an artificial agent will have a perceived accent, a vocal style, an underlying emotion and an intonational model. Dimensions like accent and personality (cross speaker parameters) as well as vocal style, emotion and intonation during an interaction (within-speaker parameters) need to be built in the design of a synthetic voice. Even a default or neutral voice has to consider these same expressive speech synthesis components. Such design parameters have a strong influence on how effectively a system will interact, how it is perceived and its assumed ability to perform a task or function. To ignore these is to blindly accept a set of design decisions that ignores the complex effect speech has on the user’s successful interaction with a system. Thus expressive speech synthesis is a key design component in SIAs. This chapter explores the world of expressive speech synthesis, aiming to act as a starting point for those interested in the design, building and evaluation of such artificial speech. The debates and literature within this topic are vast and are fundamentally multidisciplinary in focus, covering a wide range of disciplines such as linguistics, pragmatics, psychology, speech and language technology, robotics and human-computer interaction (HCI), to name a few. It is not our aim to synthesise these areas but to give a scaffold and a starting point for the reader by exploring the critical dimensions and decisions they may need to consider when choosing to use expressive speech. To do this, the chapter explores the building of expressive synthesis, highlighting key decisions and parameters as well as emphasising future challenges in expressive speech research and development. 
Yet, before these are expanded upon we must first try and define what we actually mean by expressive speech

Cronfa at Swansea University

Presence 2005: the eighth annual international workshop on presence, 21-23 September, 2005 University College London (Conference proceedings)

Publication venue: Department of Computer Science, UCL (University College London)
Publication date: 21/09/2005

OVERVIEW (taken from the CALL FOR PAPERS) Academics and practitioners with an interest in the concept of (tele)presence are invited to submit their work for presentation at PRESENCE 2005 at University College London in London, England, September 21-23, 2005. The eighth in a series of highly successful international workshops, PRESENCE 2005 will provide an open discussion forum to share ideas regarding concepts and theories, measurement techniques, technology, and applications related to presence, the psychological state or subjective perception in which a person fails to accurately and completely acknowledge the role of technology in an experience, including the sense of 'being there' experienced by users of advanced media such as virtual reality. The concept of presence in virtual environments has been around for at least 15 years, and the earlier idea of telepresence at least since Minsky's seminal paper in 1980. Recently there has been a burst of funded research activity in this area for the first time with the European FET Presence Research initiative. What do we really know about presence and its determinants? How can presence be successfully delivered with today's technology? This conference invites papers that are based on empirical results from studies of presence and related issues and/or which contribute to the technology for the delivery of presence. Papers that make substantial advances in theoretical understanding of presence are also welcome. The interest is not solely in virtual environments but in mixed reality environments. Submissions will be reviewed more rigorously than in previous conferences. High quality papers are therefore sought which make substantial contributions to the field. Approximately 20 papers will be selected for two successive special issues for the journal Presence: Teleoperators and Virtual Environments. PRESENCE 2005 takes place in London and is hosted by University College London. 
The conference is organized by ISPR, the International Society for Presence Research and is supported by the European Commission's FET Presence Research Initiative through the Presencia and IST OMNIPRES projects and by University College London

UCL Discovery

A forensic phonetic study of the vocal responses of individuals in distress

Author: Roberts Lisa S
Publication venue: University of York
Publication date: 01/09/2012

The production and perception of emotional speech is of growing importance to forensic speech scientists. They are often asked by instructing parties to provide an opinion as to whether recordings representing a violent attack are genuine, and whether speech material reflects real distress. However, they are prohibited from making statements regarding the psychological states of speakers by the International Association of Forensic Phonetics and Acoustics Code of Practice (IAFPA 2004). This study investigates two principal questions. First, it investigates how distress speech can be manifested acoustically. In so doing it proposes a taxonomy for comparing distress speech across speakers, assists in delimiting the boundaries of the vocal repertoire, and considers the extent to which acoustic measures of distress speech can distinguish between the vocalisations of real victims and actors. Second, it investigates whether listeners can discriminate between genuine and acted distress portrayals, and to what extent familiarity with forensic material increases listeners’ ability. Recordings from authentic criminal cases involving violent attack are compared with re-enactments by trained actors. Acoustic analyses examine F0, intensity, vowel formant frequencies and articulation rate. The recordings are also used as stimuli in a perceptual listening test, comparing the performance of lay listeners, police call takers and forensic practitioners. The findings lend support to the view that assessments of distress should be exercised with extreme caution. On the one hand, acoustic parameters can distinguish between non-distress and distress conditions, but cannot discriminate between acted and authentic distress, and so IAFPA’s refrain from such an assessment is justified. 
On the other, listeners who are familiar with authentic distress data, such as police call takers and forensic practitioners, are better able to differentiate between acted and authentic distress than lay listeners. Thus, if an assessment were to be made, the forensic practitioners may be the best group to do so

White Rose E-theses Online