Search CORE

140 research outputs found

The Effect of Narrow-Band Transmission on Recognition of Paralinguistic Information From Human Vocalizations

Author: Frühholz S
Marchi E
Schuller B
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2016

Practically no knowledge exists on the effects of speech coding and recognition for narrow-band transmission of speech signals within certain frequency ranges, especially in relation to the recognition of paralinguistic cues in speech. We thus investigated the impact of narrow-band standard speech coders on the machine-based classification of affective vocalizations and clinical vocal recordings. In addition, we analyzed the effect of low-pass filtering speech with a set of different cut-off frequencies, chosen either as static values in the 0.5-5 kHz range or dynamically from different upper limits derived from the first five speech formants (F1-F5). Speech coding and recognition were tested, first, for short-term speaker states, using the affective vocalizations of the Geneva Multimodal Emotion Portrayals. Second, for long-term speaker traits, we tested vocal recordings from clinical populations with speech impairments, as found in the Child Pathological Speech Database. We employed a large acoustic feature space derived from the Interspeech Computational Paralinguistics Challenge. Besides analyzing the sheer degradation caused by the corruption, we analyzed the potential of matched and multicondition training as opposed to mismatched-condition training. The results show, first, that multicondition and matched-condition training significantly increase performance compared with the mismatched condition. Second, classification accuracy degrades only at comparatively severe levels of low-pass filtering. These degradations appear especially for multi-categorical rather than binary decisions, and they can be handled reasonably well by the strategies mentioned above.
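The static low-pass conditions described above can be sketched as follows. This is a minimal illustration, not the authors' processing chain: it assumes a 16 kHz sampling rate, a Hamming-windowed sinc FIR filter with 101 taps, and a 2 kHz cut-off picked arbitrarily from the 0.5-5 kHz range; the helper names (lowpass_fir, apply_fir, tone_rms_ratio) are hypothetical. The formant-dependent (F1-F5) cut-offs would additionally require a formant tracker, which is not shown.

```python
import math


def lowpass_fir(cutoff_hz: float, sample_rate: float, num_taps: int = 101) -> list[float]:
    """Hypothetical helper: windowed-sinc low-pass FIR kernel with unit gain at DC."""
    if num_taps % 2 == 0:
        raise ValueError("num_taps must be odd for a symmetric, linear-phase kernel")
    if not 0 < cutoff_hz < sample_rate / 2:
        raise ValueError("cut-off must lie between 0 Hz and the Nyquist frequency")
    fc = cutoff_hz / sample_rate  # normalised cut-off (cycles per sample)
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        k = n - mid
        # Ideal low-pass impulse response (sinc), truncated to num_taps samples.
        ideal = 2 * fc if k == 0 else math.sin(2 * math.pi * fc * k) / (math.pi * k)
        # Hamming window to reduce the ripple caused by truncation.
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]  # normalise so the DC gain is exactly 1


def apply_fir(signal: list[float], taps: list[float]) -> list[float]:
    """Causal direct-form convolution; the output has the same length as the input."""
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, t in enumerate(taps):
            if i - j < 0:
                break  # no samples before the start of the signal
            acc += t * signal[i - j]
        out.append(acc)
    return out


def rms(x: list[float]) -> float:
    return math.sqrt(sum(v * v for v in x) / len(x))


def tone_rms_ratio(freq_hz, cutoff_hz, sample_rate=16000, n_samples=4000):
    """Output/input RMS ratio for a pure tone passed through the low-pass filter."""
    taps = lowpass_fir(cutoff_hz, sample_rate)
    tone = [math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(n_samples)]
    filtered = apply_fir(tone, taps)
    skip = len(taps)  # discard the filter's start-up transient
    return rms(filtered[skip:]) / rms(tone[skip:])


if __name__ == "__main__":
    for f in (500, 6000):
        print(f, round(tone_rms_ratio(f, cutoff_hz=2000), 3))
```

A 500 Hz tone should pass almost unchanged, while a 6 kHz tone should be strongly attenuated; sweeping cutoff_hz over a grid inside the 0.5-5 kHz range would mimic the idea of the static filtering conditions.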

Spiral - Imperial College Digital Repository

ZORA

Calibrated Prediction Intervals for Neural Network Regressors

Author: Cummins Nicholas
Keren Gil
Schuller Björn
Publication date: 01/01/2018

Ongoing developments in neural network models are continually advancing the state of the art in terms of system accuracy. However, the predicted labels should not be regarded as the only core output; a well-calibrated estimate of the prediction uncertainty is also important. Such estimates and their calibration are critical in many practical applications. Despite their obvious advantage in accuracy, contemporary neural networks can generally be regarded as poorly calibrated and, as such, do not produce reliable output probability estimates. Further, while post-processing calibration solutions can be found in the relevant literature, these tend to target systems performing classification. We therefore present two novel methods for acquiring calibrated prediction intervals for neural network regressors: empirical calibration and temperature scaling. In experiments using different regression tasks from the audio and computer vision domains, we find that both proposed methods are indeed capable of producing calibrated prediction intervals for neural network regressors with any desired confidence level, a finding that is consistent across all datasets and neural network architectures we experimented with. In addition, we derive a further practical recommendation for producing more accurate calibrated prediction intervals. The source code implementing our proposed methods for computing calibrated prediction intervals is publicly available.
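The general idea of rescaling a regressor's predicted uncertainty so that its intervals reach a chosen confidence level can be illustrated with a small sketch. It assumes a regressor that outputs a Gaussian mean mu and standard deviation sigma per sample, and fits a single scale factor T on held-out data from the empirical quantile of the normalised residuals |y - mu| / sigma. This is an illustrative stand-in in the spirit of the empirical calibration and temperature scaling named in the abstract, not the authors' released code; all function names and the synthetic data are hypothetical.

```python
import math
import random
from statistics import NormalDist


def z_two_sided(confidence: float) -> float:
    """Standard-normal quantile z such that P(|Z| <= z) = confidence."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)


def fit_scale(y_true, mu, sigma, confidence=0.9):
    """Hypothetical helper: single factor T applied to every predicted sigma.

    T is the empirical `confidence`-quantile of |y - mu| / sigma on held-out data,
    divided by the Gaussian quantile z, so that mu +/- z * T * sigma covers
    (at least) that fraction of the held-out points.
    """
    scores = sorted(abs(y - m) / s for y, m, s in zip(y_true, mu, sigma))
    k = min(len(scores) - 1, max(0, math.ceil(confidence * len(scores)) - 1))
    return scores[k] / z_two_sided(confidence)


def coverage(y_true, mu, sigma, scale, confidence=0.9):
    """Fraction of targets that fall inside mu +/- z * scale * sigma."""
    z = z_two_sided(confidence)
    inside = sum(abs(y - m) <= z * scale * s for y, m, s in zip(y_true, mu, sigma))
    return inside / len(y_true)


def make_split(rng, n):
    """Synthetic over-confident regressor: it reports sigma = 1, true noise has sd 2."""
    mu = [rng.uniform(-5, 5) for _ in range(n)]
    sigma = [1.0] * n
    y = [m + rng.gauss(0.0, 2.0) for m in mu]
    return y, mu, sigma


if __name__ == "__main__":
    rng = random.Random(0)
    val = make_split(rng, 2000)
    test = make_split(rng, 2000)
    t = fit_scale(*val)
    print("T =", round(t, 3))
    print("uncalibrated test coverage:", coverage(*test, scale=1.0))
    print("calibrated test coverage:", coverage(*test, scale=t))
```

Because the synthetic model underestimates its noise by a factor of two, the fitted T should come out near 2, and the rescaled 90% intervals should cover roughly 90% of unseen points, whereas the raw intervals cover far fewer.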

arXiv.org e-Print Archive

OPUS Augsburg

Crossref

Predicting and auralizing acoustics in classrooms

Author: Christensen Claus Lynge
Publication date: 01/01/2005

Although classrooms have fairly simple geometries, this type of room is known to cause problems when its acoustics are predicted using room acoustics computer modeling. Typical features from a room acoustics point of view are: parallel walls, low ceilings (the rooms are flat), an uneven distribution of absorption, and most of the floor being covered with furniture, which acts as scattering elements at long distances and provides strong specular components at short distances. The importance of diffraction and scattering is illustrated in numbers and by means of auralization, using ODEON 8 Beta.

Online Research Database In Technology

Auralization of an orchestra using multichannel and multisource technique (A)

Author: Rindel Jens Holger
Vigeant Michelle C.
Wang Lily M.
Publication date: 01/01/2006

Online Research Database In Technology

A transparency model and its applications for simulation of reflector arrays and sound transmission (A)

Author: Christensen Claus Lynge
Rindel Jens Holger
Publication date: 01/01/2006

Online Research Database In Technology

Models and analysis of vocal emissions for biomedical applications

Publication venue: 'Firenze University Press'
Publication date: 31/05/2022

This book of Proceedings collects the papers presented at the 3rd International Workshop on Models and Analysis of Vocal Emissions for Biomedical Applications, MAVEBA 2003, held 10-12 December 2003 in Firenze, Italy. The workshop is organised every two years and aims to stimulate contacts between specialists active in research and industrial developments in the area of voice analysis for biomedical applications. The scope of the workshop includes all aspects of voice modelling and analysis, ranging from fundamental research to all kinds of biomedical applications and related established and advanced technologies.

Directory of Open Access Books (DOAB)

Telephone Transmission and Earwitnesses: Performance on Voice Parades Controlled for Voice Similarity.

Author: Hudson Toby
McDougall Kirsty
Nolan Francis
Publication venue: Phonetica
Publication date: 04/12/2015

The effect of telephone transmission on a listener's ability to recognise a speaker in a voice parade is investigated. A hundred listeners (25 per condition) heard 1 of 5 'target' voices, then returned a week later for a voice parade. The 4 conditions were: target exposure and parade both at studio quality; exposure and parade both at telephone quality; studio exposure with telephone parade; and vice versa. Fewer correct identifications followed from telephone exposure and parade (64%) than from studio exposure and parade (76%). Fewer still resulted for studio exposure/telephone parade (60%) and, dramatically, only 32% for telephone exposure/studio parade. Certain speakers were identified more readily than others across all conditions. Confidence ratings reflected this effect of speaker, but not the effect of exposure/parade condition. ESRC; British Academy. This is the author accepted manuscript. The final version is available from Karger via http://dx.doi.org/10.1159/00043938

Apollo (Cambridge)

University of Hertfordshire Research Archive

Origins of Human Language

Publication venue: 'Peter Lang, International Academic Publishers'

This book proposes a detailed picture of the continuities and ruptures between communication in primates and language in humans. It explores a diversity of perspectives on the origins of language, including a fine description of vocal communication in animals (mainly monkeys and apes, but also birds), the study of vocal tract anatomy and of cortical control of vocal production in monkeys and apes, the description of combinatory structures and their social and communicative value, and the exploration of the cognitive environment in which language may have emerged from nonhuman primate vocal or gestural communication.

OAPEN Library

A Study of Accommodation of Prosodic and Temporal Features in Spoken Dialogues in View of Speech Technology Applications

Author: Kousidis Spyridon, [Thesis]
Publication venue: Dublin Institute of Technology
Publication date: 01/01/2010

Inter-speaker accommodation is a well-known property of human speech and human interaction in general. Broadly, it refers to the behavioural patterns of two (or more) interactants and the effect of the (verbal and non-verbal) behaviour of each on that of the other(s). Implementation of this behaviour in spoken dialogue systems is desirable as an improvement on the naturalness of human-machine interaction. However, traditional qualitative descriptions of accommodation phenomena do not provide sufficient information for such an implementation. Therefore, a quantitative description of inter-speaker accommodation is required. This thesis proposes a methodology for monitoring accommodation during a human or human-computer dialogue, which applies a moving average filter over sequential frames for each speaker. These frames are time-aligned across the speakers, hence the name Time Aligned Moving Average (TAMA). Analysis of spontaneous human dialogue recordings by means of the TAMA methodology reveals ubiquitous accommodation of prosodic features (pitch, intensity and speech rate) across interlocutors, and allows for statistical (time series) modeling of the behaviour in a way that is meaningful for implementation in spoken dialogue system (SDS) environments. In addition, a novel dialogue representation is proposed that complements TAMA in monitoring accommodation of temporal features (inter-speaker pause length and overlap frequency). This representation is a percentage turn distribution of individual speaker contributions in a dialogue frame, which circumvents strict attribution of speaker turns by considering both interlocutors as synchronously active. Both TAMA and turn distribution metrics indicate that the correlation of average pause length and overlap frequency between speakers can be attributed to accommodation (a debated issue), and point to possible improvements in SDS “turn-taking” behaviour.
Although the findings of the prosodic and temporal analyses can directly inform SDS implementations, further work is required to describe inter-speaker accommodation sufficiently and to develop an adequate testing platform for evaluating the magnitude of perceived improvement in human-machine interaction. Therefore, this thesis constitutes a first step towards a convincingly useful implementation of accommodation in spoken dialogue systems.
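The TAMA idea summarised in this abstract (a moving average over frames that are time-aligned across speakers) can be sketched as follows. The 20 s frame length, the 10 s step, the (time, value) input format and the use of a Pearson correlation across frames are illustrative assumptions, not parameters taken from the thesis; the function names and toy data are hypothetical.

```python
import math


def tama(samples, duration, frame_len=20.0, step=10.0):
    """Hypothetical sketch of a Time Aligned Moving Average for one speaker.

    samples: (time_s, value) pairs for one speaker, e.g. pitch estimates.
    Frames [start, start + frame_len) are laid on the shared dialogue timeline,
    so frame i covers the same time span for every speaker. Returns the mean
    value per frame, or None where the speaker produced no samples.
    """
    means = []
    start = 0.0
    while start + frame_len <= duration:
        vals = [v for t, v in samples if start <= t < start + frame_len]
        means.append(sum(vals) / len(vals) if vals else None)
        start += step
    return means


def pearson(xs, ys):
    """Pearson correlation over frames where both speakers have a value."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    if len(pairs) < 2:
        raise ValueError("need at least two frames with data from both speakers")
    mx = sum(x for x, _ in pairs) / len(pairs)
    my = sum(y for _, y in pairs) / len(pairs)
    cov = sum((x - mx) * (y - my) for x, y in pairs)
    sx = math.sqrt(sum((x - mx) ** 2 for x, _ in pairs))
    sy = math.sqrt(sum((y - my) ** 2 for _, y in pairs))
    return cov / (sx * sy)


if __name__ == "__main__":
    # Toy 120 s dialogue in which both speakers' pitch drifts upward together.
    a = [(k + 0.5, 100 + k + 0.5) for k in range(120)]
    b = [(0.25 + 0.5 * k, 200 + 0.5 * (0.25 + 0.5 * k)) for k in range(240)]
    ta, tb = tama(a, 120), tama(b, 120)
    print(len(ta), round(pearson(ta, tb), 3))
```

Overlapping frames (step shorter than frame_len) make the per-frame means behave like a moving average, and because frame i spans the same interval for both speakers, the two series can be compared frame by frame.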

Arrow@TUDublin

Predicting and auralizing acoustics in classrooms

Author: Claus Lynge Christensen
Publication venue: 'Acoustical Society of America (ASA)'

Crossref