140 research outputs found

    The Effect of Narrow-Band Transmission on Recognition of Paralinguistic Information From Human Vocalizations

    Practically nothing is known about the effects of speech coding for narrow-band transmission of speech signals within certain frequency ranges, especially on the recognition of paralinguistic cues in speech. We thus investigated the impact of narrow-band standard speech coders on the machine-based classification of affective vocalizations and clinical vocal recordings. In addition, we analyzed the effect of low-pass filtering speech at a set of different cut-off frequencies, either chosen as static values in the 0.5-5 kHz range or given dynamically by upper limits derived from the first five speech formants (F1-F5). Speech coding and recognition were tested, first, in relation to short-term speaker states, using affective vocalizations from the Geneva Multimodal Emotion Portrayals. Second, in relation to long-term speaker traits, we tested vocal recordings from clinical populations with speech impairments, as found in the Child Pathological Speech Database. We employed a large acoustic feature space derived from the Interspeech Computational Paralinguistics Challenge. Besides analyzing the sheer effect of signal corruption, we analyzed the potential of matched and multicondition training as opposed to mismatched conditions. The results show, first, that multicondition and matched-condition training significantly increase performance compared with the mismatched condition. Second, classification accuracy degrades only at comparably severe levels of low-pass filtering. The degradation appears especially for multi-categorical rather than binary decisions, and can be dealt with reasonably well by the training strategies mentioned above.

    Calibrated Prediction Intervals for Neural Network Regressors

    Ongoing developments in neural network models are continually advancing the state of the art in terms of system accuracy. However, the predicted labels should not be regarded as the only core output; a well-calibrated estimate of the prediction uncertainty is also important. Such estimates and their calibration are critical in many practical applications. Despite their advantage in accuracy, contemporary neural networks can generally be regarded as poorly calibrated and, as such, do not produce reliable output probability estimates. Further, while post-processing calibration solutions can be found in the relevant literature, these tend to target systems performing classification. We therefore present two novel methods for acquiring calibrated prediction intervals for neural network regressors: empirical calibration and temperature scaling. In experiments using different regression tasks from the audio and computer vision domains, we find that both proposed methods are capable of producing calibrated prediction intervals for neural network regressors at any desired confidence level, a finding that is consistent across all datasets and neural network architectures we experimented with. In addition, we derive a practical recommendation for producing more accurate calibrated prediction intervals. The source code implementing our proposed methods for computing calibrated prediction intervals is publicly available.
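
To make the idea of a calibrated prediction interval concrete, here is a minimal sketch in Python. It is not the paper's empirical calibration or temperature scaling method, whose details the abstract does not give. It illustrates a generic held-out residual quantile approach under stated assumptions: a regressor has already produced point predictions on a held-out calibration set, and a symmetric interval is wanted. The function names `residual_half_width` and `prediction_interval` are hypothetical.

```python
import math


def residual_half_width(y_true, y_pred, confidence):
    """Return the smallest half-width that covers at least `confidence`
    of the absolute residuals on a held-out calibration set.

    Assumption: this generic quantile rule stands in for a calibration
    method; it is not taken from the paper summarized above.
    """
    if not 0.0 < confidence <= 1.0:
        raise ValueError("confidence must be in (0, 1]")
    residuals = sorted(abs(t - p) for t, p in zip(y_true, y_pred))
    if not residuals:
        raise ValueError("calibration set is empty")
    # Number of calibration residuals the interval has to cover.
    k = max(1, math.ceil(confidence * len(residuals)))
    return residuals[k - 1]


def prediction_interval(point_prediction, half_width):
    """Symmetric interval around a new point prediction."""
    return (point_prediction - half_width, point_prediction + half_width)


# Hypothetical calibration data: targets and model predictions.
calibration_targets = [1.0, 2.0, 3.0, 4.0]
calibration_predictions = [1.5, 2.0, 2.0, 4.25]

half_width = residual_half_width(
    calibration_targets, calibration_predictions, confidence=0.75
)
interval = prediction_interval(10.0, half_width)
```

An interval built this way is calibrated only to the extent that new data resemble the calibration set, which is why the calibration set has to be held out from training.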

    Predicting and auralizing acoustics in classrooms

    Although classrooms have fairly simple geometries, this type of room is known to cause problems when its acoustics are predicted using room acoustics computer modeling. Some typical features from a room acoustics point of view are: parallel walls, low ceilings (the rooms are flat), uneven distribution of absorption, and most of the floor being covered with furniture, which acts as scattering elements at long distances and provides strong specular components at short distances. The importance of diffraction and scattering is illustrated in numbers and by means of auralization, using ODEON 8 Beta.

    Models and analysis of vocal emissions for biomedical applications

    This book of Proceedings collects the papers presented at the 3rd International Workshop on Models and Analysis of Vocal Emissions for Biomedical Applications, MAVEBA 2003, held 10-12 December 2003, Firenze, Italy. The workshop is organised every two years, and aims to stimulate contacts between specialists active in research and industrial developments, in the area of voice analysis for biomedical applications. The scope of the Workshop includes all aspects of voice modelling and analysis, ranging from fundamental research to all kinds of biomedical applications and related established and advanced technologies

    Origins of Human Language

    This book proposes a detailed picture of the continuities and ruptures between communication in primates and language in humans. It explores a diversity of perspectives on the origins of language, including a fine description of vocal communication in animals, mainly in monkeys and apes, but also in birds, the study of vocal tract anatomy and cortical control of the vocal productions in monkeys and apes, the description of combinatory structures and their social and communicative value, and the exploration of the cognitive environment in which language may have emerged from nonhuman primate vocal or gestural communication

    A Study of Accommodation of Prosodic and Temporal Features in Spoken Dialogues in View of Speech Technology Applications

    Inter-speaker accommodation is a well-known property of human speech and human interaction in general. Broadly, it refers to the behavioural patterns of two (or more) interactants and the effect of the (verbal and non-verbal) behaviour of each on that of the other(s). Implementation of this behaviour in spoken dialogue systems is desirable as an improvement on the naturalness of human-machine interaction. However, traditional qualitative descriptions of accommodation phenomena do not provide sufficient information for such an implementation. Therefore, a quantitative description of inter-speaker accommodation is required. This thesis proposes a methodology for monitoring accommodation during a human or human-computer dialogue, which utilizes a moving average filter over sequential frames for each speaker. These frames are time-aligned across the speakers, hence the name Time Aligned Moving Average (TAMA). Analysis of spontaneous human dialogue recordings by means of the TAMA methodology reveals ubiquitous accommodation of prosodic features (pitch, intensity and speech rate) across interlocutors, and allows for statistical (time series) modeling of the behaviour in a way that is meaningful for implementation in spoken dialogue system (SDS) environments. In addition, a novel dialogue representation is proposed that provides an additional point of view to that of TAMA in monitoring accommodation of temporal features (inter-speaker pause length and overlap frequency). This representation is a percentage turn distribution of individual speaker contributions in a dialogue frame, which circumvents strict attribution of speaker turns by considering both interlocutors as synchronously active. Both TAMA and turn distribution metrics indicate that correlation of average pause length and overlap frequency between speakers can be attributed to accommodation (a debated issue), and point to possible improvements in SDS "turn-taking" behaviour.
Although the findings of the prosodic and temporal analyses can directly inform SDS implementations, further work is required in order to describe inter-speaker accommodation sufficiently, as well as to develop an adequate testing platform for evaluating the magnitude of perceived improvement in human-machine interaction. Therefore, this thesis constitutes a first step towards a convincingly useful implementation of accommodation in spoken dialogue systems.
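
The time-aligned moving average described in the abstract can be sketched roughly as follows. This is an illustrative reading of the abstract, not the thesis's implementation. The frame length, frame step, input layout (a list of `(time, value)` measurements per speaker) and the function name `tama_series` are all assumptions, and empty frames are reported as `None`.

```python
def tama_series(samples, frame_length, frame_step, dialogue_end):
    """Average one speaker's feature values over overlapping frames.

    samples: list of (time, value) pairs, e.g. pitch measurements.
    Frames start at 0, frame_step, 2 * frame_step, ... and each spans
    frame_length time units. The boundaries depend only on the shared
    dialogue clock, so series computed for different speakers are
    aligned frame by frame.
    """
    means = []
    index = 0
    while index * frame_step < dialogue_end:
        start = index * frame_step
        end = start + frame_length
        in_frame = [value for time, value in samples if start <= time < end]
        # A speaker may be silent for a whole frame; keep the slot empty.
        means.append(sum(in_frame) / len(in_frame) if in_frame else None)
        index += 1
    return means


# Hypothetical pitch measurements (seconds, Hz) for two speakers.
speaker_a = [(1, 100.0), (4, 110.0), (12, 120.0), (18, 140.0)]
speaker_b = [(2, 200.0), (17, 220.0)]

series_a = tama_series(speaker_a, frame_length=10, frame_step=5, dialogue_end=20)
series_b = tama_series(speaker_b, frame_length=10, frame_step=5, dialogue_end=20)
```

Because both series share the same frame boundaries, they can be compared position by position, for example with a correlation computed over the frames in which both speakers have data.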
