Perception of Alcoholic Intoxication in Speech
The ALC sub-challenge of the Interspeech Speaker State Challenge (ISSC) aims at the automatic classification of speech signals into intoxicated and sober speech. In this context we conducted a perception experiment on data derived from the same corpus to analyze human performance on the same task. The results show that humans still outperform comparable baseline results of ISSC. Female and male listeners perform on the same level, but there is strong evidence that intoxication is easier to recognize in female voices than in male voices. Prosodic features contribute to the decision of human listeners but do not seem to be dominant. In analogy to Doddington’s zoo of speaker verification we find some evidence for the existence of lambs and goats but no wolves. Index Terms: alcoholic intoxication, speech perception, forced choice, intonation, Alcohol Language Corpus
RANSAC-based training data selection for speaker state recognition
We present a Random Sampling Consensus (RANSAC) based training approach for the problem of speaker state recognition from spontaneous speech. Our system is trained and tested with the INTERSPEECH 2011 Speaker State Challenge corpora, which include the Intoxication and the Sleepiness Sub-challenges, where each sub-challenge defines a two-class classification task. We perform RANSAC-based training data selection coupled with Support Vector Machine (SVM) based classification to prune possible outliers in the training data. Our experimental evaluations indicate that RANSAC-based training data selection provides 66.32 % and 65.38 % unweighted average (UA) recall on the development and test sets, respectively, for the Sleepiness Sub-challenge, and a slight improvement in performance on the Intoxication Sub-challenge. TÜBİTAK; Türk Teleko
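The unweighted average (UA) recall reported above is the mean of the per-class recalls, so each class counts equally regardless of how many samples it has. The following is a minimal sketch of that metric only, not of the authors' RANSAC/SVM pipeline; the function name and the toy labels are hypothetical and are not taken from the challenge data.

```python
from collections import defaultdict


def unweighted_average_recall(y_true, y_pred):
    """Return the mean of per-class recalls (every class weighted equally)."""
    correct = defaultdict(int)  # correctly predicted samples per true class
    total = defaultdict(int)    # number of samples per true class
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


# Hypothetical toy labels for a two-class task (illustration only).
y_true = ["sleepy", "sleepy", "sleepy", "sleepy", "alert", "alert"]
y_pred = ["sleepy", "sleepy", "sleepy", "alert", "alert", "sleepy"]
ua = unweighted_average_recall(y_true, y_pred)
```

In this toy case the per-class recalls are 3/4 and 1/2, so UA recall is 0.625, while plain accuracy (4/6) would be pulled towards the larger class.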
Prediction of sleepiness ratings from voice by man and machine
This paper looks in more detail at the Interspeech 2019
computational paralinguistics challenge on the prediction of
sleepiness ratings from speech. In this challenge, teams were
asked to train a regression model to predict sleepiness from
samples of the Düsseldorf Sleepy Language Corpus (DSLC).
This challenge was notable because the performance of all
entrants was uniformly poor, with even the winning system
only achieving a correlation of r=0.37. We look at whether the
task itself is achievable, and whether the corpus is suited to
training a machine learning system for the task. We perform a
listening experiment using samples from the corpus and show
that a group of human listeners can achieve a correlation of
r=0.7 on this task, although this is mainly by classifying the
recordings into one of three sleepiness groups. We show that
the corpus, because of its construction, confounds variation
with sleepiness and variation with speaker identity, and this
was the reason that machine learning systems failed to
perform well. We conclude that sleepiness rating prediction
from voice is not an impossible task, but that good
performance requires more information about sleepy speech
and its variability across listeners than is available in the
DSLC corpus.
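The correlations quoted in this abstract (r=0.37 for the winning system, r=0.7 for listeners) compare predicted ratings with reference ratings. Assuming r denotes a Pearson correlation coefficient (the abstract does not name the exact coefficient used), a minimal sketch with hypothetical ratings looks like this:

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical reference and predicted sleepiness ratings (not corpus data).
reference = [1, 3, 5, 7, 9]
predicted = [2, 3, 6, 6, 9]
r = pearson_r(reference, predicted)
```

The function assumes both sequences vary; a constant sequence would make the denominator zero.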
Annotation and detection of conflict escalation in political debates
Conflict escalation in multi-party conversations refers to an increase in the intensity of conflict during conversations. Here we study annotation and detection of conflict escalation in broadcast political debates towards a machine-mediated conflict management system. In this regard, we label conflict escalation using crowd-sourced annotations and predict it with automatically extracted conversational and prosodic features. In particular, to annotate conflict escalation we deploy two different strategies: indirect inference and direct assessment. In direct assessment, annotators watch and compare two consecutive clips during the annotation process. In indirect inference, each clip is annotated independently with respect to its level of conflict, and the level of conflict escalation is then inferred by comparing the annotations of two consecutive clips. Empirical results with 792 pairs of consecutive clips in classifying three types of conflict escalation, i.e., escalation, de-escalation, and constant, show that labels from direct assessment yield higher classification performance (45.3% unweighted accuracy (UA)) than those from indirect inference (39.7% UA), although the annotations from both methods are highly correlated (r = 0.74 in continuous values and 63% agreement in ternary classes).
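The indirect inference strategy described above can be sketched as a comparison of the conflict levels annotated for consecutive clips. This is an illustrative sketch only: the function name, the `tolerance` parameter, and the example scores are assumptions, and the abstract does not state how the authors thresholded the difference.

```python
def infer_escalation(levels, tolerance=0.0):
    """Label each pair of consecutive clips from independent conflict levels.

    `tolerance` is a hypothetical dead band: differences within it count
    as "constant".
    """
    labels = []
    for previous, current in zip(levels, levels[1:]):
        difference = current - previous
        if difference > tolerance:
            labels.append("escalation")
        elif difference < -tolerance:
            labels.append("de-escalation")
        else:
            labels.append("constant")
    return labels


# Hypothetical per-clip conflict scores for four consecutive clips.
clip_levels = [2.0, 3.0, 3.0, 1.0]
pair_labels = infer_escalation(clip_levels)
```

Four clips yield three consecutive pairs, so the result has one label fewer than the number of clips.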
The prediction of fatigue using speech as a biosignal
Automatic systems for estimating operator fatigue have application in safety-critical environments. We develop and evaluate a system to detect fatigue from speech recordings collected from speakers kept awake over a 60-hour period. A binary classification system (fatigued/not-fatigued) based on time spent awake showed good discrimination, with 80 % unweighted accuracy using raw features, and 90 % with speaker-normalized features. We describe the data collection, feature analysis, machine learning and cross-validation used in the study. Results are promising for real-world applications in domains such as aerospace, transportation and mining where operators are in regular verbal communication as part of their normal working activities.
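The abstract does not say how the speaker-normalized features were computed. One common choice, assumed here purely for illustration, is to z-score each feature within each speaker so that differences between speakers are removed before classification. The function name and the toy pitch values below are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def speaker_normalize(values, speakers):
    """Z-score one feature within each speaker (assumed normalization scheme)."""
    by_speaker = defaultdict(list)
    for value, speaker in zip(values, speakers):
        by_speaker[speaker].append(value)
    # Per-speaker mean and population standard deviation.
    stats = {s: (mean(vs), pstdev(vs)) for s, vs in by_speaker.items()}
    normalized = []
    for value, speaker in zip(values, speakers):
        mu, sigma = stats[speaker]
        # A speaker with no variation gets 0.0 to avoid division by zero.
        normalized.append((value - mu) / sigma if sigma > 0 else 0.0)
    return normalized


# Hypothetical mean-pitch values (Hz) for two speakers, two recordings each.
pitch = [100.0, 120.0, 200.0, 240.0]
speaker_ids = ["a", "a", "b", "b"]
normalized_pitch = speaker_normalize(pitch, speaker_ids)
```

After normalization both speakers share the same scale, so a classifier sees each recording relative to that speaker's own baseline instead of absolute differences between voices.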
Furnariidae species recognition using speech-related features and machine learning
The automatic classification of calling bird species is important to achieve more exhaustive environmental monitoring and to manage natural resources. Bird vocalizations make it possible to recognise new species and to study their natural history and macro-systematic relations, while automatic systems can speed up and improve the whole process. In this work, we use state-of-the-art features designed for speech and speaker state recognition to classify 25 species of the Furnariidae family. Since Furnariidae species inhabit the Litoral Paranaense region of Argentina (South America), this work could promote further research on the topic and the implementation of in-situ monitoring systems. Our analysis includes two widely known classification techniques: random forest and support vector machines. The results are promising, near 86%, and were validated in a cross-validation scheme. Sociedad Argentina de Informática e Investigación Operativa (SADIO)