Search CORE

1,809 research outputs found

Spoken affect classification : algorithms and experimental implementation : a thesis presented in partial fulfilment of the requirements for the degree of Master of Science in Computer Science at Massey University, Palmerston North, New Zealand

Author: Morrison Donn Alexander
Publication venue: 'Massey University'
Publication date: 01/01/2005

Machine-based emotional intelligence is a requirement for natural interaction between humans and computer interfaces, and a basic level of accurate emotion perception is needed for computer systems to respond adequately to human emotion. Humans convey emotional information both intentionally and unintentionally via speech patterns. These vocal patterns are perceived and understood by listeners during conversation. This research aims to improve the automatic perception of vocal emotion in two ways. First, we compare two emotional speech data sources: natural, spontaneous emotional speech and acted or portrayed emotional speech. This comparison demonstrates the advantages and disadvantages of both acquisition methods and how these methods affect the end application of vocal emotion recognition. Second, we look at two classification methods which have gone unexplored in this field: stacked generalisation and unweighted vote. We show how these techniques can yield an improvement over traditional classification methods.
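As a rough illustration of the unweighted vote mentioned in this abstract, the sketch below picks the label predicted by the most base classifiers. It is not taken from the thesis: the function name `unweighted_vote`, the tie-breaking rule and the example labels are all assumptions made for illustration.

```python
from collections import Counter


def unweighted_vote(predictions):
    """Return the label predicted by the most base classifiers.

    `predictions` holds one label per base classifier, and every
    classifier counts equally. Ties go to the tied label that
    appears first in the list (an arbitrary choice for this sketch).
    """
    if not predictions:
        raise ValueError("need at least one prediction")
    counts = Counter(predictions)
    best = max(counts.values())
    for label in predictions:
        if counts[label] == best:
            return label


# Hypothetical outputs of three base classifiers for one utterance.
print(unweighted_vote(["anger", "neutral", "anger"]))
```

Stacked generalisation differs in that a second-level classifier is trained on the base classifiers' outputs instead of counting them equally.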

Massey Research Online

Exploring the impact of data poisoning attacks on machine learning model reliability

Author: Marrone S.
Marulli F.
Verde L.
Publication venue: 'Elsevier BV'
Publication date: 01/01/2021

Recent years have seen the widespread adoption of Artificial Intelligence techniques in several domains, including healthcare, justice, assisted driving and Natural Language Processing (NLP) based applications (e.g., fake news detection). These are just a few examples of domains that are particularly critical and sensitive to the reliability of the adopted machine learning systems. Accordingly, several Artificial Intelligence approaches have been adopted to support easy and reliable solutions aimed at improving early diagnosis, personalized treatment, remote patient monitoring and decision-making, with a consequent reduction of healthcare costs. Recent studies have shown that these techniques are vulnerable to attacks by adversaries at different phases of the Artificial Intelligence workflow. Poisoned data sets are the most common attack on the reliability of Artificial Intelligence approaches. Noise, for example, can have a significant impact on the overall performance of a machine learning model. This study discusses how strongly noise affects classification algorithms. In detail, it evaluates how reliably several machine learning techniques distinguish pathological from healthy voices when analysing poisoned data. Voice samples selected from an available database widely used in research, the Saarbruecken Voice Database, were processed and analysed to evaluate the resilience and classification accuracy of these techniques. All analyses are evaluated in terms of accuracy, specificity, sensitivity, F1-score and ROC area.
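The evaluation measures named at the end of this abstract can be computed from the four cells of a binary confusion matrix. The following is a minimal sketch, assuming "pathological" is the positive class and every denominator is non-zero. The function name `binary_metrics` and the example counts are hypothetical and do not come from the study. ROC area is left out because it needs per-sample scores, not counts.

```python
def binary_metrics(tp, fp, tn, fn):
    """Compute standard measures from binary confusion-matrix counts.

    tp / fn: pathological voices classified correctly / incorrectly.
    tn / fp: healthy voices classified correctly / incorrectly.
    """
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)   # true positive rate (recall)
    specificity = tn / (tn + fp)   # true negative rate
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
    }


# Hypothetical counts, for illustration only.
metrics = binary_metrics(tp=40, fp=10, tn=45, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Comparing these values for a model trained on clean data and one trained on poisoned data gives one simple view of how much the attack degrades reliability.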

Archivio Istituzionale della Ricerca - Università degli Studi della Campania "Luigi Vanvitelli"

A two-stage approach using Gaussian mixture models and higher-order statistics for a classification of normal and pathological voices

Author
Publication venue: Springer
Publication date: 30/11/2012

Springer - Publisher Connector

Exploiting Nonlinear Recurrence and Fractal Scaling Properties for Voice Disorder Detection

Author: Declan Ae Costello
Irene M Moroz
Max A Little
Patrick E Mcsharry
Stephen J Roberts
Publication venue
Publication date: 01/01/2007

Background: Voice disorders affect patients profoundly, and acoustic tools can potentially measure voice function objectively. Disordered sustained vowels exhibit wide-ranging phenomena, from nearly periodic to highly complex, aperiodic vibrations, and increased "breathiness". Modelling and surrogate data studies have shown significant nonlinear and non-Gaussian random properties in these sounds. Nonetheless, existing tools are limited to analysing voices displaying near periodicity, and do not account for this inherent biophysical nonlinearity and non-Gaussian randomness, often using linear signal processing methods insensitive to these properties. They do not directly measure the two main biophysical symptoms of disorder: complex nonlinear aperiodicity, and turbulent, aeroacoustic, non-Gaussian randomness. Often these tools cannot be applied to more severe disordered voices, limiting their clinical usefulness.

Methods: This paper introduces two new tools to speech analysis: recurrence and fractal scaling, which overcome the range limitations of existing tools by addressing directly these two symptoms of disorder, together reproducing a "hoarseness" diagram. A simple bootstrapped classifier then uses these two features to distinguish normal from disordered voices.

Results: On a large database of subjects with a wide variety of voice disorders, these new techniques can distinguish normal from disordered cases, using quadratic discriminant analysis, to overall correct classification performance of 91.8% plus or minus 2.0%. The true positive classification performance is 95.4% plus or minus 3.2%, and the true negative performance is 91.5% plus or minus 2.3% (95% confidence). This is shown to outperform all combinations of the most popular classical tools.

Conclusions: Given the very large number of arbitrary parameters and computational complexity of existing techniques, these new techniques are far simpler and yet achieve clinically useful classification performance using only a basic classification technique. They do so by exploiting the inherent nonlinearity and turbulent randomness in disordered voice signals. They are widely applicable to the whole range of disordered voice phenomena by design. These new measures could therefore be used for a variety of practical clinical purposes.
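The results above report classification rates with 95% confidence bounds. One generic way to attach such bounds to an accuracy figure is a percentile bootstrap over per-subject outcomes, sketched below. This is an assumption-laden illustration and not the paper's procedure, which bootstraps the classifier itself. The function name, the resample count and the example data are hypothetical.

```python
import random


def bootstrap_accuracy_interval(correct, n_resamples=2000, seed=0):
    """Percentile-bootstrap 95% interval for classification accuracy.

    `correct` is a list of 0/1 flags, one per subject, marking whether
    that subject was classified correctly. Returns the point estimate
    and the lower and upper interval bounds.
    """
    rng = random.Random(seed)
    n = len(correct)
    means = []
    for _ in range(n_resamples):
        # Draw n subjects with replacement and record their accuracy.
        sample = [correct[rng.randrange(n)] for _ in range(n)]
        means.append(sum(sample) / n)
    means.sort()
    lower = means[int(0.025 * n_resamples)]
    upper = means[int(0.975 * n_resamples) - 1]
    return sum(correct) / n, lower, upper


# Hypothetical outcomes: 92 of 100 subjects classified correctly.
flags = [1] * 92 + [0] * 8
accuracy, lower, upper = bootstrap_accuracy_interval(flags)
print(f"accuracy {accuracy:.3f}, 95% interval [{lower:.3f}, {upper:.3f}]")
```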

arXiv.org e-Print Archive

CiteSeerX

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Oxford University Research Archive

Nature Precedings

Bulbar ALS Detection Based on Analysis of Voice Perturbation and Vibrato

Author: azarov
baken
boersma
ingre
nakano
yunusova
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 24/03/2020

On average, the lack of biological markers causes a one-year delay in diagnosing amyotrophic lateral sclerosis (ALS). To improve the diagnostic process, an automatic voice assessment based on acoustic analysis can be used. The purpose of this work was to verify the suitability of the sustained vowel phonation test for automatic detection of patients with ALS. We propose an enhanced procedure for separating the voice signal into fundamental periods, which is required for calculating perturbation measurements (such as jitter and shimmer). We also propose a method for quantitative assessment of pathological vibrato manifestations in sustained vowel phonation. The study's experiments show that, using the proposed acoustic analysis methods, a classifier based on linear discriminant analysis attains 90.7% accuracy with 86.7% sensitivity and 92.2% specificity. Comment: Proc. of International Conference Signal Processing Algorithms, Architectures, Arrangements, and Applications (SPA 2019)
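Once a sustained vowel has been split into fundamental periods, jitter and shimmer follow from the sequences of period lengths and cycle peak amplitudes. The sketch below uses the common "local" definition: the mean absolute difference between consecutive cycles divided by the mean value. It does not reproduce the paper's enhanced period-separation procedure. The function name and the example values are hypothetical.

```python
def local_perturbation(values):
    """Mean absolute difference of consecutive values over the mean value.

    Applied to period lengths this gives local jitter. Applied to
    cycle peak amplitudes it gives local shimmer.
    """
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    mean_diff = sum(diffs) / len(diffs)
    mean_value = sum(values) / len(values)
    return mean_diff / mean_value


# Hypothetical measurements from four consecutive glottal cycles.
periods_ms = [5.0, 5.1, 4.9, 5.0]
peak_amplitudes = [1.00, 0.95, 1.05, 1.00]

jitter = local_perturbation(periods_ms)
shimmer = local_perturbation(peak_amplitudes)
print(f"jitter: {100 * jitter:.2f}%  shimmer: {100 * shimmer:.2f}%")
```

Both measures are sensitive to errors in period boundaries, which is why the segmentation step matters for the downstream classifier.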

arXiv.org e-Print Archive

Crossref

Spectral analysis of pathological acoustic speech waveforms

Author: Medida Priyanka
Publication venue: Digital Scholarship@UNLV
Publication date: 01/01/2009

Biomedical engineering is the application of engineering principles and techniques to the medical field. The design and problem-solving skills of engineering are combined with medical and biological science, which improves the diagnosis and treatment of medical disorders. The purpose of this study is to develop an automated procedure for detecting excessive jitter in speech signals, which is useful for differentiating normal from pathologic speech. The fundamental motivation for this research is that speech pathologists and laryngologists need tools for the early detection and treatment of laryngeal disorders. Acoustical analysis of speech was performed to analyze various features of a speech signal. Earlier research established a relation between pitch period jitter and harmonic bandwidth. This concept was used for detecting laryngeal disorders in speech, since pathologic speech has been found to have larger amounts of jitter than normal speech. Our study was performed using vowel samples from the voice disorder database recorded at the Massachusetts Eye and Ear Infirmary (MEEI) in 1994. The KAYPENTAX company markets this database. Software development was conducted using MATLAB, a user-friendly programming language which has been applied widely for signal processing. An algorithm was developed to compute harmonic bandwidths for various speech samples of sustained vowel sounds. Open and closed tests were conducted on 23 samples each of pathologic and normal speech. Classification results showed a 69.56% probability of correct detection of pathologic speech samples during an open test.

University of Nevada, Las Vegas Repository

GMM-based classifiers for the automatic detection of obstructive sleep apnea

Author: Blanco Murillo José Luis
Castellanos Domínguez César Germán
Godino Llorente Juan Ignacio
Gómez García J.A.
Hernández Gómez Luis Alfonso
Publication venue: E.T.S.I. Telecomunicación (UPM)
Publication date: 01/01/2013

The aim of automatic pathological voice detection systems is to serve as tools for medical specialists, enabling a more objective, less invasive and improved diagnosis of diseases. In this respect, the gold standard for such systems includes the use of an optimized representation of the spectral envelope for characterization, based either on cepstral coefficients from the mel-scaled Fourier spectral envelope (Mel-Frequency Cepstral Coefficients) or on an all-pole estimation (Linear Prediction Coding Cepstral Coefficients), and Gaussian Mixture Models for posterior classification. However, recently proposed GMM-based classifiers and nuisance mitigation techniques, such as those employed in speaker recognition, have not been widely studied in pathology detection tasks. The present work aims to test whether such speaker recognition tools might improve the performance of pathology detection systems, specifically in the automatic detection of Obstructive Sleep Apnea. The testing procedure employs an Obstructive Sleep Apnea database in conjunction with GMM-based classifiers, looking for better performance. The results show that improved performance might be obtained by using such an approach.
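A basic GMM classifier of the kind this abstract describes scores the feature frames of a recording under one mixture model per class and picks the class with the higher log-likelihood. The sketch below is heavily simplified and is not the authors' system: it assumes scalar features in place of cepstral vectors and already-trained model parameters. The function names, class labels and parameter values are hypothetical.

```python
import math


def gmm_log_likelihood(frames, weights, means, variances):
    """Total log-likelihood of scalar frames under a 1-D Gaussian mixture."""
    total = 0.0
    for x in frames:
        # Log of (weight * Gaussian density) for each mixture component.
        comps = [
            math.log(w) - 0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
            for w, m, v in zip(weights, means, variances)
        ]
        # Log-sum-exp keeps the sum stable when densities are tiny.
        top = max(comps)
        total += top + math.log(sum(math.exp(c - top) for c in comps))
    return total


def classify(frames, control_model, apnea_model):
    """Pick the class whose mixture assigns the frames the higher likelihood."""
    control_score = gmm_log_likelihood(frames, *control_model)
    apnea_score = gmm_log_likelihood(frames, *apnea_model)
    return "control" if control_score >= apnea_score else "apnea"


# Hypothetical two-component models: (weights, means, variances).
control_model = ([0.5, 0.5], [0.0, 1.0], [1.0, 1.0])
apnea_model = ([0.5, 0.5], [3.0, 4.0], [1.0, 1.0])

print(classify([0.2, 0.8, 0.5], control_model, apnea_model))
```

Real systems fit the mixtures to multi-dimensional cepstral features with expectation-maximisation, and speaker-recognition variants typically adapt class models from a shared background model.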

Archivo Digital UPM