
    Speaker and Speech Recognition Using Hierarchy Support Vector Machine and Backpropagation

    Voice signal processing has been proposed to improve effectiveness and convenience for the public, for example in smart home applications. This study develops a smart home simulation model that operates doors, TVs, and lights from voice instructions. The sound signals are processed with Mel-Frequency Cepstral Coefficients (MFCC) for feature extraction. The speaker is then identified using a hierarchical Support Vector Machine (SVM), so that unregistered speakers are rejected as having no access rights and their commands are not processed. The spoken commands "Open the Door", "Close the Door", "Turn on the TV", "Turn off the TV", "Turn on the Lights", and "Turn off the Lights" are recognized using a Backpropagation neural network. The results show that the hierarchical SVM achieved an accuracy of 71%, compared with 45% for a single SVM.
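    The abstract describes a two-stage pipeline: MFCC feature extraction, a hierarchical SVM that gates access by speaker, and a backpropagation network that recognises the command. The sketch below illustrates that flow with librosa and scikit-learn; the feature settings, the confidence threshold (min_confidence), and the use of an MLPClassifier as the backpropagation network are assumptions, since the abstract does not give the exact hierarchy or network topology.

    # Minimal sketch of the described pipeline, assuming librosa and scikit-learn.
    # Exact feature settings and model structure are illustrative, not the paper's.
    import numpy as np
    import librosa
    from sklearn.svm import SVC
    from sklearn.neural_network import MLPClassifier

    def mfcc_features(path, n_mfcc=13):
        """Load a recording and summarise it as the mean MFCC vector."""
        y, sr = librosa.load(path, sr=16000)
        mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
        return mfcc.mean(axis=1)

    # Stage 1: speaker gate; Stage 2: backpropagation-trained command recogniser.
    speaker_gate = SVC(kernel="rbf", probability=True)
    command_net = MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000)

    def train(X_train, y_speaker, y_command):
        """X_train: (n_samples, n_mfcc) MFCC features of registered speakers."""
        speaker_gate.fit(X_train, y_speaker)
        command_net.fit(X_train, y_command)

    def recognise(path, min_confidence=0.6):
        x = mfcc_features(path).reshape(1, -1)
        # Stage 1: if no registered speaker matches with sufficient confidence,
        # the request is rejected and the command is not processed.
        if speaker_gate.predict_proba(x).max() < min_confidence:
            return None
        # Stage 2: recognise the spoken command.
        return command_net.predict(x)[0]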

    Exploitation of Phase-Based Features for Whispered Speech Emotion Recognition

    Features for speech emotion recognition are usually dominated by spectral magnitude information, while the phase spectrum is ignored because it is difficult to interpret properly. Motivated by recent successes of phase-based features in speech processing, this paper investigates the effectiveness of phase information for whispered speech emotion recognition. We select two types of phase-based features (modified group delay features and all-pole group delay features), both of which have shown wide applicability across speech analysis tasks and are studied here for whispered speech emotion recognition. To exploit these features, we propose a new speech emotion recognition framework that employs an outer product combined with power and L2 normalization. This technique encodes a variable-length sequence of phase-based features into a vector of fixed dimension, regardless of the length of the input sequence. The resulting representation is used to train a classification model with a linear kernel. Experimental results on the Geneva Whispered Emotion Corpus, which includes normal and whispered phonation, demonstrate the effectiveness of the proposed method compared with other modern systems. It is also shown that combining phase information with magnitude information can significantly improve performance over common systems that rely solely on magnitude information.
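    The fixed-length encoding described above (outer-product pooling followed by power and L2 normalisation) can be sketched in a few lines. The group delay feature extraction itself is not shown; any (T, D) feature matrix is assumed, and scikit-learn's LinearSVC stands in for the linear-kernel classifier.

    # A minimal sketch of the encoding, assuming NumPy and scikit-learn.
    import numpy as np
    from sklearn.svm import LinearSVC

    def outer_product_encoding(frames, alpha=0.5, eps=1e-12):
        """Encode a variable-length (T, D) feature sequence as a fixed D*D vector."""
        pooled = frames.T @ frames / frames.shape[0]   # average outer product, (D, D)
        vec = pooled.ravel()
        vec = np.sign(vec) * np.abs(vec) ** alpha      # power normalisation
        return vec / (np.linalg.norm(vec) + eps)       # L2 normalisation

    def train_classifier(utterances, labels):
        """utterances: list of (T_i, D) phase-feature arrays; labels: emotion classes."""
        X = np.vstack([outer_product_encoding(u) for u in utterances])
        clf = LinearSVC()
        clf.fit(X, labels)
        return clf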

    Quality Control in Remote Speech Data Collection

    There is a need for algorithms that can automatically control the quality of remotely collected speech databases by detecting potential outliers that deserve further investigation. In this paper, a simple and effective approach for identifying outliers in a speech database is proposed. Using the deterministic minimum covariance determinant (DetMCD) algorithm to estimate the mean and covariance of the speech data in the mel-frequency cepstral domain, the approach flags as potential outliers those observations whose statistical distance from the central location of the data in the feature space exceeds a predefined threshold. DetMCD is a computationally efficient algorithm that provides a highly robust estimate of the mean and covariance of multivariate data even when 50% of the data are outliers. Experimental results on eight different speech databases with manually inserted outliers show the effectiveness of the proposed method for outlier detection in speech databases. Moreover, applying the method to a remotely collected Parkinson's voice database shows that the outliers present in the database are detected with 97.4% accuracy, significantly reducing the effort required to manually control the quality of the database.
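    A rough sketch of the flag-by-robust-distance idea follows. scikit-learn does not ship DetMCD, so its FastMCD implementation (MinCovDet) is used here as a stand-in robust estimator, and the chi-square quantile threshold is an assumption; only the overall thresholding logic mirrors the abstract.

    # Sketch only: MinCovDet (FastMCD) replaces DetMCD, threshold choice is assumed.
    import numpy as np
    from scipy.stats import chi2
    from sklearn.covariance import MinCovDet

    def flag_outliers(X, quantile=0.975):
        """Flag rows of X (n_recordings, n_cepstral_dims) whose robust
        Mahalanobis distance exceeds a chi-square based threshold."""
        mcd = MinCovDet().fit(X)        # robust mean and covariance estimates
        d2 = mcd.mahalanobis(X)         # squared robust Mahalanobis distances
        threshold = chi2.ppf(quantile, df=X.shape[1])
        return d2 > threshold           # True where a recording is a potential outlier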

    Automatic Detection of Parkinson's Disease Using Modulating Components of Voice Signals

    Parkinson’s Disease (PD) is the second most common neurodegenerative disorder after Alzheimer’s disease. It mainly affects older adults, at a rate of about 2%, and about 89% of people diagnosed with PD also develop speech disorders. This has led the scientific community to research the information embedded in the speech signals of Parkinson’s patients, which has enabled not only diagnosis of the pathology but also follow-up of its evolution. In recent years, a large number of studies have focused on the automatic detection of voice-related pathologies in order to make objective, non-invasive evaluations of the voice. In cases where the pathology primarily affects the vibratory patterns of the vocal folds, as in Parkinson’s, the analyses are typically performed on sustained vowel recordings. In this article, we propose to use information from the slow and rapid variations in speech signals, also known as modulating components, combined with an effective dimensionality reduction approach whose output is used as input to the classification system. The proposed approach achieves classification rates above 88%, surpassing the classical approach based on Mel-Frequency Cepstral Coefficients (MFCC). The results show that the information extracted from slowly varying components is highly discriminative for the task at hand and could support assisted diagnosis systems for PD.
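    As a rough illustration of classifying from slowly varying (modulating) components, the sketch below extracts a low-pass-filtered Hilbert envelope from a sustained vowel, reduces its dimensionality, and trains a classifier. The paper's actual modulation decomposition, dimensionality reduction method, and classifier are not specified in the abstract, so the Hilbert envelope, 20 Hz cutoff, PCA, and SVM used here are assumptions for illustration only.

    # Illustrative sketch, assuming SciPy and scikit-learn; not the paper's method.
    import numpy as np
    from scipy.signal import hilbert, butter, filtfilt
    from sklearn.decomposition import PCA
    from sklearn.pipeline import make_pipeline
    from sklearn.svm import SVC

    def slow_modulation(y, sr, cutoff_hz=20.0, n_points=256):
        """Slowly varying envelope of a recording, resampled to a fixed length."""
        envelope = np.abs(hilbert(y))                        # amplitude envelope
        b, a = butter(4, cutoff_hz / (sr / 2), btype="low")  # keep slow variations only
        slow = filtfilt(b, a, envelope)
        idx = np.linspace(0, len(slow) - 1, n_points).astype(int)
        return slow[idx]                                     # fixed-length descriptor

    def train_detector(recordings, labels):
        """recordings: list of (y, sr) sustained vowels; labels: 1 = PD, 0 = control."""
        X = np.vstack([slow_modulation(y, sr) for y, sr in recordings])
        model = make_pipeline(PCA(n_components=10), SVC(kernel="rbf"))
        model.fit(X, labels)
        return model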

    Does knowing speaker sex facilitate vowel recognition at short durations?

    A man, a woman, and a child saying the same vowel do so with very different voices. The auditory system solves the complex problem of extracting what the man, woman, or child has said despite substantial differences in the acoustic properties of their voices. Much of the acoustic variation between the voices of men and women is due to differences in the underlying anatomical mechanisms for producing speech. If the auditory system knew the sex of the speaker, it could potentially correct for speaker-sex-related acoustic variation, thus facilitating vowel recognition. This study measured the minimum stimulus duration necessary to accurately discriminate whether a brief vowel segment was spoken by a man or a woman, and the minimum stimulus duration necessary to accurately recognise which vowel was spoken. Results showed that reliable vowel recognition precedes reliable speaker-sex discrimination, thus questioning the use of speaker-sex information in compensating for speaker-sex-related acoustic variation in the voice. Furthermore, the pattern of performance across experiments in which the fundamental frequency and formant frequency information of speakers’ voices were systematically varied was markedly different depending on whether the task was speaker-sex discrimination or vowel recognition. This argues for there being little relationship between perception of speaker sex (indexical information) and perception of what has been said (linguistic information) at short durations.

    A CNN-Based Approach to Identification of Degradations in Speech Signals
