568 research outputs found
Formant analysis in dysphonic patients and automatic Arabic digit speech recognition
<p>Abstract</p> <p>Background and objective</p> <p>There has been a growing interest in objective assessment of speech in dysphonic patients for the classification of the type and severity of voice pathologies using automatic speech recognition (ASR). The aim of this work was to study the accuracy of the conventional ASR system (with a Mel frequency cepstral coefficient (MFCC) based front end and a hidden Markov model (HMM) based back end) in recognizing the speech characteristics of people with pathological voice.</p> <p>Materials and methods</p> <p>The speech samples of 62 dysphonic patients with six different types of voice disorders and 50 normal subjects were analyzed. The Arabic spoken digits were taken as the input. The distribution of the first four formants of the vowel /a/ was extracted to examine deviation of the formants from normal.</p> <p>Results</p> <p>Recognition accuracy of 100% was obtained for Arabic digits spoken by normal speakers. However, there was a significant loss of accuracy when the digits were spoken by voice-disordered subjects. Moreover, no significant improvement in ASR performance was achieved after assessing a subset of the individuals with disordered voices who underwent treatment.</p> <p>Conclusion</p> <p>The results of this study revealed that the current ASR technique is not a reliable tool for recognizing the speech of dysphonic patients.</p>
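The front end described above (MFCC features feeding an HMM back end) can be illustrated with a minimal numpy sketch of MFCC extraction; all parameter values here are common textbook defaults, not the study's actual configuration:

```python
import numpy as np

def mfcc(signal, sr=16000, n_mfcc=13, frame_len=400, hop=160, n_fft=512, n_mels=26):
    """Compute MFCCs: pre-emphasis, framing, Hamming window,
    power spectrum, mel filterbank, log, DCT-II."""
    # Pre-emphasis boosts high frequencies.
    sig = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    # Slice the signal into overlapping windowed frames.
    n_frames = 1 + (len(sig) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = sig[idx] * np.hamming(frame_len)
    # Power spectrum of each frame.
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # Triangular mel filterbank.
    def hz_to_mel(f): return 2595.0 * np.log10(1.0 + f / 700.0)
    def mel_to_hz(m): return 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mel_pts = mel_to_hz(np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_mels + 2))
    bins = np.floor((n_fft + 1) * mel_pts / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        fbank[m - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[m - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    logmel = np.log(power @ fbank.T + 1e-10)
    # DCT-II decorrelates the log-mel energies; keep the first n_mfcc.
    n = np.arange(n_mels)
    dct = np.cos(np.pi * np.outer(np.arange(n_mfcc), (2 * n + 1) / (2 * n_mels)))
    return logmel @ dct.T

# Example: MFCCs of a synthetic 1-second vowel-like tone.
sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 220 * t) + 0.5 * np.sin(2 * np.pi * 660 * t)
feats = mfcc(tone, sr)
print(feats.shape)  # (98, 13): one 13-coefficient vector per frame
```

The resulting per-frame vectors are what an HMM back end would consume as its observation sequence.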
Cepstral analysis and the Hilbert-Huang transform for the automatic detection of Parkinson's disease
Most patients with Parkinson’s Disease (PD) develop speech deficits, including reduced sonority, altered articulation, and abnormal prosody. This article presents a methodology to automatically classify patients with PD and Healthy Control (HC) subjects. In this study, the Hilbert-Huang Transform (HHT) and Mel-Frequency Cepstral Coefficients (MFCCs) were considered to model modulated phonations (changing the tone from low to high and vice versa) of the vowels /a/, /i/, and /u/. The HHT was used to extract the first two formants from audio signals with the aim of modeling the stability of the tongue while the speakers were producing modulated vowels. Kruskal-Wallis statistical tests were used to eliminate redundant and non-relevant features in order to improve classification accuracy. PD patients and HC subjects were automatically classified using a Radial Basis Function Support Vector Machine (RBF-SVM). The results show that the proposed approach allows automatic discrimination between PD and HC subjects with accuracies of up to 75% for women and 73% for men.
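The Kruskal-Wallis screening step described above can be sketched as follows: the H statistic is computed per feature across the diagnostic groups, and features with low discriminative power are dropped. The toy data and the keep-top-k rule are illustrative, not the paper's actual setup:

```python
import numpy as np

def kruskal_h(x, groups):
    """Kruskal-Wallis H statistic for one feature across groups
    (no tie correction; tied values receive their average rank)."""
    n = len(x)
    order = np.argsort(x, kind="mergesort")
    ranks = np.empty(n)
    ranks[order] = np.arange(1, n + 1)
    for v in np.unique(x):          # average ranks over ties
        mask = x == v
        ranks[mask] = ranks[mask].mean()
    h = 0.0
    for g in np.unique(groups):
        r = ranks[groups == g]
        h += len(r) * r.mean() ** 2
    return 12.0 / (n * (n + 1)) * h - 3.0 * (n + 1)

# Toy data: feature 0 separates the two classes, feature 1 is noise.
rng = np.random.default_rng(0)
y = np.array([0] * 30 + [1] * 30)
X = np.column_stack([
    np.concatenate([rng.normal(0, 1, 30), rng.normal(3, 1, 30)]),  # informative
    rng.normal(0, 1, 60),                                          # irrelevant
])
scores = np.array([kruskal_h(X[:, j], y) for j in range(X.shape[1])])
keep = np.argsort(scores)[::-1][:1]  # keep the top-ranked feature
print(scores.round(2), keep)
```

In practice one would threshold on the p-value of H (chi-squared with k-1 degrees of freedom) rather than simply keeping the top-ranked features.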
Classification of voice disorder in children with cochlear implantation and hearing aid using multiple classifier fusion
<p>Abstract</p> <p>Background</p> <p>Speech production and speech phonetic features gradually improve in children by obtaining audio feedback after cochlear implantation or using hearing aids. The aim of this study was to develop and evaluate automated classification of voice disorder in children with cochlear implantation and hearing aids.</p> <p>Methods</p> <p>We considered 4 disorder categories in children's voice using the following definitions:</p> <p>Level_1: Children who produce spontaneous phonation and use words spontaneously and imitatively.</p> <p>Level_2: Children who produce spontaneous phonation, use words spontaneously and make short sentences imitatively.</p> <p>Level_3: Children who produce spontaneous phonations, use words and arbitrary sentences spontaneously.</p> <p>Level_4: Normal children without any hearing loss background. Thirty Persian children participated in the study, including six children at each of levels one to three and 12 children at level four. Voice samples of five isolated Persian words "mashin", "mar", "moosh", "gav" and "mouz" were analyzed. Four levels of voice quality were considered; the higher the level, the less severe the speech disorder. "Frame-based" and "word-based" features were extracted from voice signals. The frame-based features include intensity, fundamental frequency, formants, nasality and approximate entropy, and the word-based features include phase space features and wavelet coefficients.
For frame-based features, hidden Markov models were used as classifiers, and for word-based features, a neural network was used.</p> <p>Results</p> <p>After classifier fusion with three methods (majority voting rule (MVR), linear combination, and stacked fusion), the best classification rates were obtained using frame-based and word-based features with the MVR (level 1: 100%, level 2: 93.75%, level 3: 100%, level 4: 94%).</p> <p>Conclusions</p> <p>The results of this study may help speech pathologists follow up voice disorder recovery in children with cochlear implantation or hearing aids who are in the same age range.</p>
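The majority voting rule used for classifier fusion can be sketched in a few lines; the classifier outputs below are invented for illustration:

```python
from collections import Counter

def majority_vote(predictions):
    """Fuse per-classifier label predictions by majority voting:
    for each sample, output the label most classifiers agree on."""
    fused = []
    for sample_preds in zip(*predictions):  # one tuple of labels per sample
        fused.append(Counter(sample_preds).most_common(1)[0][0])
    return fused

# Three hypothetical classifiers voting over five samples
# (labels 1-4 stand for the four voice-quality levels).
hmm_frame = [1, 2, 4, 3, 4]   # frame-based HMM output (illustrative)
nn_word   = [1, 2, 4, 4, 4]   # word-based neural-network output
nn_word2  = [1, 3, 4, 3, 2]   # a second word-based model
print(majority_vote([hmm_frame, nn_word, nn_word2]))  # [1, 2, 4, 3, 4]
```

With an even number of voters, `Counter.most_common` breaks ties by insertion order; a production fusion scheme would instead weight votes by classifier confidence.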
Models and Analysis of Vocal Emissions for Biomedical Applications
The International Workshop on Models and Analysis of Vocal Emissions for Biomedical Applications (MAVEBA) came into being in 1999 from the keenly felt need to share know-how, objectives, and results between areas that until then had seemed quite distinct, such as bioengineering, medicine, and singing. MAVEBA deals with all aspects of the study of the human voice, with applications ranging from the newborn to the adult and elderly. Over the years the initial issues have grown and spread into other fields of research, such as occupational voice disorders, neurology, rehabilitation, and image and video analysis. MAVEBA takes place every two years in Firenze, Italy. This edition celebrates twenty-two years of uninterrupted and successful research in the field of voice analysis.
Identification of voice pathologies in an elderly population
Ageing is associated with an increased risk of developing diseases, including a greater predisposition to conditions such as sepsis. With ageing, human voices also undergo a natural degradation, reflected in alterations in hoarseness, breathiness, articulatory ability, and speaking rate. Nowadays, perceptual evaluation is widely used to assess speech and voice impairments despite its high subjectivity.
This dissertation proposes a new method for detecting and identifying voice pathologies by exploring acoustic parameters of continuous speech signals in the elderly population. Additionally, a study of the influence of gender and age on the performance of voice pathology detection systems is conducted.
The study included 44 subjects older than 60 years, with the pathologies dysphonia, functional dysphonia, and spasmodic dysphonia. From the dataset created with these settings, two gender-dependent subsets were derived, one with only female samples and the other with only male samples. The developed system used three feature selection methods and five machine learning algorithms to classify the voice signal according to the presence of pathology.
The binary classification, which consisted of voice pathology detection, reached an accuracy of 85.1% ± 5.1% for the dataset without gender division, 83.7% ± 7.0% for the male dataset, and 87.4% ± 4.2% for the female dataset. The multiclass classification, which distinguished between the different pathologies, reached an accuracy of 69.0% ± 5.1% for the dataset without gender division, 63.7% ± 5.4% for the male dataset, and 80.6% ± 8.1% for the female dataset.
The obtained results revealed that features describing fluency are important and discriminative in this type of system. Furthermore, Random Forest proved to be the most effective machine learning algorithm for both binary and multiclass classification.
The proposed model proves promising for detecting pathological voices and identifying the underlying pathology in an elderly population, with improved performance when a gender division is performed.
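The gender-dependent evaluation described above can be sketched as follows; the nearest-centroid classifier, synthetic features, and crude holdout split are illustrative stand-ins for the study's actual feature-selection and machine learning pipeline:

```python
import numpy as np

rng = np.random.default_rng(1)

def accuracy(y_true, y_pred):
    return float(np.mean(y_true == y_pred))

def nearest_centroid_fit_predict(X_tr, y_tr, X_te):
    """Tiny stand-in classifier: assign each test sample to the
    class whose training-set centroid is closest."""
    centroids = {c: X_tr[y_tr == c].mean(axis=0) for c in np.unique(y_tr)}
    labels = np.array(sorted(centroids))
    dists = np.stack([np.linalg.norm(X_te - centroids[c], axis=1) for c in labels])
    return labels[np.argmin(dists, axis=0)]

# Synthetic acoustic features: 44 elderly speakers, pathology label y,
# and a gender flag used to build the gender-dependent subsets.
n = 44
gender = rng.integers(0, 2, n)                   # 0 = male, 1 = female
y = rng.integers(0, 2, n)                        # 0 = healthy, 1 = pathological
X = rng.normal(0, 1, (n, 8)) + y[:, None] * 1.5  # pathological samples shifted

results = {}
for name, mask in [("all", np.ones(n, bool)),
                   ("male", gender == 0),
                   ("female", gender == 1)]:
    Xs, ys = X[mask], y[mask]
    half = len(ys) // 2                          # crude holdout split
    pred = nearest_centroid_fit_predict(Xs[:half], ys[:half], Xs[half:])
    results[name] = accuracy(ys[half:], pred)
print(results)
```

A faithful reproduction would use cross-validation (which is where the reported ± intervals come from) rather than a single holdout split.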
Gender voice classification with huge accuracy rate
Gender voice recognition is an important research field in acoustics and speech processing, as the human voice exhibits highly distinctive characteristics. This study investigates speech signals to devise a gender classifier that predicts the speaker's gender from diverse parameters of the voice sample. The database contains 2270 voice samples of celebrities, both male and female. Using Mel frequency cepstrum coefficients (MFCC), vector quantization (VQ), and the J48 machine learning algorithm, the proposed classification technique, based on data mining and JavaScript, achieves an accuracy of about 100%.
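The vector quantization step can be sketched as a k-means codebook over MFCC-like frames; the data, parameters, and distortion-based scoring below are illustrative, not the study's actual pipeline:

```python
import numpy as np

def vq_codebook(features, k=4, iters=20, seed=0):
    """Toy vector-quantization codebook via k-means: cluster MFCC-like
    frames into k codewords (Lloyd's algorithm, fixed iteration count)."""
    rng = np.random.default_rng(seed)
    codebook = features[rng.choice(len(features), k, replace=False)]
    for _ in range(iters):
        # Assign each frame to its nearest codeword.
        d = np.linalg.norm(features[:, None, :] - codebook[None, :, :], axis=2)
        assign = d.argmin(axis=1)
        # Move each codeword to the mean of its assigned frames.
        for j in range(k):
            if np.any(assign == j):
                codebook[j] = features[assign == j].mean(axis=0)
    return codebook, assign

def quantization_distortion(features, codebook):
    """Mean distance from each frame to its nearest codeword — the usual
    match score in classic VQ-based speaker/class recognition."""
    d = np.linalg.norm(features[:, None, :] - codebook[None, :, :], axis=2)
    return float(d.min(axis=1).mean())

# Synthetic "MFCC frames" drawn around four cluster centers.
rng = np.random.default_rng(0)
centers = rng.normal(0, 5, (4, 13))
frames = np.vstack([c + rng.normal(0, 0.3, (50, 13)) for c in centers])
cb, _ = vq_codebook(frames, k=4)
print(round(quantization_distortion(frames, cb), 3))
```

In a VQ recognizer, one codebook is trained per class (here, per gender) and a test utterance is assigned to the class whose codebook yields the lowest distortion.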
Models and Analysis of Vocal Emissions for Biomedical Applications
The MAVEBA Workshop proceedings, published on a biennial basis, collect the scientific papers presented as oral and poster contributions during the conference. The main subjects are the development of theoretical and mechanical models as an aid to the study of the main phonatory dysfunctions, as well as biomedical engineering methods for the analysis of voice signals and images as a support to clinical diagnosis and classification of vocal pathologies.
Classification and Detection of Specific Language Impairments in Children Based on their Speech Skills
The ability to use spoken language is one of the most important characteristics in child development. Speech is difficult to replace in real life, although several other options for communication exist. An inability to communicate through speech can isolate children from society, especially children with specific language impairments. This research study focused on a specific disorder, known as specific language impairment (SLI); in the Czech language it is known as developmental dysphasia (DD). One major problem is that this disorder is detected at a relatively late age, whereas early diagnosis is critical for successful speech therapy in children. The current chapter presents several different approaches to solving this issue, including a simple test for detecting the disorder. One approach involves the use of an original iPad application for detecting SLI based on the number of pronunciation errors in utterances. One advantage of this method is its simplicity; anyone can use it, including parents.
Vocal aging: an acoustic-articulatory study of age-related speech changes
Background: Although the aging process causes specific alterations in the
speech organs, knowledge about the effects of age on speech production is still
dispersed and incomplete. Objective: To provide a broader view of the age-related
segmental and suprasegmental speech changes in European Portuguese (EP),
considering new aspects besides static acoustic features, such as dynamic and
articulatory data. Method: Two databases, with speech data of Portuguese
adult native speakers obtained through standardized recording and segmentation
procedures, were devised: i) an acoustic database containing all EP oral
vowels produced in similar context (reading speech), and also a sample of semispontaneous
speech (image description) collected from a large sample of adults
between the ages 35 and 97; ii) and another with articulatory data (ultrasound
(US) tongue images synchronized with speech) for all EP oral vowels produced in
similar contexts (pseudowords and isolated) collected from young ([21-35]) and
older ([55-73]) adults. Results: Based on the curated databases, various aspects
of the aging speech were analyzed. Acoustically, the aging speech is characterized
by: 1) longer vowels (in both genders); 2) a tendency for F0 to decrease
in women and slightly increase in men; 3) lower vowel formant frequencies in
females; 4) a significant reduction of the vowel acoustic space in men; 5) vowels
with higher trajectory slope of F1 (in both genders); 6) shorter descriptions with
higher pause time for males; 7) faster speech and articulation rate for females;
and 8) lower HNR for females in semi-spontaneous speech. In addition, the
decrease in total speech duration is associated with non-severe depression
symptoms and age. Older adults tended to present more depressive symptoms,
which could affect the amount of speech produced. Concerning the articulatory data, the tongue
tends to be higher and more advanced with aging for almost all vowels, meaning
that the vowel articulatory space tends to be higher, advanced, and bigger in older
females. Conclusion: This study provides new information on aging speech for
a language other than English. These results corroborate that speech changes
with age and present different patterns between genders, and also suggest that
speakers might develop specific articulatory adjustments with aging.
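The F0 trends reported above (decreasing in women, slightly increasing in men) rest on pitch estimation over voiced frames; a minimal autocorrelation-based F0 estimator, with all parameter values illustrative, can be sketched as:

```python
import numpy as np

def estimate_f0(frame, sr, fmin=60.0, fmax=400.0):
    """Estimate fundamental frequency of a voiced frame by locating
    the autocorrelation peak inside the plausible pitch-period range."""
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = int(sr / fmax), int(sr / fmin)   # lag bounds for fmax..fmin
    lag = lo + int(np.argmax(ac[lo:hi]))
    return sr / lag

# Synthetic vowel-like signal at 180 Hz (a typical adult female F0).
sr = 16000
t = np.arange(int(0.04 * sr)) / sr            # one 40 ms frame
frame = np.sin(2 * np.pi * 180 * t) + 0.3 * np.sin(2 * np.pi * 360 * t)
f0 = estimate_f0(frame, sr)
print(round(f0, 1))  # ≈ 180 Hz
```

Tracking this estimate across a recording, and across speakers of different ages, yields the kind of per-gender F0 trajectories the study reports.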
Specific Language Impairments and Possibilities of Classification and Detection from Children's Speech
Many young children have speech disorders. My research focused on one such disorder, known as specific language impairment or developmental dysphasia. A major problem in treating this disorder is the fact that specific language impairment is detected in children at a relatively late age. For successful speech therapy, early diagnosis is critical. I present two different approaches to this issue using a very simple test that I have devised for diagnosing this disorder. In this thesis, I describe a new method for detecting specific language impairment based on the number of pronunciation errors in utterances. An advantage of this method is its simplicity; anyone can use it, including parents. The second method is based on the acoustic features of the speech signal. An advantage of this method is that it could be used to develop an automatic detection system.