
    Formant analysis in dysphonic patients and automatic Arabic digit speech recognition

    Abstract. Background and objective: There has been growing interest in the objective assessment of speech in dysphonic patients, for classifying the type and severity of voice pathologies using automatic speech recognition (ASR). The aim of this work was to study the accuracy of a conventional ASR system (a front end based on Mel-frequency cepstral coefficients (MFCCs) and a back end based on hidden Markov models (HMMs)) in recognizing the speech of people with pathological voices. Materials and methods: The speech samples of 62 dysphonic patients with six different types of voice disorders and 50 normal subjects were analyzed. Spoken Arabic digits were taken as the input. The distribution of the first four formants of the vowel /a/ was extracted to examine deviation of the formants from normal. Results: Recognition accuracy of 100% was obtained for Arabic digits spoken by normal speakers, but accuracy dropped significantly for digits spoken by voice-disordered subjects. Moreover, no significant improvement in ASR performance was observed for a subset of the individuals with disordered voices who underwent treatment. Conclusion: The results of this study indicate that the current ASR technique is not a reliable tool for recognizing the speech of dysphonic patients.
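    The pipeline the abstract evaluates, an MFCC front end feeding per-digit HMMs, can be sketched roughly as follows. This is a minimal illustration, assuming the librosa and hmmlearn packages; the paths, sampling rate, and model sizes are placeholders rather than the configuration used in the study.

```python
# Minimal sketch of an MFCC front end with per-digit HMMs as the back end.
# Assumes librosa and hmmlearn; paths, sampling rate and model sizes are
# placeholders, not the configuration used in the study.
import numpy as np
import librosa
from hmmlearn import hmm

def extract_mfcc(path, n_mfcc=13):
    """Load a recording and return its MFCC frames (time x coefficients)."""
    signal, sr = librosa.load(path, sr=16000)
    return librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=n_mfcc).T

def train_digit_models(training_data, n_states=5):
    """Train one GaussianHMM per digit. training_data maps a digit label
    to a list of MFCC matrices from different speakers/repetitions."""
    models = {}
    for digit, samples in training_data.items():
        X = np.vstack(samples)                    # stack all frames
        lengths = [len(s) for s in samples]       # per-utterance frame counts
        model = hmm.GaussianHMM(n_components=n_states,
                                covariance_type="diag", n_iter=50)
        model.fit(X, lengths)
        models[digit] = model
    return models

def recognize(models, mfcc_frames):
    """Return the digit whose HMM assigns the highest log-likelihood."""
    return max(models, key=lambda d: models[d].score(mfcc_frames))
```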

    Análisis cepstral y la transformada de Hilbert-Huang para la detección automática de la enfermedad de Parkinson

    Most patients with Parkinson's Disease (PD) develop speech deficits, including reduced sonority, altered articulation, and abnormal prosody. This article presents a methodology to automatically classify patients with PD and Healthy Control (HC) subjects. In this study, the Hilbert-Huang Transform (HHT) and Mel-Frequency Cepstral Coefficients (MFCCs) were used to model modulated phonations (changing the tone from low to high and vice versa) of the vowels /a/, /i/, and /u/. The HHT was used to extract the first two formants from the audio signals, with the aim of modeling the stability of the tongue while the speakers produced the modulated vowels. Kruskal-Wallis statistical tests were used to eliminate redundant and non-relevant features in order to improve classification accuracy. PD patients and HC subjects were then automatically classified using a Support Vector Machine with a Radial Basis Function kernel (RBF-SVM). The results show that the proposed approach discriminates between PD and HC subjects with accuracies of up to 75% for women and 73% for men.
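    The feature-screening and classification stage described here, Kruskal-Wallis filtering followed by an RBF-SVM, might look roughly like the sketch below. It assumes SciPy and scikit-learn, and the feature matrix X and labels y are random stand-ins for the HHT- and MFCC-derived measurements, not data from the study.

```python
# Sketch of Kruskal-Wallis feature filtering followed by an RBF-SVM.
# X and y are random stand-ins for the HHT/MFCC-derived features and the
# PD/HC labels; they are not data from the study.
import numpy as np
from scipy.stats import kruskal
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

def kruskal_wallis_filter(X, y, alpha=0.05):
    """Keep the feature columns whose Kruskal-Wallis p-value is below alpha."""
    keep = []
    for j in range(X.shape[1]):
        groups = [X[y == label, j] for label in np.unique(y)]
        _, p_value = kruskal(*groups)
        if p_value < alpha:
            keep.append(j)
    # Fall back to all features if nothing passes the test (toy-data safeguard).
    return np.array(keep) if keep else np.arange(X.shape[1])

rng = np.random.default_rng(0)
X = rng.random((60, 40))                      # 60 phonations x 40 features
y = np.array([0] * 30 + [1] * 30)             # 0 = HC, 1 = PD (toy labels)

selected = kruskal_wallis_filter(X, y)
clf = SVC(kernel="rbf", C=1.0, gamma="scale")
scores = cross_val_score(clf, X[:, selected], y, cv=5)
print(f"accuracy: {scores.mean():.2f} +/- {scores.std():.2f}")
```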

    Classification of voice disorder in children with cochlear implantation and hearing aid using multiple classifier fusion

    Abstract. Background: Speech production and phonetic features gradually improve in children who obtain auditory feedback after cochlear implantation or through hearing aids. The aim of this study was to develop and evaluate automated classification of voice disorder in children with cochlear implants and hearing aids. Methods: Four categories of children's voice were defined. Level 1: children who produce spontaneous phonation and use words spontaneously and imitatively. Level 2: children who produce spontaneous phonation, use words spontaneously, and make short sentences imitatively. Level 3: children who produce spontaneous phonation and use words and arbitrary sentences spontaneously. Level 4: normal children without any history of hearing loss. The higher the level, the less significant the speech disorder. Thirty Persian children participated in the study: six children in each of levels one to three and 12 children in level four. Voice samples of five isolated Persian words, "mashin", "mar", "moosh", "gav" and "mouz", were analyzed. Frame-based and word-based features were extracted from the voice signals: the frame-based features include intensity, fundamental frequency, formants, nasality and approximate entropy, while the word-based features include phase-space features and wavelet coefficients. Hidden Markov models were used as classifiers for the frame-based features, and a neural network for the word-based features. Results: After classifier fusion with three methods (Majority Voting Rule, Linear Combination and Stacked fusion), the best classification rates were obtained using frame-based and word-based features with the Majority Voting Rule (level 1: 100%, level 2: 93.75%, level 3: 100%, level 4: 94%). Conclusions: The results of this study may help speech pathologists follow up voice disorder recovery in children of the same age range with cochlear implants or hearing aids.
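    Of the three fusion schemes, the Majority Voting Rule is the simplest. The toy sketch below illustrates that fusion step only; the base classifiers and their votes are invented stand-ins, not the HMM and neural-network outputs from the study.

```python
# Toy sketch of the Majority Voting Rule fusion step: each base classifier
# (e.g. HMMs on frame-based features, a neural network on word-based
# features) emits a predicted level per sample, and the fused decision is
# the most frequent label. The votes below are invented for illustration.
from collections import Counter

def majority_vote(votes):
    """votes: one predicted label per base classifier, for a single sample."""
    return Counter(votes).most_common(1)[0][0]

def fuse(per_classifier_predictions):
    """per_classifier_predictions: one list of labels per base classifier."""
    return [majority_vote(votes) for votes in zip(*per_classifier_predictions)]

hmm_frame_based = ["level_1", "level_2", "level_4", "level_3"]
nn_word_based   = ["level_1", "level_3", "level_4", "level_3"]
nn_wavelet      = ["level_2", "level_2", "level_4", "level_3"]

print(fuse([hmm_frame_based, nn_word_based, nn_wavelet]))
# -> ['level_1', 'level_2', 'level_4', 'level_3']
```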

    Models and Analysis of Vocal Emissions for Biomedical Applications

    The International Workshop on Models and Analysis of Vocal Emissions for Biomedical Applications (MAVEBA) was established in 1999 out of a strongly felt need to share know-how, objectives and results between areas that until then had seemed quite distinct, such as bioengineering, medicine and singing. MAVEBA deals with all aspects of the study of the human voice, with applications ranging from newborns to adults and the elderly. Over the years the initial topics have grown and spread into other fields of research such as occupational voice disorders, neurology, rehabilitation, and image and video analysis. MAVEBA takes place every two years in Firenze, Italy. This edition celebrates twenty-two years of uninterrupted and successful research in the field of voice analysis.

    Identification of voice pathologies in an elderly population

    Ageing is associated with an increased risk of developing diseases, including a greater predisposition to conditions such as Sepsis. With ageing, the human voice also undergoes a natural degradation, gauged by alterations in hoarseness, breathiness, articulatory ability, and speaking rate. Perceptual evaluation is still widely used to assess speech and voice impairments despite its high subjectivity. This dissertation proposes a new method for detecting and identifying voice pathologies by exploring acoustic parameters of continuous speech signals in the elderly population. Additionally, a study of the influence of gender and age on the performance of voice pathology detection systems is conducted. The study included 44 subjects older than 60 years, with the pathologies Dysphonia, Functional Dysphonia, and Spasmodic Dysphonia. From the dataset originated with these settings, two gender-dependent subsets were created, one with only female samples and the other with only male samples. The system developed used three feature selection methods and five Machine Learning algorithms to classify the voice signal according to the presence of pathology. The binary classification, which consisted of voice pathology detection, reached an accuracy of 85.1% ± 5.1% for the dataset without gender division, 83.7% ± 7.0% for the male dataset, and 87.4% ± 4.2% for the female dataset. The multiclass classification, which consisted of identifying the different pathologies, reached an accuracy of 69.0% ± 5.1% for the dataset without gender division, 63.7% ± 5.4% for the male dataset, and 80.6% ± 8.1% for the female dataset. The results revealed that features describing fluency are important and discriminating in these types of systems, and Random Forest proved to be the most effective Machine Learning algorithm for both the binary and the multiclass classification. The proposed model shows promise in detecting pathological voices and identifying the underlying pathology in an elderly population, with an increase in performance when a gender division is performed.
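    A rough sketch of the kind of evaluation described, a Random Forest scored with cross-validation for both the binary and the multiclass tasks, is shown below. The feature matrix and labels are random placeholders, and the estimator and fold settings are assumptions rather than those of the dissertation.

```python
# Sketch of the evaluation stage: a Random Forest scored with k-fold
# cross-validation for the binary (pathological vs. healthy) and multiclass
# (which pathology) tasks. Features and labels are random placeholders.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def evaluate(X, y, n_splits=5):
    clf = RandomForestClassifier(n_estimators=200, random_state=0)
    scores = cross_val_score(clf, X, y, cv=n_splits)
    return scores.mean(), scores.std()

rng = np.random.default_rng(0)
X = rng.random((90, 30))                       # stand-in acoustic/fluency features
y_binary = rng.integers(0, 2, size=90)         # 0 = healthy, 1 = pathological
y_multi = rng.integers(0, 3, size=90)          # 0/1/2 = dysphonia subtypes

for task, y in [("binary", y_binary), ("multiclass", y_multi)]:
    mean, std = evaluate(X, y)
    print(f"{task}: {mean:.3f} +/- {std:.3f}")
```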

    Gender voice classification with huge accuracy rate

    Gender voice recognition is an important research field in acoustics and speech processing, as the human voice exhibits highly distinctive characteristics. This study investigates speech signals to devise a gender classifier that predicts the gender of a speaker from diverse parameters of the voice sample. A database of 2270 voice samples of celebrities, both male and female, was used. Using Mel-frequency cepstral coefficients (MFCCs), vector quantization (VQ), and the J48 machine learning algorithm, the proposed classification technique, based on data mining and a Java-based implementation, achieves an accuracy of about 100%.
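    A sketch of an MFCC + vector-quantization + decision-tree pipeline of the kind described is given below. Scikit-learn's DecisionTreeClassifier stands in for Weka's J48, and the random matrices stand in for real per-recording MFCC frames, so the outputs carry no meaning.

```python
# Sketch of an MFCC + vector-quantization + decision-tree pipeline: MFCC
# frames are quantized against a k-means codebook, each recording is
# summarized by its codeword histogram, and a decision tree classifies the
# histogram (scikit-learn's tree stands in for Weka's J48). The random
# matrices below stand in for real per-recording MFCC frames.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeClassifier

def codeword_histogram(frames, codebook):
    """Quantize feature frames and return a normalized codeword histogram."""
    codes = codebook.predict(frames)
    hist = np.bincount(codes, minlength=codebook.n_clusters).astype(float)
    return hist / hist.sum()

rng = np.random.default_rng(0)
recordings = [rng.normal(size=(200, 13)) for _ in range(40)]   # frames x MFCCs
labels = ["male"] * 20 + ["female"] * 20

codebook = KMeans(n_clusters=16, n_init=10, random_state=0).fit(np.vstack(recordings))
X = np.array([codeword_histogram(r, codebook) for r in recordings])
tree = DecisionTreeClassifier(random_state=0).fit(X, labels)
print(tree.predict(X[:3]))
```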

    Models and Analysis of Vocal Emissions for Biomedical Applications

    The MAVEBA Workshop proceedings, published on a biennial basis, collect the scientific papers presented as oral and poster contributions during the conference. The main subjects are the development of theoretical and mechanical models as an aid to the study of the main phonatory dysfunctions, as well as biomedical engineering methods for the analysis of voice signals and images as a support to clinical diagnosis and the classification of vocal pathologies.

    Classification and Detection of Specific Language Impairments in Children Based on their Speech Skills

    The ability to use spoken language is one of the most important milestones in child development. Although there are several other options for communication, speech is difficult to replace in everyday life, and an inability to communicate through speech can isolate children from society, especially children with specific language impairments. This research focused on one such disorder, known as specific language impairment (SLI) and, in Czech terminology, as developmental dysphasia (DD). A major problem is that this disorder is often detected at a relatively late age, whereas early diagnosis is critical for successful speech therapy in children. The current chapter presents several approaches to this issue, including a simple test for detecting the disorder. One approach involves an original iPad application that detects SLI based on the number of pronunciation errors in utterances; an advantage of this method is its simplicity, so anyone can use it, including parents.
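    The error-count test lends itself to a very small screening rule. The sketch below illustrates the idea only; the age-dependent cut-offs are invented placeholders, not thresholds from the chapter.

```python
# Toy sketch of an error-count screening rule: the number of pronunciation
# errors over a fixed set of test utterances is compared against an
# age-dependent cut-off. The thresholds are invented for illustration and
# are not the values used in the chapter.
AGE_CUTOFFS = {4: 12, 5: 9, 6: 6}   # hypothetical max acceptable error counts

def screen_for_sli(error_count, age_years):
    """Return True if the error count exceeds the cut-off for the child's age."""
    cutoff = AGE_CUTOFFS.get(age_years, min(AGE_CUTOFFS.values()))
    return error_count > cutoff

print(screen_for_sli(error_count=11, age_years=5))   # True  -> refer for assessment
print(screen_for_sli(error_count=3, age_years=6))    # False -> within expected range
```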

    Envelhecimento vocal: estudo acústico-articulatório das alterações de fala com a idade

    Background: Although the aging process causes specific alterations in the speech organs, knowledge about the effects of age on speech production is still dispersed and incomplete. Objective: To provide a broader view of the age-related segmental and suprasegmental speech changes in European Portuguese (EP), considering new aspects beyond static acoustic features, such as dynamic and articulatory data. Method: Two databases with speech data of adult native speakers of Portuguese, obtained through standardized recording and segmentation procedures, were devised: i) an acoustic database containing all EP oral vowels produced in similar contexts (read speech), together with a sample of semi-spontaneous speech (image description), collected from a large sample of adults between the ages of 35 and 97; and ii) an articulatory database (ultrasound (US) tongue images synchronized with speech) for all EP oral vowels produced in similar contexts (pseudowords and isolated vowels), collected from young ([21-35]) and older ([55-73]) adults. Results: Based on the curated databases, various aspects of aging speech were analyzed. Acoustically, aging speech is characterized by: 1) longer vowels (in both genders); 2) a tendency for F0 to decrease in women and to increase slightly in men; 3) lower vowel formant frequencies in females; 4) a significant reduction of the vowel acoustic space in men; 5) vowels with a steeper F1 trajectory slope (in both genders); 6) shorter descriptions with longer pause time for males; 7) faster speech and articulation rates for females; and 8) lower HNR for females in semi-spontaneous speech. In addition, the decrease in total speech duration is associated with non-severe depression symptoms and age: older adults tended to present more depressive symptoms, which could affect the amount of speech produced. Concerning the articulatory data, the tongue tends to be higher and more advanced with aging for almost all vowels, meaning that the vowel articulatory space tends to be higher, more advanced, and larger in older females. Conclusion: This study provides new information on aging speech for a language other than English. The results corroborate that speech changes with age, with different patterns between genders, and also suggest that speakers may develop specific articulatory adjustments with aging.
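    One common way to quantify the "vowel acoustic space" mentioned in the results is the area of the convex hull spanned by each vowel's mean F1/F2. A minimal sketch follows, assuming SciPy; the formant values are illustrative placeholders, not measurements from the corpus.

```python
# Sketch of one common vowel-space measure: the area of the convex hull
# spanned by each vowel's mean (F1, F2). The formant values are illustrative
# placeholders, not measurements from the corpus.
import numpy as np
from scipy.spatial import ConvexHull

def vowel_space_area(formants):
    """formants: dict vowel -> (mean F1 in Hz, mean F2 in Hz); returns hull area."""
    points = np.array(list(formants.values()))
    return ConvexHull(points).volume    # for 2-D points, .volume is the area

speaker = {"i": (300, 2300), "e": (450, 2000), "a": (750, 1300),
           "o": (500, 900), "u": (320, 800)}
print(f"vowel space area: {vowel_space_area(speaker):.0f} Hz^2")
```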

    Specific Language Impairments and Possibilities of Classification and Detection from Children's Speech

    Many young children have speech disorders. My research focused on one such disorder, known as specific language impairment or developmental dysphasia. A major problem in treating this disorder is that it is detected in children at a relatively late age, whereas early diagnosis is critical for successful speech therapy. I present two different approaches to this issue, built around a very simple test that I devised for diagnosing the disorder. In this thesis, I describe a new method for detecting specific language impairment based on the number of pronunciation errors in utterances; an advantage of this method is its simplicity, so anyone can use it, including parents. The second method is based on acoustic features of the speech signal; its advantage is that it could be used to develop an automatic detection system.