    Harmonic to noise ratio measurement - selection of window and length

    Harmonic to Noise Ratio (HNR) measures the ratio between the periodic and non-periodic components of a speech sound. It has become increasingly important in vocal acoustic analysis for diagnosing pathological voices. The parameter can be measured with the Praat software, which is commonly accepted by the scientific community as giving an accurate measure. However, the measurement depends on the type of window used and on its length. This paper analyses the influence of the window and its length. The Hanning, Hamming and Blackman windows, with lengths between 6 and 24 glottal periods, were tested. Speech files of control subjects and pathological subjects were used. The results showed that the Hanning window with a length of 12 glottal periods gives HNR measurements closest to those of Praat.
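
    The dependence of HNR on window type and length can be illustrated with a minimal sketch. It assumes an autocorrelation-based estimate in which the normalised autocorrelation of the windowed frame is divided by that of the window, and HNR = 10 log10(r / (1 - r)) at a lag of one glottal period. This is not necessarily the algorithm used in the paper; the function names, the parameter names and the assumption that F0 is already known are all illustrative.

```python
import math

# Cosine-sum coefficients (a0, a1, a2) of the three windows named in the abstract.
WINDOWS = {
    "hanning": (0.5, 0.5, 0.0),
    "hamming": (0.54, 0.46, 0.0),
    "blackman": (0.42, 0.5, 0.08),
}

def make_window(name, n):
    """Return n coefficients of the named symmetric window."""
    a0, a1, a2 = WINDOWS[name]
    return [a0 - a1 * math.cos(2 * math.pi * k / (n - 1))
            + a2 * math.cos(4 * math.pi * k / (n - 1)) for k in range(n)]

def autocorr(x, lag):
    """Unnormalised autocorrelation of x at a non-negative lag."""
    return sum(x[i] * x[i + lag] for i in range(len(x) - lag))

def hnr_db(signal, fs, f0, periods=12, window="hanning"):
    """Estimate HNR in dB on one frame spanning `periods` glottal periods.

    Assumes f0 (Hz) is known in advance; fs is the sampling rate in Hz.
    """
    period = round(fs / f0)          # glottal period in samples
    n = periods * period             # frame length in samples
    if len(signal) < n:
        raise ValueError("signal shorter than the analysis frame")
    frame = signal[:n]
    mean = sum(frame) / n
    w = make_window(window, n)
    xw = [(s - mean) * wk for s, wk in zip(frame, w)]
    # Divide out the window's own autocorrelation to undo its tapering effect.
    r_signal = autocorr(xw, period) / autocorr(xw, 0)
    r_window = autocorr(w, period) / autocorr(w, 0)
    r = r_signal / r_window
    r = min(max(r, 1e-12), 1.0 - 1e-12)   # keep the logarithm finite
    return 10.0 * math.log10(r / (1.0 - r))
```

    Calling `hnr_db(samples, fs, f0, periods=p, window=name)` for each window name and each `p` from 6 to 24 gives the kind of grid of values that could then be compared against a reference measurement.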

    Pragmatic functions of lengthenings and filled pauses in the adult-directed speech of Hungarian children

    The two most common disfluencies of spontaneous speech, vowel lengthenings (VLE) and non-lexicalized filled pauses (NLFP), were investigated in the adult-directed speech of eight Hungarian children. Though VLE and NLFP might seem to be similar vocalizations, recent investigations have shown that their occurrences might differ remarkably in child speech and may also change as a function of age. Based on these findings, the present study performed a functional analysis of VLEs and NLFPs. It was hypothesized that in child speech the two phenomena have roles not only in speech planning but also in discourse management, and that they show a functional distribution. The analysis provided evidence that VLE is more common than NLFP. VLE often tends to mark discourse events and may play a role in turn-final floor-holding strategies, while NLFP is mostly connected to speech planning and may occasionally also participate in turn-taking gestures.

    Improved Algorithm for Pathological and Normal Voices Identification

    There are many papers on the automatic classification of normal and pathological voices, but they do not estimate the degree of severity of the identified voice disorders. The aim of this work is to build a model for identifying pathological and normal voices that can also evaluate the degree of severity of the identified voice disorders among students. We present an automatic classifier using acoustic measurements on recorded sustained vowels /a/ and pattern recognition tools based on neural networks. The training set was built by classifying students’ recorded voices based on thresholds from the literature. We extract the pitch, jitter, shimmer and harmonic-to-noise ratio values of the speech utterance /a/, which constitute the input vector of the neural network. The degree of severity is estimated by evaluating how far the parameters are from the standard values, based on the percentage of normal and pathological values. The data used for testing the proposed neural network algorithm consist of healthy and pathological voices from a German database of voice disorders. The performance of the proposed algorithm is evaluated in terms of accuracy (97.9%), sensitivity (1.6%), and specificity (95.1%). The classification rate is 90% for the normal class and 95% for the pathological class.
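
    For reference, the three evaluation measures named above are conventionally computed from the counts of a two-class confusion matrix. The sketch below assumes that "pathological" is the positive class; the function name and the example counts are hypothetical and are not taken from the paper.

```python
def classification_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity and specificity from confusion-matrix counts.

    tp: pathological voices classified as pathological
    fn: pathological voices classified as normal
    tn: normal voices classified as normal
    fp: normal voices classified as pathological
    """
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,      # share of all voices classified correctly
        "sensitivity": tp / (tp + fn),      # share of pathological voices detected
        "specificity": tn / (tn + fp),      # share of normal voices recognised as normal
    }

# Hypothetical counts, for illustration only.
metrics = classification_metrics(tp=90, fn=10, tn=95, fp=5)
```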

    Phonetic characteristics of filled pauses: the effects of speakers’ age

    Filled pauses usually reveal speech planning or execution problems even though the speaker does not produce an overt error, and they may also function as discourse markers. In Hungarian, the most frequent form of filled pause is a schwa-like vowel of varying duration. The purpose of this study was to analyze the occurrence, duration and formant structure of Hungarian schwa-like filled pauses in 16 nine-year-old children, 16 young adults and 16 elderly speakers. Our hypotheses were that filled pauses (i) would be more frequent in elderly speakers than in the other two age groups, (ii) would show similar durations in all age groups, and (iii) would show different formant structures depending on age. Results confirmed age-dependent occurrences, durations and formants of filled pauses; the hypotheses were partly verified. Speakers’ age is one of the factors that influence the occurrences and formant values of filled pauses.

    Phoneme-retrieval; voice recognition; vowels recognition

    A phoneme-retrieval technique is proposed that relies on the particular way in which the network is constructed. An initial set of neurons is given. The number of these neurons is approximately equal to the number of typical structures in the data. For example, if the network is built for voice retrieval, then the number of neurons must equal the number of characteristic phonemes of the alphabet of the language spoken by the social group to which the particular person belongs. Usually this task is very complicated, and the network can depend critically on the samples used for learning. If the network is built for image retrieval, then it works only if the data to be retrieved belong to a particular set of images. If the network is built for voice recognition, it works only for some particular set of words. A typical example is the set of words used for the flight of airplanes. For example, a command like "the airplane should make a turn of 120 degrees towards the east" can be easily recognized by the network if a suitable learning procedure is used.

    Robust ASR using Support Vector Machines

    The improved theoretical properties of Support Vector Machines with respect to other machine learning alternatives due to their max-margin training paradigm have led us to suggest them as a good technique for robust speech recognition. However, important shortcomings have had to be circumvented, the most important being the normalisation of the time duration of different realisations of the acoustic speech units. In this paper, we have compared two approaches in noisy environments: first, a hybrid HMM–SVM solution where a fixed number of frames is selected by means of an HMM segmentation and second, a normalisation kernel called Dynamic Time Alignment Kernel (DTAK) first introduced in Shimodaira et al. [Shimodaira, H., Noma, K., Nakai, M., Sagayama, S., 2001. Support vector machine with dynamic time-alignment kernel for speech recognition. In: Proc. Eurospeech, Aalborg, Denmark, pp. 1841–1844] and based on DTW (Dynamic Time Warping). Special attention has been paid to the adaptation of both alternatives to noisy environments, comparing two types of parameterisations and performing suitable feature normalisation operations. The results show that the DTA Kernel provides important advantages over the baseline HMM system in medium to bad noise conditions, also outperforming the results of the hybrid system.
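
    The duration-normalisation problem mentioned above can be made concrete with a minimal sketch of plain Dynamic Time Warping, the alignment technique on which DTAK is based. This is the textbook DTW distance with a Euclidean frame distance, not the DTAK kernel itself; the function name and the representation of an utterance as a list of feature-vector tuples are assumptions made for illustration.

```python
import math

def dtw_distance(a, b):
    """Accumulated DTW cost between two sequences of feature vectors.

    a, b: non-empty lists of equal-dimension tuples (one tuple per frame).
    The two sequences may have different numbers of frames.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = best accumulated cost aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])      # local frame distance
            cost[i][j] = d + min(cost[i - 1][j],      # frame of a repeated against b
                                 cost[i][j - 1],      # frame of b repeated against a
                                 cost[i - 1][j - 1])  # both sequences advance
    return cost[n][m]
```

    Because the warping path may repeat frames, two realisations of the same unit with different durations can still be compared frame by frame, which is the property a time-alignment kernel builds on.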

    Methodological, hardware and software support for recording and processing speech signals for the diagnosis of neurological diseases

    Methodological, hardware and software tools based on the digital processing and analysis of speech signals are proposed for rapid and objective diagnosis of neurological pathologies accompanied by speech function disorders. The developed methodological and software tools were tested at the Republican Scientific and Practical Center of Neurology and Neurosurgery of the Ministry of Healthcare of Belarus and at the Belarusian State University of Informatics and Radioelectronics. Testing revealed qualitative (based on the obtained spectrograms, cepstrograms and histograms) and quantitative (based on calculated parameters) differences between the parameters of speech signals in normal speech and in bulbar syndrome. Preliminary results confirmed the feasibility of using the methodological, hardware and software tools developed by the authors for recording and processing speech signals in the diagnosis of neurological diseases.

    Vocal acoustic analysis - determination of jitter and shimmer for the diagnosis of speech pathologies

    Some digital signal processing techniques have been used to analyse vocal disorders caused by pathologies of the larynx. The laryngoscopic examinations used to detect these pathologies are invasive techniques that cause discomfort to the patient. Vocal acoustic analysis of the temporal and spectral characteristics of speech signals can be used as an auxiliary technique to laryngoscopy, both for the pre-diagnosis of pathologies and for monitoring pharmacological and post-surgical treatments. This article describes some pathologies of the vocal apparatus that alter the quality of the speech produced. It then describes the algorithms implemented to determine the fundamental frequency (F0) by the cepstral method and by the autocorrelation method. It also describes the algorithm developed and implemented here to determine jitter and shimmer through the parameters Jitta, Jitt, rap and ppq5 for jitter, and Shim, ShdB, apq3 and apq5 for shimmer. These jitter and shimmer parameters are often indicators of pathologies.
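
    The eight parameters named above can be sketched as follows. The sketch assumes that the glottal period lengths (in seconds) and the per-period peak amplitudes have already been extracted, and it uses the commonly cited definitions of these measures, which may differ in detail from the algorithm described in the article. All function names and dictionary keys are illustrative.

```python
import math

def _mean(values):
    return sum(values) / len(values)

def local_perturbation(v):
    """Mean absolute difference between consecutive values."""
    return _mean([abs(a - b) for a, b in zip(v, v[1:])])

def perturbation_quotient(v, k):
    """Mean absolute deviation of each value from the average of its
    k-point neighbourhood (k odd), relative to the overall mean."""
    h = k // 2
    devs = [abs(v[i] - _mean(v[i - h:i + h + 1])) for i in range(h, len(v) - h)]
    return _mean(devs) / _mean(v)

def jitter_measures(periods):
    """Jitter parameters from consecutive glottal periods in seconds."""
    jitta = local_perturbation(periods)
    return {
        "Jitta_us": jitta * 1e6,                          # absolute jitter, microseconds
        "Jitt_pct": 100 * jitta / _mean(periods),         # relative jitter, percent
        "rap_pct": 100 * perturbation_quotient(periods, 3),
        "ppq5_pct": 100 * perturbation_quotient(periods, 5),
    }

def shimmer_measures(amplitudes):
    """Shimmer parameters from positive per-period peak amplitudes."""
    shdb = _mean([abs(20 * math.log10(b / a))
                  for a, b in zip(amplitudes, amplitudes[1:])])
    return {
        "Shim_pct": 100 * local_perturbation(amplitudes) / _mean(amplitudes),
        "ShdB": shdb,                                     # shimmer in dB
        "apq3_pct": 100 * perturbation_quotient(amplitudes, 3),
        "apq5_pct": 100 * perturbation_quotient(amplitudes, 5),
    }
```

    Both functions need at least five values, since ppq5 and apq5 average over a five-period neighbourhood.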