36 research outputs found

Harmonic to noise ratio measurement - selection of window and length

Author: Candido Junior Arnaldo
Fernandes Joana Filipa
Guedes Victor
Teixeira Felipe L.
Teixeira João Paulo
Publication venue: 'Elsevier BV'
Publication date: 01/01/2018

Harmonic to Noise Ratio (HNR) measures the ratio between the periodic and non-periodic components of a speech sound. It has become increasingly important in vocal acoustic analysis for diagnosing pathologic voices. This parameter can be measured with the Praat software, which is commonly accepted by the scientific community as an accurate measure. However, the measure depends on the type of window used and on its length. In this paper, the influence of the window and its length was analysed. The Hanning, Hamming and Blackman windows were tested with lengths between 6 and 24 glottal periods. Speech files of control subjects and pathologic subjects were used. The results showed that the Hanning window with a length of 12 glottal periods gives the HNR measures closest to the Praat measures.
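The dependence on window type and length can be illustrated with a minimal sketch. It assumes an autocorrelation-based HNR estimate, HNR = 10·log10(r / (1 − r)), where r is the peak of the normalised autocorrelation at the pitch lag; this is not necessarily the algorithm used in the paper. The function name `hnr_db`, the F0 search range and the synthetic test signal are all illustrative assumptions.

```python
import numpy as np

def hnr_db(frame, fs, f0_min=75.0, f0_max=500.0, window=np.hanning):
    """Estimate the HNR (dB) of one frame from its normalised autocorrelation peak."""
    n = len(frame)
    w = window(n)  # np.hanning, np.hamming or np.blackman
    x = (frame - np.mean(frame)) * w
    # Autocorrelation of the windowed frame and of the window (non-negative lags only).
    r_x = np.correlate(x, x, mode="full")[n - 1:]
    r_w = np.correlate(w, w, mode="full")[n - 1:]
    lag_min = int(fs / f0_max)
    lag_max = min(int(fs / f0_min), n // 2)
    # Dividing by the window autocorrelation compensates for the taper at each lag.
    r = (r_x[lag_min:lag_max] / r_x[0]) / (r_w[lag_min:lag_max] / r_w[0])
    r_peak = float(np.clip(np.max(r), 1e-6, 1.0 - 1e-6))
    return 10.0 * np.log10(r_peak / (1.0 - r_peak))

# Synthetic "voice": two harmonics at 100 Hz, frame length of 12 glottal periods.
fs, f0, periods = 16000, 100.0, 12
n = int(periods * fs / f0)
t = np.arange(n) / fs
rng = np.random.default_rng(0)
clean = np.sin(2 * np.pi * f0 * t) + 0.5 * np.sin(2 * np.pi * 2 * f0 * t)
hnr_quiet = hnr_db(clean + 0.01 * rng.standard_normal(n), fs)
hnr_noisy = hnr_db(clean + 0.5 * rng.standard_normal(n), fs)
```

Passing `window=np.hamming` or `window=np.blackman`, or changing `periods`, gives a simple way to see how the estimate varies with the window choice.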

Biblioteca Digital do IPB

Pragmatic functions of lengthenings and filled pauses in the adult-directed speech of Hungarian children

Author: Deme Andrea
Publication venue: Samfundslitteratur
Publication date: 01/01/2013

The two most common disfluencies of spontaneous speech, vowel lengthenings (VLE) and non-lexicalized filled pauses (NLFP), were investigated in the adult-directed speech of eight Hungarian children. Though VLE and NLFP might seem to be similar vocalizations, recent investigations have shown that their occurrences might differ remarkably in child speech and may also change as a function of age. Based on these findings, the present study performed a functional analysis of VLEs and NLFPs. It was hypothesized that in child speech the two phenomena have roles not only in speech planning but also in discourse management, and that they show functional distribution. The analysis provided evidence that VLE is more common than NLFP. VLE often tends to mark discourse events and may play a role in turn-final floor-holding strategies, while NLFP is mostly connected to speech planning and may occasionally also participate in turn-taking gestures.

Repository of the Academy's Library

Improved Algorithm for Pathological and Normal Voices Identification

Author: Khazri Yassine
Moussetad Mohamed
Rouda Fatima
Sabir Brahim
Touri Bouzekri
Publication venue: 'Institute of Advanced Engineering and Science'
Publication date: 01/02/2017

There are many papers on automatic classification of normal and pathological voices, but they lack an estimate of the degree of severity of the identified voice disorders. The aim is to build a model for identifying pathological and normal voices that can also evaluate the degree of severity of the identified voice disorders among students. In the present work, we present an automatic classifier using acoustic measurements on recorded sustained vowels /a/ and pattern recognition tools based on neural networks. The training set was built by classifying students’ recorded voices based on thresholds from the literature. We retrieve the pitch, jitter, shimmer and harmonic-to-noise ratio values of the speech utterance /a/, which constitute the input vector of the neural network. The degree of severity is estimated by evaluating how far the parameters are from the standard values, based on the percentage of normal and pathological values. In this work, the data used for testing the proposed neural network algorithm consist of healthy and pathological voices from a German database of voice disorders. The performance of the proposed algorithm is evaluated in terms of accuracy (97.9%), sensitivity (1.6%), and specificity (95.1%). The classification rate is 90% for the normal class and 95% for the pathological class.
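For reference, the three reported performance figures are conventionally derived from a binary confusion matrix. The sketch below shows those standard definitions only; the function name and the counts are hypothetical and are not taken from the paper.

```python
def classification_metrics(tp, fn, tn, fp):
    """Standard binary metrics, with 'pathological' treated as the positive class."""
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,   # share of all voices classified correctly
        "sensitivity": tp / (tp + fn),   # share of pathological voices detected
        "specificity": tn / (tn + fp),   # share of normal voices recognised as normal
    }

# Hypothetical counts, for illustration only.
metrics = classification_metrics(tp=90, fn=10, tn=80, fp=20)
```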

IAES journal

Crossref

ZENODO

Institute of Advanced Engineering and Science

Phonetic characteristics of filled pauses: the effects of speakers’ age

Author: Beke András
Bóna Judit
Gósy Mária
Horváth Viktória
Publication date: 01/01/2014

Filled pauses usually reveal speech planning or execution problems even though the speaker does not produce an overt error, and they may also function as discourse markers. In Hungarian, the most frequent form of filled pause is a schwa-like vowel of various durations. The purpose of this study was to analyze the occurrence, duration and formant structure of Hungarian schwa-like filled pauses in 16 nine-year-old children, 16 young adults and 16 elderly speakers. Our hypotheses were that filled pauses (i) would be more frequent in elderly speakers than in the other two age groups, (ii) would show similar durations in all age groups, and (iii) would show different formant structures depending on age. Results confirmed age-dependent occurrences, durations and formants of filled pauses; the hypotheses were partly verified. Speakers’ age is one of the factors that influence the occurrences and formant values of filled pauses.

Repository of the Academy's Library

Phoneme-retrieval; voice recognition; vowels recognition

Author: Lecian Orchidea Maria
Tirozzi Brunello
Publication date: 10/07/2023

A phoneme-retrieval technique is proposed, which follows from the particular way the network is constructed. An initial set of neurons is given. The number of these neurons is approximately equal to the number of typical structures of the data. For example, if the network is built for voice retrieval, then the number of neurons must be equal to the number of characteristic phonemes of the alphabet of the language spoken by the social group to which the particular person belongs. Usually this task is very complicated, and the network can depend critically on the samples used for the learning. If the network is built for image retrieval, then it works only if the data to be retrieved belong to a particular set of images. If the network is built for voice recognition, it works only for some particular set of words. A typical example is the words used for the flight of airplanes. For example, a command like "the airplane should make a turn of 120 degrees towards the east" can be easily recognized by the network if a suitable learning procedure is used.

arXiv.org e-Print Archive

Robust ASR using Support Vector Machines

Author: A. Gallardo-Antolín
C. Peláez-Moreno
D. Martín-Iglesias
F. Díaz-de-María
R. Solera-Ureña
Publication venue: 'Elsevier BV'
Publication date: 01/01/2007

The improved theoretical properties of Support Vector Machines with respect to other machine learning alternatives due to their max-margin training paradigm have led us to suggest them as a good technique for robust speech recognition. However, important shortcomings have had to be circumvented, the most important being the normalisation of the time duration of different realisations of the acoustic speech units. In this paper, we have compared two approaches in noisy environments: first, a hybrid HMM–SVM solution where a fixed number of frames is selected by means of an HMM segmentation and second, a normalisation kernel called Dynamic Time Alignment Kernel (DTAK) first introduced in Shimodaira et al. [Shimodaira, H., Noma, K., Nakai, M., Sagayama, S., 2001. Support vector machine with dynamic time-alignment kernel for speech recognition. In: Proc. Eurospeech, Aalborg, Denmark, pp. 1841–1844] and based on DTW (Dynamic Time Warping). Special attention has been paid to the adaptation of both alternatives to noisy environments, comparing two types of parameterisations and performing suitable feature normalisation operations. The results show that the DTA Kernel provides important advantages over the baseline HMM system in medium to bad noise conditions, also outperforming the results of the hybrid system.
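The time-normalisation problem mentioned above can be illustrated with plain Dynamic Time Warping, which aligns two feature sequences of different lengths. The sketch below is textbook DTW with a Euclidean local cost, not the DTAK kernel itself; the function name `dtw_distance` and the toy sequences are illustrative assumptions.

```python
import numpy as np

def dtw_distance(a, b):
    """Accumulated DTW cost between two sequences of frames (1-D or frames x features)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.ndim == 1:
        a = a[:, None]
    if b.ndim == 1:
        b = b[:, None]
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = np.linalg.norm(a[i - 1] - b[j - 1])  # Euclidean frame distance
            # Extend the cheapest of the three allowed predecessor paths.
            cost[i, j] = local + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return float(cost[n, m])

# A time-stretched copy aligns at zero cost; a changed frame does not.
same_shape = dtw_distance([1, 2, 3], [1, 2, 2, 3])
different = dtw_distance([1, 2, 3], [1, 2, 4])
```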

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Crossref

Universidad Carlos III de Madrid e-Archivo

Methodological and hardware-software support for recording and processing speech signals for the diagnosis of neurological diseases

Author: A. N. Osipov
I. V. Rushkevich
M. M. Mezhennaya
S. A. Likhachev
T. P. Kul
Yu. N. Rushkevich
Publication venue: UIIP NASB
Publication date: 11/01/2019

Methodological and hardware-software support based on digital processing and analysis of speech signals is proposed for the rapid and objective diagnosis of neurological pathologies accompanied by speech function disorders. The developed methodological and software tools were tested at the Republican Scientific and Practical Center of Neurology and Neurosurgery of the Ministry of Healthcare of Belarus and the Belarusian State University of Informatics and Radioelectronics. The testing revealed qualitative (based on the obtained spectrograms, cepstrograms and histograms) and quantitative (based on the calculated parameters) differences between the parameters of speech signals in healthy subjects and in bulbar syndrome. Preliminary results of the research confirmed the feasibility of using the methodological and hardware-software support for recording and processing speech signals in the diagnosis of neurological diseases.

Informatics (E-Journal) / Информатика

Methodological and hardware-software support for recording and processing speech signals for the diagnosis of neurological diseases

Author: Kul T. P.
Likhachev S. A.
Mezhennaya M. M.
Osipov A. N.
Rushkevich I. V.
Rushkevich Yu. N.
Publication venue: UIIP NASB, Republic of Belarus
Publication date: 01/01/2019

Methodological and hardware-software support based on digital processing and analysis of speech signals is proposed for the rapid and objective diagnosis of neurological pathologies accompanied by speech function disorders. The developed methodological and software tools were tested at the Republican Scientific and Practical Center of Neurology and Neurosurgery of the Ministry of Healthcare of the Republic of Belarus and the Belarusian State University of Informatics and Radioelectronics. The testing revealed qualitative (based on the obtained spectrogram, cepstrogram and histogram plots) and quantitative (based on the calculated parameters) differences between the parameters of speech signals in healthy subjects and in bulbar syndrome. Preliminary results of the study confirmed the feasibility of using the methodological and hardware-software support developed by the authors for recording and processing speech signals in the diagnosis of neurological diseases.

Belarusian State University of Informatics and Radioelectronics Repository

Vocal acoustic analysis - determination of jitter and shimmer for the diagnosis of speech pathologies

Author: Carneiro Susana Moreira
Ferreira Débora
Teixeira João Paulo
Publication venue: INEGI
Publication date: 01/01/2011

Several digital signal processing techniques have been used to analyse voice disorders caused by pathologies of the larynx. The laryngoscopic examinations used to detect these pathologies are invasive techniques that cause discomfort to the patient. Vocal acoustic analysis of the temporal and spectral characteristics of speech signals can be used as an auxiliary technique to laryngoscopy, both for pre-diagnosis of pathologies and for monitoring pharmacological and post-surgical treatments. This article describes some pathologies of the vocal apparatus that alter the quality of the speech produced. It then describes the algorithms implemented to determine the fundamental frequency (F0) by the cepstral method and by the autocorrelation method. It also describes the algorithm developed and implemented here to determine jitter and shimmer through the parameters Jitta, Jitt, rap and ppq5 for jitter, and Shim, ShdB, apq3 and apq5 for shimmer. These jitter and shimmer parameters are often indicators of pathologies.
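The eight parameters named above can be sketched from their commonly used definitions (as in MDVP and Praat), assuming the glottal periods and peak amplitudes have already been extracted. This is not the authors' implementation; the function names and the toy input values are illustrative assumptions.

```python
import numpy as np

def perturbation_quotient(x, k):
    """Mean |x_i - average of the k values centred on i|, relative to mean(x), in percent."""
    x = np.asarray(x, dtype=float)
    smoothed = np.convolve(x, np.ones(k) / k, mode="valid")  # k-point moving average
    half = k // 2
    centre = x[half:len(x) - half]  # samples that have a full k-point neighbourhood
    return float(100.0 * np.mean(np.abs(centre - smoothed)) / np.mean(x))

def jitter_shimmer(periods, amplitudes):
    """Jitter measures from glottal periods (s), shimmer measures from peak amplitudes."""
    T = np.asarray(periods, dtype=float)
    A = np.asarray(amplitudes, dtype=float)
    jitta = float(np.mean(np.abs(np.diff(T))))  # absolute jitter, in seconds
    return {
        "jitta": jitta,
        "jitt": 100.0 * jitta / float(np.mean(T)),            # relative jitter, %
        "rap": perturbation_quotient(T, 3),                   # 3-period quotient, %
        "ppq5": perturbation_quotient(T, 5),                  # 5-period quotient, %
        "shim": float(100.0 * np.mean(np.abs(np.diff(A))) / np.mean(A)),  # %
        "shdb": float(np.mean(np.abs(20.0 * np.log10(A[1:] / A[:-1])))),  # dB
        "apq3": perturbation_quotient(A, 3),
        "apq5": perturbation_quotient(A, 5),
    }

# Toy input: periods alternating between 10 ms and 11 ms, constant amplitude.
measures = jitter_shimmer([0.010, 0.011] * 5, [1.0] * 10)
```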

Biblioteca Digital do IPB