36 research outputs found

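    Tools for voice source analysis: updated Aalto Aparat and a database of continuous speech and simultaneous electroglottographic signal (Työvälineet äänilähteen analyysiin: päivitetty Aalto Aparat ja jatkuvan puheen sekä samanaikaisen elektroglottografisignaalin tietokanta)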

    This thesis presents two tools for voice source analysis: an updated version of the Aalto Aparat inverse filtering programme, and a database of continuous Finnish speech with simultaneous electroglottography (EGG) recordings. A new glottal inverse filtering (GIF) method, quasi closed phase glottal inverse filtering (QCP), has been implemented in Aalto Aparat, and the usability of the programme has been improved by adding options for saving results. The results of the computations can now be transferred to other analysis programmes more easily. In addition, a comprehensive manual for Aparat has been compiled. The database contains 20 recitations of a short Finnish text by 10 male and 10 female native Finnish speakers. The recitations were recorded with a headset condenser microphone and EGG electrodes in an anechoic chamber, and the full database contains almost an hour of material. The data can be used, for example, to evaluate new GIF methods.

    Phase-Distortion-Robust Voice-Source Analysis

    This work concerns the analysis of voiced speech signals, in particular the analysis of the glottal source signal. Following the source-filter theory of speech, the glottal signal is produced by the vibratory behaviour of the vocal folds and is modulated by the resonances of the vocal tract and the radiation characteristic of the lips to form the speech signal. As the glottal source signal is thought to contribute much of the non-linguistic and prosodic information in speech, it is useful to develop techniques that can estimate and parameterise this signal accurately. Because of vocal tract modulation, estimating the glottal source waveform from the speech signal is a blind deconvolution problem, which necessarily makes assumptions about the characteristics of both the glottal source and the vocal tract. A common assumption is that the glottal signal and/or vocal tract can be approximated by a parametric model. Other assumptions include the causality of the speech signal: the vocal tract is assumed to be a minimum phase system while the glottal source is assumed to exhibit mixed phase characteristics. However, as the literature review within this thesis will show, the error criteria used to determine the parameters are not robust to the conditions under which the speech signal is recorded, and are particularly degraded in the common scenario where low frequency phase distortion is introduced. Those that are robust to this type of distortion are not well suited to the analysis of real-world signals. This research proposes a voice-source estimation and parameterisation technique called the Power-spectrum-based determination of the Rd parameter (PowRd) method. Illustrated by theory and demonstrated by experiment, the new technique is robust to the time placement of the analysis frame and to the phase issues that are generally encountered during recording. The method assumes that the derivative glottal flow signal is approximated by the transformed Liljencrants-Fant model and that the vocal tract can be represented by an all-pole filter. Unlike many existing glottal source estimation methods, the PowRd method employs a new error criterion to optimise the parameters, which is also suitable for determining the optimal vocal-tract filter order. In addition to the issue of glottal source parameterisation, nonlinear phase recording conditions can also adversely affect the results of other speech processing tasks such as the estimation of the instant of glottal closure. In this thesis, a new glottal closing instant estimation algorithm is proposed which incorporates elements from state-of-the-art techniques and is specifically designed for operation upon speech recorded under nonlinear phase conditions. The new method, called the Fundamental RESidual Search (FRESS) algorithm, is shown to estimate the glottal closing instant of voiced speech with superior precision and with accuracy comparable to that of existing methods over a large database of real speech signals under real and simulated recording conditions. An application of the proposed glottal source parameterisation method and glottal closing instant detection algorithm is a system which can analyse and re-synthesise voiced speech signals. This thesis describes perceptual experiments which show that, under linear and nonlinear recording conditions, the system produces synthetic speech which is generally preferred to speech synthesised using a state-of-the-art time-domain-based parameterisation technique. In sum, this work represents a movement towards flexible and robust voice-source analysis, with potential for a wide range of applications including speech analysis, modification and synthesis.

    Modal Locking Between Vocal Fold Oscillations and Vocal Tract Acoustics

    During voiced speech, vocal folds interact with the vocal tract acoustics. The resulting glottal source-resonator coupling has been observed using mathematical and physical models as well as in in vivo phonation. We propose a computational time-domain model of the full speech apparatus that contains a feedback mechanism from the vocal tract acoustics to the vocal fold oscillations. It is based on numerical solution of ordinary and partial differential equations defined on vocal tract geometries that have been obtained by magnetic resonance imaging. The model is used to simulate rising and falling pitch glides of [ɑ, i] in the fundamental frequency (f_o) interval [145 Hz, 315 Hz]. The interval contains the first vocal tract resonance f_R1 and the first formant F_1 of [i] as well as the fractions of the first resonance f_R1/5, f_R1/4, and f_R1/3 of [ɑ]. The glide simulations reveal a locking pattern in the f_o trajectory approximately at f_R1 of [i]. The resonance fractions of [ɑ] produce perturbations in the pressure signal at the lips but no locking.

    Models and Analysis of Vocal Emissions for Biomedical Applications

    The proceedings of the MAVEBA Workshop, which is held every two years, collect the scientific papers presented at the conference as oral and poster contributions. The main subjects are the development of theoretical and mechanical models as an aid to the study of the main phonatory dysfunctions, and biomedical engineering methods for the analysis of voice signals and images in support of clinical diagnosis and classification of vocal pathologies.

    New linear predictive methods for digital speech processing

    Speech processing is needed whenever speech is to be compressed, synthesised or recognised by means of electrical equipment. Different types of phones, multimedia equipment and interfaces to various electronic devices all require digital speech processing. As an example, a GSM phone applies speech processing in its RPE-LTP encoder/decoder (ETSI, 1997). In this coder, 20 ms of speech is analysed first in the short-term prediction (STP) part and then in the long-term prediction (LTP) part. Finally, speech compression is achieved in the RPE encoding part, where only 1/3 of the encoded samples are selected to be transmitted. This thesis presents modifications to one of the most widely applied techniques in digital speech processing, namely linear prediction (LP). During recent decades linear prediction has played an important role in telecommunications and other areas related to speech compression and recognition. In linear prediction, a sample s(n) is predicted from its p previous samples by forming a linear combination of those samples and minimising the prediction error. This procedure in the time domain corresponds to modelling the spectral envelope of the speech spectrum in the frequency domain. How accurately the spectral envelope matches the speech spectrum depends strongly on the order of the resulting all-pole filter. This, in turn, is usually related to the number of parameters required to define the model, and hence to be transmitted. Our study presents new predictive methods, which are modified from conventional linear prediction by selecting the previous samples for the linear combination differently. This algorithmic development aims at new all-pole techniques that could represent speech spectra with fewer parameters.
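
The prediction step described in the abstract above can be illustrated with a minimal sketch of conventional linear prediction. It assumes the autocorrelation method solved with the Levinson-Durbin recursion, which is one standard way to minimise the prediction error; it does not reproduce the modified methods proposed in the thesis. The function names `autocorrelation` and `lpc` are hypothetical and chosen only for this illustration.

```python
def autocorrelation(signal, max_lag):
    """Return r[0..max_lag], the (unnormalised) autocorrelation of a frame."""
    n = len(signal)
    return [sum(signal[i] * signal[i - k] for i in range(k, n))
            for k in range(max_lag + 1)]


def lpc(signal, order):
    """Conventional LP via the autocorrelation method (Levinson-Durbin).

    Returns (a, err), where a[k] weights the sample s(n - 1 - k) in the
    prediction s_hat(n) = sum_k a[k] * s(n - 1 - k), and err is the
    remaining prediction-error energy.
    Assumes a non-silent frame (r[0] > 0).
    """
    r = autocorrelation(signal, order)
    a = []
    err = r[0]
    for i in range(order):
        # Reflection coefficient for the step from order i to order i + 1.
        acc = r[i + 1] - sum(a[j] * r[i - j] for j in range(i))
        k = acc / err
        # Update the lower-order coefficients, then append the new one.
        a = [a[j] - k * a[i - 1 - j] for j in range(i)] + [k]
        err *= 1.0 - k * k
    return a, err


# Example frame: a decaying exponential, where each sample is half the
# previous one, so a single predictor coefficient describes it well.
frame = [0.5 ** n for n in range(50)]
coeffs, residual_energy = lpc(frame, order=2)
```

The order `p` passed to `lpc` is the number of previous samples used, and therefore the number of parameters of the resulting all-pole model, which is the trade-off between accuracy and parameter count that the abstract refers to.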

    Non-stationarity, multifractality and noise removal in real signals (No-estacionariedad, multifractalidad y limpieza de ruido en señales reales)

    Biomedical signals such as the electrocardiogram, the electroencephalogram, or the voice signal share the characteristics of non-stationarity and nonlinearity. Although many applications treat them as stationary signals produced by linear systems, this simplification is a working hypothesis that is valid only as an approximation that allows classical signal analysis techniques to be applied. Many disorders affecting one or more organs can be detected through proper analysis of the signals those organs are involved in producing. However, a signal originating from a pathological system departs even further from the hypothetical conditions of stationarity and linearity. It follows that biomedical signals need to be analysed with non-conventional techniques that treat them within a framework accounting for their non-stationarity and nonlinearity. Building on the working group's experience in time-frequency/scale analysis, statistical analysis and modelling, multifractal analysis, complexity, and data-driven (adaptive) methods, new techniques have been proposed and studied, starting from real problems, to make their solution possible.

    Speech Modeling and Robust Estimation for Diagnosis of Parkinson’s Disease
