7 research outputs found

    Phase-Distortion-Robust Voice-Source Analysis

    This work concerns itself with the analysis of voiced speech signals, in particular the analysis of the glottal source signal. Following the source-filter theory of speech, the glottal signal is produced by the vibratory behaviour of the vocal folds and is modulated by the resonances of the vocal tract and the radiation characteristic of the lips to form the speech signal. As the glottal source signal is thought to contribute much of the non-linguistic and prosodic information in speech, it is useful to develop techniques that can estimate and parameterise this signal accurately. Because of vocal tract modulation, estimating the glottal source waveform from the speech signal is a blind deconvolution problem, and solving it necessarily requires assumptions about the characteristics of both the glottal source and the vocal tract. A common assumption is that the glottal signal and/or the vocal tract can be approximated by a parametric model. Other assumptions concern the causality of the speech signal: the vocal tract is assumed to be a minimum-phase system, while the glottal source is assumed to exhibit mixed-phase characteristics. However, as the literature review within this thesis will show, the error criteria used to determine the parameters are not robust to the conditions under which the speech signal is recorded, and they are particularly degraded in the common scenario where low-frequency phase distortion is introduced. Those criteria that are robust to this type of distortion are not well suited to the analysis of real-world signals. This research proposes a voice-source estimation and parameterisation technique called the Power-spectrum-based determination of the Rd parameter (PowRd) method. As illustrated by theory and demonstrated by experiment, the new technique is robust to the time placement of the analysis frame and to the phase issues that are generally encountered during recording.
The method assumes that the derivative glottal flow signal is approximated by the transformed Liljencrants-Fant model and that the vocal tract can be represented by an all-pole filter. Unlike many existing glottal source estimation methods, the PowRd method employs a new error criterion to optimise the parameters, and this criterion is also suitable for determining the optimal vocal-tract filter order. In addition to glottal source parameterisation, nonlinear phase recording conditions can also adversely affect the results of other speech processing tasks, such as the estimation of the instant of glottal closure. In this thesis, a new glottal closing instant estimation algorithm is proposed which incorporates elements from state-of-the-art techniques and is specifically designed to operate on speech recorded under nonlinear phase conditions. The new method, called the Fundamental RESidual Search (FRESS) algorithm, is shown to estimate the glottal closing instant of voiced speech with superior precision, and with accuracy comparable to that of other existing methods, over a large database of real speech signals under real and simulated recording conditions. An application of the proposed glottal source parameterisation method and glottal closing instant detection algorithm is a system that can analyse and re-synthesise voiced speech signals. This thesis describes perceptual experiments which show that, under linear and nonlinear recording conditions, the system produces synthetic speech that is generally preferred to speech synthesised using a state-of-the-art time-domain-based parameterisation technique. In sum, this work represents a movement towards flexible and robust voice-source analysis, with potential for a wide range of applications including speech analysis, modification and synthesis.
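The transformed LF model mentioned above is driven by a single shape parameter, Rd. The following Python sketch shows how Rd is commonly expanded into the LF timing ratios Ra, Rk and Rg using Fant's published regression formulas. This is an assumption made for illustration and is not the PowRd method itself. The function names `rd_to_lf` and `lf_to_rd` are hypothetical.

```python
def rd_to_lf(rd: float) -> dict:
    """Map the transformed-LF shape parameter Rd to LF-model ratios.

    Assumption: Fant's regression formulas, as commonly used with the
    transformed LF model (not taken from the thesis).
    Returns Ra, Rk, Rg and the open quotient Oq = (1 + Rk) / (2 Rg).
    """
    if rd <= 0:
        raise ValueError("Rd must be positive")
    ra = (-1.0 + 4.8 * rd) / 100.0   # return-phase ratio
    rk = (22.4 + 11.8 * rd) / 100.0  # pulse-skew ratio
    # Solve Rd = (1/0.11) * (0.5 + 1.2 Rk) * (Rk / (4 Rg) + Ra) for Rg.
    denom = 0.11 * rd / (0.5 + 1.2 * rk) - ra
    if denom <= 0:
        raise ValueError("Rd outside the range where the regression yields Rg > 0")
    rg = rk / (4.0 * denom)
    oq = (1.0 + rk) / (2.0 * rg)
    return {"Ra": ra, "Rk": rk, "Rg": rg, "Oq": oq}


def lf_to_rd(ra: float, rk: float, rg: float) -> float:
    """Forward definition of Rd from the LF ratios (used to check the inverse)."""
    return (1.0 / 0.11) * (0.5 + 1.2 * rk) * (rk / (4.0 * rg) + ra)


if __name__ == "__main__":
    for rd in (0.5, 1.0, 2.0):
        print(rd, rd_to_lf(rd))
```

Lower Rd values correspond to tenser, more abrupt pulses and higher values to laxer voices. A single-parameter search over Rd is therefore much smaller than a search over the full set of LF parameters.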

    Linear Prediction: The Problem, its Solution and Application to Speech

    Linear prediction is a signal processing technique that is used extensively in the analysis of speech signals and, as it is so heavily referred to in speech processing literature, a certain level of familiarity with the topic is typically required of all speech processing engineers. This paper aims to provide a well-rounded introduction to linear prediction and, in so doing, to facilitate understanding of the technique. Linear prediction and its mathematical derivation will be described, with a specific focus on applying the technique to speech signals. It is noted, however, that although progress in linear prediction has been driven primarily by speech research, it involves concepts that prove useful to digital signal processing in general.
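Because the abstract centres on the derivation, a compact illustration may help. The following Python sketch solves the autocorrelation-method normal equations with the Levinson-Durbin recursion, assuming the common sign convention A(z) = 1 + a_1 z^-1 + ... + a_p z^-p, so that the prediction error is e[n] = x[n] + sum_k a_k x[n-k]. It is a generic textbook formulation rather than code from the paper, and the function names are hypothetical.

```python
import numpy as np


def autocorrelation(x, order):
    """Biased autocorrelation r[0..order] of a (typically windowed) frame."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    return np.array([np.dot(x[: n - k], x[k:]) for k in range(order + 1)])


def levinson_durbin(r, order):
    """Solve sum_j a_j r[|i-j|] = -r[i], i = 1..order, recursively.

    Returns (a, err): a[0] == 1 and err is the final prediction-error power.
    """
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = float(r[0])
    for i in range(1, order + 1):
        # Reflection coefficient for stage i.
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err
        a_prev = a.copy()
        for j in range(1, i):
            a[j] = a_prev[j] + k * a_prev[i - j]
        a[i] = k
        err *= 1.0 - k * k  # error power shrinks at every stage
    return a, err


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frame = rng.standard_normal(400) * np.hamming(400)
    a, err = levinson_durbin(autocorrelation(frame, 10), 10)
    print(a, err)
```

The recursion needs O(p^2) operations instead of the O(p^3) required by a general linear solve. It also yields the reflection coefficients as a by-product.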

    Introducing PVSPITCH: a Pitch Tracking Opcode for Csound

    An accurate pitch tracker has many useful applications, including interactive electroacoustic composition, music transcription and ethnomusicological research, among many others. Designed for the Csound sound synthesis and signal processing language, PVSPITCH is an opcode that can be used for such purposes. The opcode performs a mathematical analysis of Csound's phase vocoder data streams and, from this examination, determines what it estimates to be the signal's pitch. The algorithm handles many types of signals well, including those missing various harmonics, even the fundamental, and signals with inharmonic partials. The only restrictions are that the signal be single-voiced, strongly pitched and slowly changing. This paper introduces the opcode and illustrates PVSPITCH's pitch determination algorithm. Schouten's pitch determination hypothesis and the concept of tonal fusion are briefly discussed as background to the opcode's development. Three case studies are also explored to demonstrate the accuracy of the algorithm.
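To illustrate how a pitch can be recovered even when the fundamental is missing, the following Python sketch infers f0 from the spacing of partial frequencies, in the spirit of Schouten's residue-pitch idea. It is a deliberately simple, hypothetical helper (`f0_from_partials`). It is not the PVSPITCH algorithm, and it assumes the partial frequencies have already been extracted, for example from phase-vocoder bins.

```python
import numpy as np


def f0_from_partials(freqs):
    """Estimate a fundamental (Hz) from partial frequencies (Hz).

    Hypothetical illustration, NOT PVSPITCH: the smallest spacing between
    adjacent partials gives an initial guess; each partial is then assigned
    a harmonic number h_k and f0 is refined by least squares on f_k ~ h_k * f0.
    """
    f = np.sort(np.asarray(freqs, dtype=float))
    if f.size < 2:
        raise ValueError("need at least two partials")
    f0 = np.min(np.diff(f))
    h = np.maximum(np.round(f / f0), 1.0)
    return float(np.dot(h, f) / np.dot(h, h))


if __name__ == "__main__":
    # Partials at 400, 600 and 800 Hz with no energy at 200 Hz.
    print(f0_from_partials([400.0, 600.0, 800.0]))
```

The least-squares refinement tolerates small amounts of inharmonicity. The adjacent-spacing guess, however, is ambiguous when only octave-related partials survive (for example 400 and 800 Hz alone), which is one reason practical trackers restrict themselves to strongly pitched signals.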

    A Brief Introduction to Speech Synthesis and Voice Modification

    For both engineers and linguists, the computer synthesis of natural speech is an objective that would enable many useful applications in human-computer interaction, including the realm of electro-acoustic music. The purpose of this paper is to introduce the area of speech synthesis by providing an overview of the three main methods of computer speech synthesis, namely concatenative, articulatory and formant synthesis. Some aspects of the current state of the technology are also discussed, and the final section explains the author’s motivation and current research approach to the field of voice modification.
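Of the three methods listed, formant synthesis is the easiest to sketch in a few lines. The following Python example assumes a Klatt-style cascade of second-order digital resonators excited by an impulse train, which is a common textbook arrangement and not necessarily the one discussed in the paper. The formant frequencies and bandwidths in the usage line are illustrative values for an /a/-like vowel, and the function names are hypothetical.

```python
import numpy as np


def resonator(x, freq_hz, bw_hz, fs):
    """Klatt-style second-order resonator: y[n] = A x[n] + B y[n-1] + C y[n-2]."""
    t = 1.0 / fs
    c = -np.exp(-2.0 * np.pi * bw_hz * t)
    b = 2.0 * np.exp(-np.pi * bw_hz * t) * np.cos(2.0 * np.pi * freq_hz * t)
    a = 1.0 - b - c  # normalises the gain to 1 at 0 Hz
    y = np.zeros(len(x))
    y1 = y2 = 0.0
    for n, xn in enumerate(x):
        yn = a * xn + b * y1 + c * y2
        y[n] = yn
        y2, y1 = y1, yn
    return y


def synthesise_vowel(f0, formants, fs=16000, dur=0.5):
    """Cascade of resonators driven by an impulse train (a crude voice source)."""
    n = int(fs * dur)
    x = np.zeros(n)
    x[:: int(fs / f0)] = 1.0
    for freq, bw in formants:
        x = resonator(x, freq, bw, fs)
    return x


if __name__ == "__main__":
    vowel = synthesise_vowel(120.0, [(730, 90), (1090, 110), (2440, 170)])
    print(vowel.shape, np.max(np.abs(vowel)))
```

Replacing the impulse train with a glottal-pulse model is where voice-source research, and voice modification in particular, connects to formant synthesis.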

    On the Appearance of a Positive Real Pole in the Results of Glottal Closed Phase Linear Prediction

    Often when performing glottal closed phase covariance linear prediction, a positive real pole can appear in the resulting filter transfer function. The commonly adopted approach is to discard this pole, as it does not fit the usual model of the all-pole vocal tract filter. However, this real pole does describe some aspect of the speech signal, and this paper provides a novel perspective on its occurrence. This viewpoint has a useful implication for the speech community, especially when fitting a glottal pulse to the inverse-filtered signal, since the real pole describes the return phase of the glottal flow for certain voice types that satisfy a reasonable criterion. Tests with synthetic signals are performed to validate this approach.
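The following Python sketch shows where such a pole shows up in practice. It performs covariance-method linear prediction on a segment by least squares and then lists any poles on the positive real axis. The synthetic test signal (a resonance plus a real pole at 0.9) and the function names are assumptions for illustration. The paper's interpretation of the pole as the glottal return phase is not implemented here.

```python
import numpy as np


def covariance_lp(x, order):
    """Covariance-method LP over a segment (e.g. a glottal closed phase).

    Minimises sum_{n=p}^{N-1} (x[n] + sum_k a_k x[n-k])^2 by least squares
    and returns A(z) coefficients [1, a_1, ..., a_p].
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    # Column k holds x[m-k] for every target sample m = order..n-1.
    lagged = np.column_stack([x[order - k: n - k] for k in range(1, order + 1)])
    a, *_ = np.linalg.lstsq(lagged, -x[order:], rcond=None)
    return np.concatenate(([1.0], a))


def positive_real_poles(a, tol=1e-6):
    """Poles of 1/A(z) lying on the positive real axis."""
    poles = np.roots(a)
    mask = (np.abs(poles.imag) < tol) & (poles.real > 0)
    return np.sort(poles[mask].real)


def allpole_impulse_response(a, n):
    """Impulse response of 1/A(z), used here to build a synthetic segment."""
    h = np.zeros(n)
    for i in range(n):
        acc = 1.0 if i == 0 else 0.0
        for k in range(1, len(a)):
            if i - k >= 0:
                acc -= a[k] * h[i - k]
        h[i] = acc
    return h


if __name__ == "__main__":
    a_true = np.convolve([1.0, -0.9], [1.0, -2 * 0.95 * np.cos(0.3), 0.95 ** 2])
    seg = allpole_impulse_response(a_true, 200)
    print(positive_real_poles(covariance_lp(seg, 3)))
```

With a noise-free free-decay segment, the recovered filter matches the true one. On real closed-phase speech, the real pole's position would instead depend on the signal, which is the behaviour the paper analyses.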

    Towards a Method to Determine the Glottal Formant Parameters of Voiced Speech without Time-Domain Reference

    This paper presents an approach to estimating the glottal formant parameters of the voicing source in the frequency domain. The method is based on a simplified pole-zero interpretation of the prevalent Liljencrants-Fant (LF) model of glottal flow, and gives approximations for a broad range of pulse shapes. An advantage of the method is that, unlike other methods, it does not rely on time-domain references.
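A short numerical check helps explain why a magnitude-spectrum approach needs no time-domain reference. For one period of a periodic glottal flow derivative, circularly shifting the analysis frame leaves the DFT magnitude, and hence the low-frequency glottal formant peak, unchanged. The following Python sketch assumes a Rosenberg-type pulse as a simpler stand-in for the LF model. It does not implement the paper's pole-zero approximation, and the function names and parameter values are hypothetical.

```python
import numpy as np


def rosenberg_pulse(n_period, open_quotient=0.6, speed_quotient=2.0):
    """One period of a Rosenberg-type glottal flow pulse (stand-in for LF).

    Opening phase tp and closing phase tn (in samples) satisfy
    tp + tn = open_quotient * n_period and tp / tn = speed_quotient.
    """
    n_open = int(round(open_quotient * n_period))
    if n_open < 2:
        raise ValueError("open phase too short")
    tp = n_open * speed_quotient / (1.0 + speed_quotient)
    tn = n_open - tp
    t = np.arange(n_period, dtype=float)
    g = np.zeros(n_period)
    rising = t <= tp
    g[rising] = 0.5 * (1.0 - np.cos(np.pi * t[rising] / tp))
    falling = (t > tp) & (t <= tp + tn)
    g[falling] = np.cos(np.pi * (t[falling] - tp) / (2.0 * tn))
    return g


def glottal_formant_bin(frame):
    """Harmonic index of the spectral peak of a one-period frame (DC skipped)."""
    mag = np.abs(np.fft.rfft(frame))
    return int(np.argmax(mag[1:]) + 1)


if __name__ == "__main__":
    g = rosenberg_pulse(160)        # e.g. f0 = 100 Hz at fs = 16 kHz
    dg = g - np.roll(g, 1)          # circular difference: flow derivative
    for shift in (0, 40, 120):      # arbitrary frame placements
        print(shift, glottal_formant_bin(np.roll(dg, shift)))
```

Because the frame spans exactly one period, the DFT bins fall on the harmonics, so the peak index times f0 gives a coarse glottal formant frequency for every frame placement.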

    Exploiting Glottal Formant Parameters for Glottal Inverse Filtering and Parameterization

    Many methods of inverse filtering depend crucially on knowing time-domain information about the glottal source waveform, e.g. the location of the instant of glottal closure. Often this information is unknown or cannot be determined, for example because the recording conditions corrupt the phase spectrum. In these scenarios, alternative strategies are required. This paper describes a method which, given the parameters of the glottal formant of the signal frame, can accurately parameterize the glottal source shape and the vocal tract filter for a broad range of voice quality types, and which is robust to corruption of the phase spectrum. Index Terms: glottal inverse filtering, frequency domain, glottal models, glottal formant
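Phase-spectrum corruption of the kind described here is often modelled as an all-pass filter. This is an assumption of the sketch rather than something stated in the abstract. The following Python example applies a first-order all-pass H(z) = (c + z^-1) / (1 + c z^-1). The filter changes the waveform, which defeats time-domain references such as the glottal closure instant, but it leaves the magnitude spectrum at exactly unity. The function name and the value of c are hypothetical.

```python
import numpy as np


def first_order_allpass(x, c):
    """y[n] = c x[n] + x[n-1] - c y[n-1], i.e. H(z) = (c + z^-1) / (1 + c z^-1).

    |H| = 1 at every frequency; for c close to -1 the pole (at z = -c) sits
    near z = 1, so the phase distortion is concentrated at low frequencies.
    """
    y = np.zeros(len(x))
    x1 = y1 = 0.0
    for n, xn in enumerate(x):
        yn = c * xn + x1 - c * y1
        y[n] = yn
        x1, y1 = xn, yn
    return y


if __name__ == "__main__":
    impulse = np.r_[1.0, np.zeros(1999)]
    h = first_order_allpass(impulse, -0.9)
    print(np.max(np.abs(np.abs(np.fft.rfft(h)) - 1.0)))  # magnitude ~ 1

    pulse = np.r_[np.zeros(50), np.hanning(40), np.zeros(110)]
    distorted = first_order_allpass(pulse, -0.9)
    print(np.argmax(pulse), np.argmax(distorted))  # waveform shape changes
```

A criterion built only on the magnitude (or power) spectrum sees the original and distorted signals identically. This is the property that the glottal-formant-based approach relies on.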