7 research outputs found

    Variable Frame-Rate Speech Coding by Adaptive-Flux Interpolation

    Variable frame-rate (VFR) speech coders have many desirable properties but make implicit assumptions concerning the nature of the spectral evolution of speech (Peeling and Ponting 1989). To date, these assumptions have been crude and unable to model speech parameters during extended periods of coarticulation. In particular, they have been unable to cope with steadily changing formants. Thus, existing VFR methods must transmit many more frames than are really necessary. This paper presents a new technique, Adaptive-Flux Interpolation (AFI), which significantly extends the period over which accurate estimation can be performed and is much more robust and accurate than other methods.
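To illustrate the general idea of variable frame-rate coding described in this abstract, here is a minimal sketch of a generic baseline, not the AFI method itself. It assumes the receiver reconstructs dropped frames by plain linear interpolation between transmitted frames. The function names (`interpolate`, `max_error`, `select_frames`) and the tolerance parameter `tol` are hypothetical and do not come from the paper.

```python
def interpolate(a, b, t):
    """Linearly interpolate between two frames (lists of floats), 0 <= t <= 1."""
    return [x + t * (y - x) for x, y in zip(a, b)]


def max_error(frames, start, end):
    """Largest absolute error if every frame strictly between start and end
    is replaced by linear interpolation of the two end frames."""
    worst = 0.0
    span = end - start
    for i in range(start + 1, end):
        estimate = interpolate(frames[start], frames[end], (i - start) / span)
        worst = max(worst, max(abs(e - f) for e, f in zip(estimate, frames[i])))
    return worst


def select_frames(frames, tol):
    """Greedy selection: extend each segment for as long as the frames
    it skips can be interpolated to within tol (assumed error criterion)."""
    if not frames:
        return []
    kept = [0]
    start = 0
    n = len(frames)
    while start < n - 1:
        end = start + 1
        while end + 1 < n and max_error(frames, start, end + 1) <= tol:
            end += 1
        kept.append(end)
        start = end
    return kept


# A one-dimensional "spectral parameter" that ramps up and then holds steady.
frames = [[0.0], [1.0], [2.0], [3.0], [3.0], [3.0]]
print(select_frames(frames, tol=1e-6))
```

With this toy input only the frames at the two ends and at the change of slope need to be sent. A real coder would work on multi-dimensional spectral frames and would probably use a perceptually motivated error measure.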

    Modelling the Flow Inherent in Speech Representations

    This paper presents two new methods for modelling the flow inherent in speech: flow-based prediction (FBP) and acoustic flow interpolation (AFI). These are presented as extensions of the form of prediction implied in calculating the delta and delta-delta coefficients often used in automatic speech recognition. All these methods are presented as special cases of a general vector linear prediction model, but it is shown that the new techniques, which make the flow of features within the data explicit, are significantly better at modelling spectrogram-like data. Several speech representations, using both parametric and non-parametric analyses, are discussed both in terms of their ability to represent speech accurately and of their appropriateness to these flow-based models. AFI and FBP error coefficients, for both male and female speakers, are measured and compared with the delta and delta-delta coefficients. Wherever possible, the parameters and methods used to produce the representations have been chosen to be directly comparable with one another.
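For readers unfamiliar with the delta and delta-delta coefficients mentioned in this abstract, the following sketch computes them with the regression formula commonly used in speech recognition front ends. It is an assumption that this matches the paper's formulation. The function name `delta`, the `window` parameter, and the edge handling (repeating the first and last frames) are illustrative choices rather than details from the paper.

```python
def delta(frames, window=2):
    """Regression-based delta coefficients for a sequence of feature frames.

    Each output frame is
        sum_{n=1..window} n * (c[t+n] - c[t-n]) / (2 * sum_{n=1..window} n^2),
    with frame indices clamped to the valid range at the edges.
    """
    n_frames = len(frames)
    denom = 2 * sum(n * n for n in range(1, window + 1))

    def frame(t):
        # Clamp the index so the edges reuse the first or last frame.
        return frames[min(max(t, 0), n_frames - 1)]

    out = []
    for t in range(n_frames):
        dims = len(frames[t])
        out.append([
            sum(n * (frame(t + n)[k] - frame(t - n)[k])
                for n in range(1, window + 1)) / denom
            for k in range(dims)
        ])
    return out


# A feature that rises by one unit per frame has an interior slope of 1.
ramp = [[float(t)] for t in range(6)]
d = delta(ramp)        # delta coefficients
dd = delta(d)          # delta-delta: the same operator applied again
print(d[2], dd[2])
```

Each delta frame is a fixed linear combination of neighbouring frames, which is the sense in which such coefficients can be read as a special case of a linear prediction model.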

    Multi-Dimensional Coding of Speech Data

    This paper presents specific new techniques for coding of speech representations and a new general approach to coding for compression, which directly utilises the multi-dimensional nature of the input data. Many methods of speech analysis yield a two-dimensional pattern, with time as one of the dimensions. Various such speech representations, and power spectrum sequences in particular, are shown here to be amenable to two-dimensional compression using specific models which take account of a large part of their structure in both dimensions. Newly developed techniques, namely Multi-step Adaptive Flux Interpolation (MAFI) and Multi-step Flow Based Prediction (MFBP), are presented. These are able to code power spectral density (PSD) sequences of speech more completely and accurately than conventional methods and at a low computational cost. This is due to their ability to model non-stationary, piecewise-continuous signals, of which speech is a good example. MAFI and MFBP are first applied in the time domain and then to the encoded data in the second dimension. This approach allows the coding algorithm to exploit redundancy in both dimensions, giving a significant improvement in the overall compression ratio. Furthermore, the compression may be reapplied several times; the data is further compressed with each application.

    Image Coding by Multi-Step, Adaptive Flux Interpolation

    This paper describes and discusses a new technique, multi-step adaptive flux interpolation (MAFI), and its application to the coding of image data. The output of MAFI, when applied to an image, is still in image form but has a more uniform feature density. This is because the original image has been warped by removing those rows and columns which contain mostly redundant pixels. It is also greatly reduced in size, and the side information is minimal. The MAFI output can be further compressed using conventional coders, making its compression ratio even higher. Because of its warped nature, the MAFI output's statistics are also more consistent with the properties assumed by block-based discrete cosine transform (DCT) models.

    Digital processing of speech produced in hyperbaric helium

    No full text
    SIGLE. Available from British Library Document Supply Centre, DSC:D66980/86. BLDSC - British Library Document Supply Centre. GB, United Kingdom

    Image coding by multistep, adaptive flux interpolation

    No full text