
Multi-Dimensional Coding of Speech Data

Abstract

This paper presents new techniques for coding speech representations and a new general approach to coding for compression that directly utilises the multi-dimensional nature of the input data. Many methods of speech analysis yield a two-dimensional pattern, with time as one of the dimensions. Various such speech representations, and power spectrum sequences in particular, are shown here to be amenable to two-dimensional compression using specific models that take account of a large part of their structure in both dimensions. Two newly developed techniques, Multi-step Adaptive Flux Interpolation (MAFI) and Multi-step Flow Based Prediction (MFBP), are presented. These are able to code power spectral density (PSD) sequences of speech more completely and accurately than conventional methods, and at a low computational cost. This is due to their ability to model non-stationary, piecewise-continuous signals, of which speech is a good example. MAFI and MFBP are first applied in the time domain and then to the encoded data in the second dimension. This approach allows the coding algorithm to exploit redundancy in both dimensions, giving a significant improvement in the overall compression ratio. Furthermore, the compression may be reapplied several times, with the data being further compressed on each application.
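The two-pass structure described in the abstract (code along time first, then code the already-encoded data along the second dimension) can be illustrated with a minimal sketch. The abstract does not specify how MAFI or MFBP work, so the sketch below swaps in a plain first-order difference coder as a stand-in; it is not MAFI or MFBP. All function names and the toy integer data are hypothetical, and sequences are assumed to be non-empty.

```python
def delta_encode(seq):
    """Stand-in coder (NOT MAFI/MFBP): keep the first value, then store
    each value's difference from its predecessor."""
    return [seq[0]] + [b - a for a, b in zip(seq, seq[1:])]


def delta_decode(res):
    """Invert delta_encode by accumulating the stored differences."""
    out = [res[0]]
    for r in res[1:]:
        out.append(out[-1] + r)
    return out


def transpose(matrix):
    """Swap the two dimensions of a rectangular list of lists."""
    return [list(col) for col in zip(*matrix)]


def encode_2d(frames):
    """frames[t][k] is the value of frequency bin k at time frame t."""
    # Pass 1: code along time, one frequency bin at a time.
    by_bin = [delta_encode(track) for track in transpose(frames)]
    # Pass 2: code the already-encoded data along the frequency dimension.
    return [delta_encode(row) for row in transpose(by_bin)]


def decode_2d(coded):
    """Undo the two passes in reverse order."""
    by_time = [delta_decode(row) for row in coded]
    by_bin = [delta_decode(track) for track in transpose(by_time)]
    return transpose(by_bin)


# Toy integer "PSD sequence" (hypothetical data): 3 time frames, 4 bins.
psd = [
    [10, 12, 15, 12],
    [11, 13, 16, 13],
    [12, 14, 17, 14],
]
coded = encode_2d(psd)
print(coded)  # [[10, 2, 3, -3], [1, 0, 0, 0], [1, 0, 0, 0]]
assert decode_2d(coded) == psd
```

In this toy case the second pass turns the time-domain residuals into mostly zeros, which is the kind of cross-dimensional redundancy the abstract refers to. The stand-in is lossless and far simpler than the interpolation and prediction models named in the paper.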
