Multi-Dimensional Coding of Speech Data
This paper presents specific new techniques for coding speech representations and a new general approach to coding for compression that directly utilises the multi-dimensional nature of the input data. Many methods of speech analysis yield a two-dimensional pattern, with time as one of the dimensions. Various such speech representations, and power spectrum sequences in particular, are shown here to be amenable to two-dimensional compression using specific models that account for a large part of their structure in both dimensions.
Two newly developed techniques, Multi-step Adaptive Flux Interpolation (MAFI) and Multi-step Flow Based Prediction (MFBP), are presented. These are able to code power spectral density (PSD) sequences of speech more completely and accurately than conventional methods, and at a low computational cost. This is due to their ability to model non-stationary, piecewise-continuous signals, of which speech is a good example.
MAFI and MFBP are first applied in the time domain and then to the encoded data in the second dimension. This approach allows the coding algorithm to exploit redundancy in both dimensions, giving a significant improvement in the overall compression ratio. Furthermore, the compression may be reapplied several times, and the data is further compressed with each application.
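The two-pass idea (code along time, then code the result along the second dimension) can be illustrated with a much simpler predictor. The sketch below is not MAFI or MFBP: it swaps in plain first-order differencing, and the names `delta`, `encode_2d` and `decode_2d` are hypothetical. It only shows how a second pass over the already-coded data removes redundancy the first pass leaves behind.

```python
def delta(seq):
    """First-order prediction residual: keep the first value, then differences."""
    return [seq[0]] + [b - a for a, b in zip(seq, seq[1:])]

def undelta(res):
    """Invert delta() by cumulative summation."""
    out = [res[0]]
    for r in res[1:]:
        out.append(out[-1] + r)
    return out

def transpose(m):
    return [list(col) for col in zip(*m)]

def encode_2d(frames):
    """frames[t][k]: value of bin k at time t.
    Pass 1 differences each bin along time; pass 2 differences the
    pass-1 output along the bin axis."""
    by_bin = [delta(track) for track in transpose(frames)]  # by_bin[k][t]
    return [delta(row) for row in transpose(by_bin)]         # result[t][k]

def decode_2d(code):
    """Undo pass 2, then pass 1."""
    by_bin = transpose([undelta(row) for row in code])       # by_bin[k][t]
    return transpose([undelta(track) for track in by_bin])

frames = [[1, 2, 3], [2, 3, 4], [4, 5, 6]]
code = encode_2d(frames)
print(code)                       # mostly zeros after both passes
print(decode_2d(code) == frames)  # the transform is lossless
```

Any real gain depends on an entropy coder applied to the residuals, which is omitted here.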
Optimal modeling for complex system design
The article begins with a brief introduction to the theory describing optimal data compression systems and their performance. A brief outline is then given of a representative algorithm that employs these lessons for optimal data compression system design. The implications of rate-distortion theory for practical data compression system design are then described, followed by a description of the tensions between theoretical optimality and system practicality and a discussion of common tools used in current algorithms to resolve these tensions. Next, the generalization of rate-distortion principles to the design of optimal collections of models is presented. The discussion focuses initially on data compression systems, but later widens to describe how rate-distortion theory principles generalize to model design for a wide variety of modeling applications. The article ends with a discussion of the performance benefits to be achieved using the multiple-model design algorithms.
Semilogarithmic Nonuniform Vector Quantization of Two-Dimensional Laplacean Source for Small Variance Dynamics
This paper presents a high dynamic range nonuniform two-dimensional vector quantization model for a Laplacean source. The semilogarithmic A-law compression characteristic is used as the radial scalar compression characteristic of the two-dimensional vector quantizer. The optimal number of concentric quantization domains (amplitude levels) is expressed as a function of the parameter A. An exact distortion analysis with closed-form expressions is provided. It is shown that the proposed model provides high SQNR values over a wide range of variances and exceeds the quality obtained by scalar A-law quantization at the same bit rate, so it can be used in various switching and adaptation implementations for high-quality signal compression.
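As a rough illustration of using the A-law characteristic radially, the sketch below quantizes a 2-D point in polar form: the radius is companded with the standard A-law curve and quantized uniformly in the compressed domain, and the phase is quantized uniformly. The fixed `levels` and `phase_cells` values, the support limit `r_max`, and the function names are assumptions for illustration; the paper's optimized level allocation and distortion expressions are not reproduced.

```python
import math

def a_law_compress(x, A=87.6):
    """Standard A-law compressor for a normalized magnitude 0 <= x <= 1."""
    k = 1.0 + math.log(A)
    if x < 1.0 / A:
        return A * x / k                 # linear segment near zero
    return (1.0 + math.log(A * x)) / k   # logarithmic segment

def a_law_expand(y, A=87.6):
    """Inverse of a_law_compress for 0 <= y <= 1."""
    k = 1.0 + math.log(A)
    if y < 1.0 / k:
        return y * k / A
    return math.exp(y * k - 1.0) / A

def quantize_polar(x1, x2, r_max=1.0, levels=16, phase_cells=32, A=87.6):
    """Quantize (x1, x2): companded radius, uniform phase.
    levels and phase_cells are illustrative, not optimized values."""
    r = min(math.hypot(x1, x2), r_max) / r_max      # clip to the support
    phi = math.atan2(x2, x1) % (2.0 * math.pi)
    i = min(int(a_law_compress(r, A) * levels), levels - 1)
    j = min(int(phi / (2.0 * math.pi) * phase_cells), phase_cells - 1)
    # reconstruct at the cell midpoint in the compressed domain
    r_hat = a_law_expand((i + 0.5) / levels, A) * r_max
    phi_hat = (j + 0.5) * 2.0 * math.pi / phase_cells
    return r_hat * math.cos(phi_hat), r_hat * math.sin(phi_hat)

print(quantize_polar(0.5, 0.0))
```

Because the radial cells are narrow near the origin and wide near `r_max`, small-amplitude inputs keep a roughly constant relative error, which is the property that gives companded quantizers their wide dynamic range.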
Weighted universal image compression
We describe a general coding strategy leading to a family of universal image compression systems designed to give good performance in applications where the statistics of the source to be compressed are not available at design time or vary over time or space. The basic approach considered uses a two-stage structure in which the single source code of traditional image compression systems is replaced with a family of codes designed to cover a large class of possible sources. To illustrate this approach, we consider the optimal design and use of two-stage codes containing collections of vector quantizers (weighted universal vector quantization), bit allocations for JPEG-style coding (weighted universal bit allocation), and transform codes (weighted universal transform coding). Further, we demonstrate the benefits to be gained from the inclusion of perceptual distortion measures and optimal parsing. The strategy yields two-stage codes that significantly outperform their single-stage predecessors. On a sequence of medical images, weighted universal vector quantization outperforms entropy-coded vector quantization by over 9 dB. On the same data sequence, weighted universal bit allocation outperforms a JPEG-style code by over 2.5 dB. On a collection of mixed text and image data, weighted universal transform coding outperforms a single, data-optimized transform code (which gives performance almost identical to that of JPEG) by over 6 dB.
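The two-stage structure can be sketched as follows: the first stage picks one code from a family and transmits its index, and the second stage encodes the block with the chosen code. This is a minimal sketch under simplifying assumptions that are not from the abstract: scalar codebooks stand in for vector quantizers, both stages are fixed-rate, and selection minimizes a Lagrangian cost `distortion + lam * rate`. The names `encode_block` and `two_stage_encode` are hypothetical.

```python
import math

def encode_block(block, codebook):
    """Second stage: nearest-neighbour quantization of each sample.
    Returns the codeword indices and the total squared error."""
    indices, dist = [], 0.0
    for x in block:
        i = min(range(len(codebook)), key=lambda k: (x - codebook[k]) ** 2)
        indices.append(i)
        dist += (x - codebook[i]) ** 2
    return indices, dist

def two_stage_encode(block, codebooks, lam=0.1):
    """First stage: try every codebook in the family and keep the one
    with the lowest cost. Returns (codebook index, codeword indices)."""
    first_stage_bits = math.log2(len(codebooks))  # cost of naming the codebook
    best = None
    for c, codebook in enumerate(codebooks):
        indices, dist = encode_block(block, codebook)
        rate = first_stage_bits + len(block) * math.log2(len(codebook))
        cost = dist + lam * rate
        if best is None or cost < best[0]:
            best = (cost, c, indices)
    return best[1], best[2]

family = [[-1.0, 1.0], [10.0, 20.0]]
print(two_stage_encode([9.0, 21.0, 11.0], family))
print(two_stage_encode([0.9, -1.2], family))
```

The decoder needs only the first-stage index to know which codebook to use, so the overhead per block is small when the family is small relative to the block size.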
Fast algorithm for the 3-D DCT-II
Recently, many applications for three-dimensional (3-D) image and video compression have been proposed using 3-D discrete cosine transforms (3-D DCTs). Among the different types of DCT, the type-II DCT (DCT-II) is the most widely used. To use 3-D DCTs in practical applications, fast 3-D algorithms are essential. Therefore, this paper introduces the 3-D vector-radix decimation-in-frequency (3-D VR DIF) algorithm, which calculates the 3-D DCT-II directly. The mathematical analysis and the implementation of the developed algorithm are presented, showing that this algorithm possesses a regular structure, can be implemented in-place for efficient use of memory, and is faster than the conventional row-column-frame (RCF) approach. Furthermore, a 3-D video compression application based on the 3-D DCT-II is implemented using the new 3-D algorithm. This has led to a substantial speed improvement for 3-D DCT-II-based compression systems and confirmed the validity of the developed algorithm.
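For reference, the conventional row-column-frame approach that the abstract uses as its baseline can be sketched as three passes of a 1-D DCT-II, one along each axis. This is the baseline only, not the vector-radix algorithm, and the 1-D transform here is the naive unnormalized O(N^2) definition rather than a fast one. The function names are hypothetical.

```python
import math

def dct2_1d(v):
    """Unnormalized 1-D DCT-II: X[k] = sum_n v[n] * cos(pi*(2n+1)*k / (2N))."""
    n_pts = len(v)
    return [
        sum(v[n] * math.cos(math.pi * (2 * n + 1) * k / (2 * n_pts))
            for n in range(n_pts))
        for k in range(n_pts)
    ]

def dct2_3d_rcf(x):
    """3-D DCT-II of x[f][r][c] (nested lists) by the row-column-frame method."""
    F, R, C = len(x), len(x[0]), len(x[0][0])
    # Pass 1 (rows): transform along the c axis; builds a new array.
    y = [[dct2_1d(x[f][r]) for r in range(R)] for f in range(F)]
    # Pass 2 (columns): transform along the r axis.
    for f in range(F):
        for c in range(C):
            col = dct2_1d([y[f][r][c] for r in range(R)])
            for r in range(R):
                y[f][r][c] = col[r]
    # Pass 3 (frames): transform along the f axis.
    for r in range(R):
        for c in range(C):
            tube = dct2_1d([y[f][r][c] for f in range(F)])
            for f in range(F):
                y[f][r][c] = tube[f]
    return y

cube = [[[1.0, 1.0], [1.0, 1.0]], [[1.0, 1.0], [1.0, 1.0]]]
coeffs = dct2_3d_rcf(cube)
print(coeffs[0][0][0])  # a constant cube puts all its energy in the DC term
```

The separable structure is what makes RCF easy to implement; a vector-radix algorithm instead decomposes all three dimensions at once.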
On the Compression of Recurrent Neural Networks with an Application to LVCSR acoustic modeling for Embedded Speech Recognition
We study the problem of compressing recurrent neural networks (RNNs). In particular, we focus on the compression of RNN acoustic models, which are motivated by the goal of building compact and accurate speech recognition systems which can be run efficiently on mobile devices. In this work, we present a technique for general recurrent model compression that jointly compresses both recurrent and non-recurrent inter-layer weight matrices. We find that the proposed technique allows us to reduce the size of our Long Short-Term Memory (LSTM) acoustic model to a third of its original size with negligible loss in accuracy. Comment: Accepted in ICASSP 201
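The abstract does not spell out the mechanism. Purely as an illustration, assume the joint compression is a low-rank factorization in which a layer's recurrent matrix and the next layer's inter-layer matrix share one projection of rank `rank`. Under that assumption, the parameter savings reduce to simple counting. The function names and the dimensions below are hypothetical.

```python
def dense_params(m, p, n):
    """Recurrent matrix (m x n) plus next-layer input matrix (p x n), uncompressed."""
    return m * n + p * n

def shared_projection_params(m, p, n, rank):
    """Assumed factorization: W_h ~ Z_h @ P and W_x ~ Z_x @ P, with one
    shared projection P (rank x n), Z_h (m x rank) and Z_x (p x rank)."""
    return m * rank + p * rank + rank * n

m = p = n = 500
before = dense_params(m, p, n)
after = shared_projection_params(m, p, n, rank=100)
print(before, after, after / before)
```

With these illustrative sizes the count drops from 500,000 to 150,000 parameters. Sharing the projection means its `rank * n` cost is paid once rather than once per matrix.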