Perceptually-Driven Video Coding with the Daala Video Codec
The Daala project is a royalty-free video codec that attempts to compete with
the best patent-encumbered codecs. Part of our strategy is to replace core
tools of traditional video codecs with alternative approaches, many of them
designed to take perceptual aspects into account, rather than optimizing for
simple metrics like PSNR. This paper documents some of our experiences with
these tools, which ones worked and which did not. We evaluate which tools are
easy to integrate into a more traditional codec design, and show results in the
context of the codec being developed by the Alliance for Open Media. Comment: 19 pages, Proceedings of SPIE Workshop on Applications of Digital
Image Processing (ADIP), 201
A practical approach for the design of nonuniform lapped transforms
We propose a simple method for the design of lapped transforms with nonuniform frequency resolution and good time localization. The method is a generalization of an approach previously proposed by Princen, where the nonuniform filter bank is obtained by joining uniform cosine-modulated filter banks (CMFBs) using a transition filter. We use several transition filters to obtain a near perfect-reconstruction (PR) nonuniform lapped transform with significantly reduced overall distortion. The main advantage of the proposed method is in reducing the length of the transition filters, which leads to a reduction in processing delay that can be useful for applications such as real-time audio coding
Role of anticausal inverses in multirate filter-banks. I. System-theoretic fundamentals
In a maximally decimated filter bank with identical decimation ratios for all channels, the perfect reconstructibility property and the nature of reconstruction filters (causality, stability, FIR property, and so on) depend on the properties of the polyphase matrix. Various properties and capabilities of the filter bank depend on the properties of the polyphase matrix as well as the nature of its inverse. In this paper we undertake a study of the types of inverses and characterize them according to their system-theoretic properties (i.e., properties of state-space descriptions, McMillan degree, degree of determinant, and so forth). We find in particular that causal polyphase matrices with anticausal inverses have an important role in filter bank theory. We study their properties for both the FIR and IIR cases. Techniques for implementing anticausal IIR inverses based on state-space descriptions are outlined. It is found that causal FIR matrices with anticausal FIR inverses (cafacafi) have a key role in the characterization of FIR filter banks. In a companion paper, these results are applied to the factorization of biorthogonal FIR filter banks, and a generalization of the lapped orthogonal transform called the biorthogonal lapped transform (BOLT) is developed
VHDL modeling and synthesis of the JPEG-XR inverse transform
This work presents a pipelined VHDL implementation of the inverse lapped biorthogonal transform used in the decompression process of the soon to be released JPEG-XR still image standard format. This inverse transform involves integer only calculations using lifting operations and Kronecker products. Divisions and multiplications by small integer coefficients are implemented using a bit shift and add technique resulting in a multiplier-less implementation with 736 instances of addition. When targeted to an Altera Stratix II FPGA with a 50 MHz system clock, this design is capable of completing the inverse transform of an 8400 x 6600 pixel image in less than 70 ms
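The bit shift and add technique mentioned in this abstract can be sketched in a few lines. This is only an illustration of the general idea, assuming non-negative integer coefficients; the function name `shift_add_multiply` is hypothetical and the code does not reproduce the JPEG-XR lifting steps or the VHDL design itself.

```python
def shift_add_multiply(x: int, coeff: int) -> int:
    """Multiply x by a non-negative integer coeff using only shifts and adds.

    Each set bit k of coeff contributes (x << k) to the result, which is
    how a hardware design can avoid a general-purpose multiplier.
    """
    if coeff < 0:
        raise ValueError("coeff must be non-negative in this sketch")
    result = 0
    shift = 0
    while coeff:
        if coeff & 1:              # bit `shift` of the coefficient is set
            result += x << shift   # add x * 2**shift
        coeff >>= 1
        shift += 1
    return result


# Example: 3 * x is computed as (x << 1) + x
print(shift_add_multiply(7, 3))
```

For a small fixed coefficient, the loop unrolls into a handful of adders, which is consistent with the multiplier-less design the abstract describes.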
Scalable and perceptual audio compression
This thesis deals with scalable perceptual audio compression. Two scalable perceptual solutions as well as a scalable to lossless solution are proposed and investigated. One of the scalable perceptual solutions is built around sinusoidal modelling of the audio signal whilst the other is built on a transform coding paradigm. The scalable coders are shown to scale both in a waveform matching manner and in a psychoacoustic manner. In order to measure the psychoacoustic scalability of the systems investigated in this thesis, the psychoacoustic parameters of the original signal are compared with those of the synthesized signal. The psychoacoustic parameters used are loudness, sharpness, tonality and roughness. This analysis technique is a novel method used in this thesis and it allows an insight into the perceptual distortion that has been introduced by any coder analyzed in this manner
Compression methods for mechanical vibration signals: Application to the plane engines
A novel approach for the compression of mechanical vibration signals is presented in this paper. The method relies on a simple and flexible decomposition into a large number of subbands, implemented by an orthogonal transform. Compression is achieved by a uniform adaptive quantization of each subband. The method is tested on a large number of real vibration signals issued by plane engines. High compression ratios can be achieved, while keeping a good quality of the reconstructed signal. It is also shown that compression has little impact on the detection of some commonly encountered defects of the plane engine
Pipelined implementation of JPEG image compression using HDL
This thesis presents the architecture and design of a JPEG compressor for color images using VHDL. The system consists of major parts such as a color space converter, a down sampler, a 2-D DCT module, quantization, zigzag scanning and entropy coding. The color space conversion transforms the RGB colors to YCbCr color coding. The down sampling operation reduces the sampling rate of the color information (Cb and Cr). The 2-D DCT transforms the pixel data from the spatial domain to the frequency domain. The quantization operation eliminates the high frequency components and the small amplitude coefficients of the cosine expansion. Finally, the entropy coding uses run-length encoding (RLE), Huffman, variable length coding (VLC) and differential coding to decrease the number of bits used to represent the image. JPEG compression is lossy, since the downsampling and quantization operations are irreversible, but the losses can be controlled in order to keep the necessary image quality. Architectures for these parts were designed and described in VHDL. The results were observed using the Active-HDL simulator and the code was synthesized using Xilinx ISE for a Virtex-4 FPGA. This pipelined architecture has a minimum latency of 187 clock cycles
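Some of the stages listed in this abstract can be sketched in software. The code below is a simplified illustration only, not the thesis's VHDL architecture: it shows the color conversion (using the common JFIF full-range constants, an assumption here), zigzag scanning, quantization, and a bare run-length step, and it omits downsampling, the DCT, differential DC coding and Huffman coding. All function names are hypothetical.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert one RGB pixel to YCbCr (JFIF full-range constants, assumed)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr


def zigzag_order(n=8):
    """Return (row, col) pairs of an n x n block in zigzag scan order."""
    def key(rc):
        r, c = rc
        s = r + c
        # Odd anti-diagonals run top-to-bottom, even ones bottom-to-top.
        return (s, r if s % 2 else c)
    return sorted(((r, c) for r in range(n) for c in range(n)), key=key)


def run_length_encode(seq):
    """Encode a sequence as (zero_run, value) pairs, ending with (0, 0).

    Simplified: real JPEG also caps runs at 16 zeros and entropy-codes
    the pairs; neither is done here.
    """
    pairs, run = [], 0
    for v in seq:
        if v == 0:
            run += 1
        else:
            pairs.append((run, v))
            run = 0
    pairs.append((0, 0))  # end-of-block marker; trailing zeros are dropped
    return pairs


def scan_block(coeffs, qtable):
    """Quantize a square coefficient block, zigzag-scan it, and run-length encode."""
    n = len(coeffs)
    scanned = [int(round(coeffs[r][c] / qtable[r][c])) for r, c in zigzag_order(n)]
    return run_length_encode(scanned)


# Tiny 2 x 2 example with a flat quantization table of 8
print(scan_block([[16, 0], [8, 0]], [[8, 8], [8, 8]]))
```

Quantization is the irreversible step in this sketch: once a coefficient is rounded to zero it is absorbed into a run and cannot be recovered, which is why the abstract describes the compression as lossy.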
Transform/subband analysis and synthesis of signals
Includes bibliographical references (leaves 35-37). Work supported by the Advanced Television Research Program. Work supported by the National Science Foundation. MIP 87-14969. David M. Baylon and Jae S. Lim