669 research outputs found
Motion estimation and CABAC VLSI co-processors for real-time high-quality H.264/AVC video coding
Real-time, high-quality video coding is attracting wide interest in the research and industrial communities for a range of applications. H.264/AVC, a recent standard for high-performance video coding, can be successfully exploited in several scenarios, including digital video broadcasting, high-definition TV and DVD-based systems, which must sustain bit rates of up to tens of Mbits/s. To that end, this paper proposes optimized architectures for the most critical H.264/AVC tasks: motion estimation and context-adaptive binary arithmetic coding (CABAC). Post-synthesis results on sub-micron CMOS standard-cell technologies show that the proposed architectures can process 720 × 480 video sequences at 30 frames/s in real time and sustain more than 50 Mbits/s. The achieved circuit complexity and power consumption budgets are suitable for integration in complex VLSI multimedia systems based either on an AHB bus-centric on-chip communication system or on novel Network-on-Chip (NoC) infrastructures for MPSoC (Multi-Processor System on Chip
About Adaptive Coding on Countable Alphabets: Max-Stable Envelope Classes
In this paper, we study the problem of lossless universal source coding for
stationary memoryless sources on countably infinite alphabets. This task is
generally not achievable without restricting the class of sources over which
universality is desired. Building on our prior work, we propose natural
families of sources characterized by a common dominating envelope. We
particularly emphasize the notion of adaptivity, which is the ability to
perform as well as an oracle knowing the envelope, without actually knowing it.
This is closely related to the notion of hierarchical universal source coding,
but with the important difference that families of envelope classes are not
discretely indexed and not necessarily nested.
Our contribution is to extend the classes of envelopes over which adaptive
universal source coding is possible, namely by including max-stable
(heavy-tailed) envelopes which are excellent models in many applications, such
as natural language modeling. We derive a minimax lower bound on the redundancy
of any code on such envelope classes, including an oracle that knows the
envelope. We then propose a constructive code that does not use knowledge of
the envelope. The code is computationally efficient and is structured to use an
Expanding Threshold for Auto-Censoring, and we therefore dub it the
ETAC-code. We prove that the ETAC-code achieves the lower
bound on the minimax redundancy within a factor logarithmic in the sequence
length, and can therefore be qualified as a near-adaptive code over families of
heavy-tailed envelopes. For finite and light-tailed envelopes the penalty is
even smaller, and the same code closely follows previous results that explicitly
made the light-tailed assumption. Our technical results are founded on methods
from regular variation theory and concentration of measure
LOCO-ANS: An Optimization of JPEG-LS Using an Efficient and Low-Complexity Coder Based on ANS
Near-lossless compression is a generalization of lossless compression in which the codec user can set the maximum absolute difference (the error tolerance) between the value of an original pixel and that of the decoded one. This enables higher compression ratios while still bounding the quantization errors in the spatial domain. This feature makes near-lossless codecs attractive for applications where a high degree of certainty is required. The JPEG-LS lossless and near-lossless image compression standard combines a good compression ratio with low computational complexity, which makes it well suited to scenarios with strong restrictions, common in embedded systems. However, our analysis shows considerable potential for improving its coding efficiency, especially for lower-entropy distributions, which are more common in near-lossless coding. In this work, we propose enhancements to the JPEG-LS standard aimed at improving its coding efficiency at a low computational overhead, particularly for hardware implementations. The main contribution is a low-complexity and efficient coder based on Tabled Asymmetric Numeral Systems (tANS), well suited to a wide range of entropy sources and simple to implement in hardware. This coder enables further optimizations, resulting in large compression ratio improvements. When targeting photographic images, the proposed system achieves, on average, 1.6%, 6%, and 37.6% better compression for error tolerances of 0, 1, and 10, respectively. Additional improvements are achieved by increasing the context size and by image tiling, obtaining 2.3% lower bpp for lossless compression. Our results also show that our proposal compares favorably against state-of-the-art codecs like JPEG-XL and WebP, particularly in near-lossless coding, where it achieves higher compression ratios with a faster coding speed. This work was supported in part by the Spanish Research Agency through the Project AgileMon under Grant AEI PID2019-104451RB-C2
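The error-tolerance guarantee described in this abstract can be illustrated with a minimal sketch. It assumes the uniform residual quantizer with step 2·delta + 1 that JPEG-LS uses in near-lossless mode, and it substitutes a simple previous-pixel predictor for the standard's context-based predictor. The function names are hypothetical, clamping to the valid pixel range is omitted, and no entropy coder (tANS or otherwise) is modeled.

```python
def quantize_residual(err: int, delta: int) -> int:
    """Map a prediction error to a quantization index (step = 2*delta + 1)."""
    step = 2 * delta + 1
    if err >= 0:
        return (err + delta) // step
    return -((delta - err) // step)


def reconstruct_residual(index: int, delta: int) -> int:
    """Invert the quantizer: return the representative error for an index."""
    return index * (2 * delta + 1)


def encode_decode_row(pixels, delta):
    """Closed-loop coding of one row with a previous-pixel predictor.

    The predictor is an illustrative stand-in (an assumption), not the
    JPEG-LS predictor. Prediction uses the *reconstructed* previous pixel,
    so encoder and decoder stay in sync and errors do not accumulate.
    """
    decoded = []
    prev = 0  # assumed initial prediction for the first pixel
    for p in pixels:
        index = quantize_residual(p - prev, delta)
        rec = prev + reconstruct_residual(index, delta)
        decoded.append(rec)
        prev = rec
    return decoded


if __name__ == "__main__":
    row = [10, 12, 200, 198, 0, 255, 37]
    for delta in (0, 1, 10):
        out = encode_decode_row(row, delta)
        worst = max(abs(a - b) for a, b in zip(row, out))
        print(f"delta={delta}: max abs error = {worst}")
```

With delta = 0 the step is 1 and the scheme is lossless; larger tolerances shrink the set of residual indices, which is why the residual distribution has lower entropy in near-lossless mode.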
Depth sequence coding with hierarchical partitioning and spatial-domain quantization
Depth coding in 3D-HEVC deforms object shapes due to block-level edge approximation and lacks efficient techniques for exploiting the statistical redundancy that arises from the frame-level clustering tendency in depth data, which limits coding gain at near-lossless quality. This paper presents a standalone mono-view depth sequence coder, which preserves edges implicitly by limiting quantization to the spatial domain and exploits the frame-level clustering tendency efficiently with a novel binary tree-based decomposition (BTBD) technique. The BTBD can exploit the statistical redundancy in frame-level syntax, motion components, and residuals efficiently with fewer block-level prediction/coding modes and simpler context modeling for context-adaptive arithmetic coding. Compared with the depth coder in 3D-HEVC, the proposed coder achieves significantly lower bitrate in the lossless to near-lossless quality range for mono-view coding and renders superior-quality synthetic views from the depth maps, compressed at the same bitrate, and the corresponding texture frames. © 1991-2012 IEEE
Steered mixture-of-experts for light field images and video: representation and coding
Research in light field (LF) processing has increased heavily over the last decade. This is largely driven by the desire to achieve the same level of immersion and navigational freedom for camera-captured scenes as is currently available for CGI content. Standardization organizations such as MPEG and JPEG continue to follow conventional coding paradigms in which viewpoints are discretely represented on 2-D regular grids. These grids are then further decorrelated through hybrid DPCM/transform techniques. However, these 2-D regular grids are less suited for high-dimensional data, such as LFs. We propose a novel coding framework for higher-dimensional image modalities, called Steered Mixture-of-Experts (SMoE). Coherent areas in the higher-dimensional space are represented by single higher-dimensional entities, called kernels. These kernels hold spatially localized information about light rays at any angle arriving at a certain region. The global model thus consists of a set of kernels which define a continuous approximation of the underlying plenoptic function. We introduce the theory of SMoE and illustrate its application to 2-D images, 4-D LF images, and 5-D LF video. We also propose an efficient coding strategy to convert the model parameters into a bitstream. Even without provisions for high-frequency information, the proposed method performs comparably to the state of the art for low-to-mid range bitrates with respect to subjective visual quality of 4-D LF images. In the case of 5-D LF video, we observe superior decorrelation and coding performance, with coding gains of a factor of 4 in bitrate for the same quality. At least equally important is the fact that our method inherently has desired functionality for LF rendering which is lacking in other state-of-the-art techniques: (1) full zero-delay random access, (2) light-weight pixel-parallel view reconstruction, and (3) intrinsic view interpolation and super-resolution
A Very low bit-rate speech recognition system
When using extracted speech feature coefficients for speech synthesis, quantization is considered a lossy compression scheme. The data being compressed cannot be recovered or reconstructed exactly. However, in a speech recognition system for command and control purposes, a certain amount of quantization can be allowed, with comparable results. In some cases, quantization even serves to close the gaps between the coefficients of the incoming speech signal and those of the templates. Since the coefficients are not being used to reconstruct the signal, a very coarse quantization can be used, enabling a very low bit-rate transmission with very good recognition results. To reduce the bandwidth further, a binary coding procedure, such as Huffman or Arithmetic Coding, can be applied to the quantized coefficients. Upon receipt of the transmission, the quantized coefficients are decoded and used to perform speech recognition. The sets of coefficients are compared to the templates for each of the commands in the vocabulary. Speech, however, is dynamic in nature and a dynamic recognition procedure is needed to allow for different vocal inflections and durations. A procedure called Dynamic Time Warping is used to warp the time axis of the templates to more closely fit the information coming in. By combining all these techniques, a very accurate, very low bit-rate recognizer has been developed and is discussed in this paper
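The template-matching step described in this abstract can be sketched with the textbook Dynamic Time Warping recurrence. This is a minimal illustration under stated assumptions, not the paper's recognizer: each frame is reduced to a single scalar feature (a real system would compare vectors of quantized coefficients), the local cost is the absolute difference, and the names `dtw_distance` and `recognize` are hypothetical.

```python
def dtw_distance(seq_a, seq_b):
    """Return the minimum cumulative alignment cost between two sequences.

    Uses the classic three-way recurrence (insertion, deletion, match)
    with no window or slope constraint.
    """
    inf = float("inf")
    n, m = len(seq_a), len(seq_b)
    # cost[i][j] = best cost of aligning seq_a[:i] with seq_b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(seq_a[i - 1] - seq_b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # stretch seq_b's frame over seq_a
                cost[i][j - 1],      # stretch seq_a's frame over seq_b
                cost[i - 1][j - 1],  # advance both sequences
            )
    return cost[n][m]


def recognize(utterance, templates):
    """Pick the command whose template has the smallest DTW distance."""
    return min(templates, key=lambda name: dtw_distance(utterance, templates[name]))


if __name__ == "__main__":
    # Made-up, coarsely quantized scalar features, for illustration only.
    templates = {"go": [1, 3, 5, 3, 1], "stop": [5, 5, 1, 1]}
    slow_go = [1, 1, 3, 5, 5, 3, 1]  # same shape as "go", spoken more slowly
    print(recognize(slow_go, templates))
```

Because the warping path may repeat frames, an utterance that is a time-stretched copy of a template aligns with zero cost, which is the property that lets the recognizer tolerate different durations.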
Data compression techniques applied to high resolution high frame rate video technology
An investigation is presented of video data compression applied to microgravity space experiments using High Resolution High Frame Rate Video Technology (HHVT). An extensive survey of video data compression methods described in the open literature was conducted. The survey examines compression methods employing digital computing. The results of the survey are presented. They include a description of each method and an assessment of image degradation and video data parameters. An assessment is made of present and near-term future technology for implementing video data compression in high-speed imaging systems. Results of the assessment are discussed and summarized. The results of a study of a baseline HHVT video system, and approaches for implementing video data compression, are presented. Case studies of three microgravity experiments are presented, and specific compression techniques and implementations are recommended
Wavelet Based Image Coding Schemes: A Recent Survey
A variety of new and powerful algorithms have been developed for image
compression over the years. Among them, wavelet-based image compression
schemes have gained much popularity due to their overlapping nature, which
reduces the blocking artifacts that are common in JPEG compression, and
their multiresolution character, which leads to superior energy compaction with
high-quality reconstructed images. This paper provides a detailed survey of
some of the popular wavelet coding techniques such as the Embedded Zerotree
Wavelet (EZW) coding, Set Partitioning in Hierarchical Tree (SPIHT) coding, the
Set Partitioned Embedded Block (SPECK) Coder, and the Embedded Block Coding
with Optimized Truncation (EBCOT) algorithm. Other wavelet-based coding
techniques like the Wavelet Difference Reduction (WDR) and the Adaptive Scanned
Wavelet Difference Reduction (ASWDR) algorithms, the Space Frequency
Quantization (SFQ) algorithm, the Embedded Predictive Wavelet Image Coder
(EPWIC), Compression with Reversible Embedded Wavelet (CREW), the Stack-Run
(SR) coding and the recent Geometric Wavelet (GW) coding are also discussed.
Based on the review, recommendations and discussions are presented for
algorithm development and implementation. Comment: 18 pages, 7 figures, journa