326 research outputs found

An efficient hardware architecture for H.264 adaptive deblocking filter algorithm

Author: Hamzaoğlu İlker
Parlak Mustafa
Publication venue: IEEE Computer Society
Publication date: 01/01/2006

This paper presents an efficient hardware architecture for real-time implementation of the adaptive deblocking filter algorithm used in the H.264 video coding standard. This hardware is designed to be used as part of a complete H.264 video coding system for portable applications. We use a novel edge filter ordering in a macroblock to prevent the deblocking filter hardware from unnecessarily waiting for the pixels that will be filtered to become available. The proposed architecture is implemented in Verilog HDL. The Verilog RTL code is verified to work at 72 MHz in a Xilinx Virtex II FPGA. The FPGA implementation can code 30 CIF frames (352x288) per second.

CiteSeerX

Sabanci University Research Database
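
As a rough sanity check on the figures quoted in the abstract above (72 MHz clock, 30 CIF frames per second), the cycle budget per macroblock can be worked out as in the sketch below. The 16x16 macroblock size is the standard H.264 value; the function name is hypothetical, and the result is only an upper bound implied by the quoted numbers, not a cycle count reported by the authors.

```python
def cycle_budget_per_macroblock(clock_hz, width, height, fps, mb_size=16):
    """Return (macroblocks per frame, upper bound on clock cycles per macroblock).

    Assumes the frame dimensions are whole multiples of mb_size, which holds
    for CIF (352x288) with 16x16 macroblocks.
    """
    mbs_per_frame = (width // mb_size) * (height // mb_size)
    mbs_per_second = mbs_per_frame * fps
    return mbs_per_frame, clock_hz / mbs_per_second


# Figures taken from the abstract: 72 MHz, CIF resolution, 30 frames per second.
mbs, budget = cycle_budget_per_macroblock(72_000_000, 352, 288, 30)
print(mbs, round(budget))  # 396 macroblocks per frame, roughly 6061 cycles each
```

Any design that filters a macroblock in fewer cycles than this budget would, under these assumptions, keep up with 30 CIF frames per second at that clock rate.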

Parallel deblocking filtering in MPEG-4 AVC/H.264 on massively parallel architectures

Author: De Cock Jan
De Neve Wesley
Hollemeersch Charles
Lambert Peter
Pieters Bart
Van de Walle Rik
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2011

The deblocking filter in the MPEG-4 AVC/H.264 standard is computationally complex because of its high content adaptivity, resulting in a significant number of data dependencies. These data dependencies interfere with parallel filtering of multiple macroblocks (MBs) on massively parallel architectures. In this letter, we introduce a novel MB partitioning scheme for concurrent deblocking in the MPEG-4 AVC/H.264 standard, based on our idea of deblocking filter independency, a corrected version of the limited error propagation effect proposed in the letter. Our proposed scheme enables concurrent MB deblocking of luma samples with limited synchronization effort, independently of slice configuration, and is compliant with the MPEG-4 AVC/H.264 standard. We implemented the method on the massively parallel architecture of the graphics processing unit (GPU). Experimental results show that our GPU implementation achieves faster-than-real-time deblocking at 1309 frames per second for 1080p video pictures. Both software-based deblocking filters and state-of-the-art GPU-enabled algorithms are outperformed in terms of speed, by factors of up to 10.2 and 19.5, respectively, for 1080p video pictures.

Crossref

Ghent University Academic Bibliography

Low Power Architectures for MPEG-4 AVC/H.264 Video Compression

Author: Bahari Asral
Publication venue: The University of Edinburgh
Publication date: 01/01/2008

Edinburgh Research Archive

Dynamically Reconfigurable Architectures and Systems for Time-varying Image Constraints (DRASTIC) for Image and Video Compression

Author: Jiang Yuebing
Publication venue: UNM Digital Repository
Publication date: 12/07/2014

In the current era of booming information, image and video consumption is ubiquitous. The associated image and video coding operations require significant computing resources, both on small-scale computing systems and over larger network systems. For different scenarios, power, bitrate and image quality can impose significant time-varying constraints. For example, mobile devices (e.g., phones, tablets, laptops, UAVs) come with significant constraints on energy and power. Similarly, computer networks provide time-varying bandwidth that can depend on signal strength (e.g., wireless networks) or network traffic conditions. Alternatively, users can impose different constraints on image quality based on their interests. Traditional image and video coding systems have focused on rate-distortion optimization. More recently, distortion measures (e.g., PSNR) are being replaced by more sophisticated image quality metrics. However, these systems are based on fixed hardware configurations that provide limited options over power consumption. The use of dynamic partial reconfiguration with Field Programmable Gate Arrays (FPGAs) provides an opportunity to effectively control dynamic power consumption by jointly considering software-hardware configurations. This dissertation extends traditional rate-distortion optimization to rate-quality-power/energy optimization and demonstrates a wide variety of applications in both image and video compression. In each application, a family of Pareto-optimal configurations is developed that allows fine control in the rate-quality-power/energy optimization space. The term Dynamically Reconfigurable Architectures and Systems for Time-varying Image Constraints (DRASTIC) is used to describe the derived systems.
DRASTIC covers both software-only and software-hardware configurations to achieve fine optimization over a set of general modes that include: (i) maximum image quality, (ii) minimum dynamic power/energy, (iii) minimum bitrate, and (iv) a typical mode over a set of opposing constraints to guarantee satisfactory performance. In joint software-hardware configurations, DRASTIC provides an effective approach for dynamic power optimization. For software configurations, DRASTIC provides an effective method for energy consumption optimization by controlling processing times. The dissertation provides several applications. First, stochastic methods are given for computing quantization tables that are optimal in the rate-quality space and demonstrated on standard JPEG compression. Second, a DRASTIC implementation of the DCT is used to demonstrate the effectiveness of the approach on motion JPEG. Third, a reconfigurable deblocking filter system is investigated for use in current H.264/AVC systems. Fourth, the dissertation develops DRASTIC for all 35 intra-prediction modes as well as intra-encoding for the emerging High Efficiency Video Coding (HEVC) standard.

No-reference analysis of decoded MPEG images for PSNR estimation and post-processing

Author: Andersen Jakob Dahl
Forchhammer Søren
Li Huiying
Publication date: 01/01/2011

We propose no-reference analysis and processing of DCT (Discrete Cosine Transform) coded images based on estimation of selected MPEG parameters from the decoded video. The goal is to assess MPEG video quality and perform post-processing without access to either the original stream or the code stream. Solutions are presented for MPEG-2 video. A method to estimate the quantization parameters of DCT coded images and MPEG I-frames at the macroblock level is presented. The results of this analysis are used for deblocking and deringing artifact reduction and no-reference PSNR estimation without code stream access. An adaptive deringing method using texture classification is presented. On the test set, the quantization parameters in MPEG-2 I-frames are estimated with an overall accuracy of 99.9% and the PSNR is estimated with an overall average error of 0.3 dB. The deringing and deblocking algorithms yield improvements of 0.3 dB on the MPEG-2 decoded test sequences.

Crossref

Online Research Database In Technology
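
For readers unfamiliar with the metric, the sketch below shows the conventional full-reference PSNR that the abstract above reports estimating without access to the original. It is not the authors' no-reference estimator: it simply computes the quantity being estimated, assuming 8-bit samples (peak value 255) and flat lists of pixel values. The function name and sample values are hypothetical.

```python
import math


def psnr(reference, decoded, peak=255):
    """Full-reference PSNR in dB between two equal-length pixel sequences."""
    if len(reference) != len(decoded) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all samples
    mse = sum((r - d) ** 2 for r, d in zip(reference, decoded)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals
    return 10 * math.log10(peak ** 2 / mse)


# Made-up sample values, purely for illustration.
original = [100, 120, 140, 160]
decoded = [101, 119, 142, 158]
print(f"{psnr(original, decoded):.2f} dB")
```

On this scale, the 0.3 dB figures quoted in the abstract correspond to a change of about 7% in mean squared error, since 10^(0.3/10) is roughly 1.07.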

Processing Decoded Video for LCD-LED Backlight Display: Post processing of decoded video and local backlight dimming for LCD technology with LED-based backlight

Author: Nadernejad Ehsan
Publication venue: Technical University of Denmark
Publication date: 01/01/2013

Online Research Database In Technology

GPU Parallelization of HEVC In-Loop Filters

Author: Chi Chi Ching
de Souza Diego F.
Ilic Aleksandar
Juurlink Ben
Roma Nuno
Sousa Leonel
Wang Biao
Álvarez-Mesa Mauricio
Publication date: 01/01/2017

In the High Efficiency Video Coding (HEVC) standard, multiple decoding modules have been designed to take advantage of parallel processing. In particular, the HEVC in-loop filters (i.e., the deblocking filter and sample adaptive offset) were conceived to be exploited by parallel architectures. However, the type of parallelism offered mostly suits the capabilities of multi-core CPUs, which makes it a real challenge to efficiently exploit massively parallel architectures such as Graphics Processing Units (GPUs), mainly due to the existing data dependencies between the HEVC decoding procedures. Accordingly, this paper presents a novel strategy to increase the amount of parallelism and the resulting performance of the HEVC in-loop filters on GPU devices. For this purpose, the proposed algorithm performs the HEVC filtering at frame level and employs intrinsic GPU vector instructions. When compared to the state-of-the-art HEVC in-loop filter implementations, the proposed approach also reduces the amount of required memory transfers, thus further boosting the performance. Experimental results show that the proposed GPU in-loop filters deliver a significant improvement in decoding performance. For example, average frame rates of 76 frames per second (FPS) and 125 FPS for Ultra HD 4K are achieved on an embedded NVIDIA GPU for All Intra and Random Access configurations, respectively.

DepositOnce

Parallel scalability and efficiency of HEVC parallelization approaches

Author: Chi Chi Ching
Clare Gordon
Henry Félix
Juurlink Ben
Pateux Stéphane
Schierl Thomas
Álvarez-Mesa Mauricio
Publication date: 01/01/2012

Unlike H.264/Advanced Video Coding, where parallelism was an afterthought, High Efficiency Video Coding currently contains several proposals aimed at making it more parallel-friendly. A performance comparison of the different proposals, however, has not yet been performed. In this paper, we fill this gap by presenting efficient implementations of the most promising parallelization proposals, namely tiles and wavefront parallel processing (WPP). In addition, we present a novel approach called overlapped wavefront (OWF), which achieves higher performance and efficiency than tiles and WPP. Experiments conducted on a 12-core system running at 3.33 GHz show that our implementations achieve average speedups, for 4K sequences, of 8.7, 9.3, and 10.7 for WPP, tiles, and OWF, respectively.

DepositOnce
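
To put the speedups quoted in the abstract above in context, the sketch below converts them to parallel efficiency using the common textbook definition (speedup divided by core count). This definition is an assumption here; the paper may define or measure efficiency differently. The function name is hypothetical.

```python
def parallel_efficiency(speedup, cores):
    """Fraction of ideal linear speedup achieved on the given core count."""
    return speedup / cores


# Average 4K speedups on the 12-core system, as quoted in the abstract.
speedups = {"WPP": 8.7, "tiles": 9.3, "OWF": 10.7}
for name, s in speedups.items():
    print(f"{name}: {parallel_efficiency(s, 12):.1%}")
```

Under this definition, OWF reaches close to 90% of ideal scaling on 12 cores, against roughly 72% to 78% for WPP and tiles.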