261 research outputs found

    Performance evaluation of H.264/AVC decoding and visualization using the GPU

    Get PDF
    The coding efficiency of the H.264/AVC standard makes the decoding process computationally demanding. This has limited the availability of cost-effective, high-performance solutions. Modern computers are typically equipped with powerful yet cost-effective Graphics Processing Units (GPUs) to accelerate graphics operations. These GPUs can be addressed by means of a 3-D graphics API such as Microsoft Direct3D or OpenGL, using programmable shaders as generic processing units for vector data. The new CUDA (Compute Unified Device Architecture) platform of NVIDIA provides a straightforward way to address the GPU directly, without the need for a 3-D graphics API in the middle. In CUDA, a compiler generates executable code from C code with specific modifiers that determine the execution model. This paper first presents an own-developed H.264/AVC renderer, which is capable of executing motion compensation (MC), reconstruction, and Color Space Conversion (CSC) entirely on the GPU. To steer the GPU, Direct3D combined with programmable pixel and vertex shaders is used. Next, we also present a GPU-enabled decoder utilizing the new CUDA architecture from NVIDIA. This decoder performs MC, reconstruction, and CSC on the GPU as well. Our results compare both GPU-enabled decoders, as well as a CPU-only decoder in terms of speed, complexity, and CPU requirements. Our measurements show that a significant speedup is possible, relative to a CPU-only solution. As an example, real-time playback of high-definition video (1080p) was achieved with our Direct3D and CUDA-based H.264/AVC renderers

    An efficient hardware architecture for H.264 adaptive deblocking filter algorithm

    Get PDF
    This paper presents an efficient hardware architecture for real-time implementation of adaptive deblocking filter algorithm used in H.264 video coding standard. This hardware is designed to be used as part of a complete H.264 video coding system for portable applications. We use a novel edge filter ordering in a Macroblock to prevent the deblocking filter hardware from unnecessarily waiting for the pixels that will be filtered become available. The proposed architecture is implemented in Verilog HDL. The Verilog RTL code is verified to work at 72 MHz in a Xilinx Virtex II FPGA. The FPGA implementation can code 30 CIF frames (352x288) per second

    A Parallel implementation of an mpeg-2 encoder using message-passing

    Get PDF
    The days of film are waning as digital cameras and digital video cameras are becoming commonplace. Uncompressed digital video can consume large amounts of space, making it cumbersome to store efficiently. A method of video compression was developed by the Motion Pictures Expert Group (MPEG), and is now an international standard with the International Organization for Standardization (ISO). This thesis deals with the MPEG-2 Video standard, ISO/IEC 13818-2 [2]. The goal of this thesis is to explore the applications of MPEG-2 encoding in a parallel processing paradigm. To achieve this, a sequential MPEG-2 software encoder was obtained from the MPEG Software Simulation Group (MSSG) [18] and modified to be run, in parallel, on a cluster of single-processor Linux workstations using the Message Passing Interface (MPI) [11, 10, 3]. A multi-threaded pipeline of the encoding process was created using Pthreads [6]. The resulting pipelined parallel encoder has been shown to produce compliant elementary MPEG-2 bitstreams for progressive video sequences. Results of simulation showed that the parallel encoder always performed better than the sequential version as the number of processors scaled. However, it did not exhibit the ideal linear speedup that all parallel programs aim to achieve. This is due to the program executing on a set of resources not ideal for the multi-threaded pipeline. The ensuing chapters will provide the motivation for this work, and an overview of MPEG in addition to parallel processing and programming. Also forthcoming will be how it was achieved and the results produced. Supplementary applications of this work will also be discussed

    Energy-efficient acceleration of MPEG-4 compression tools

    Get PDF
    We propose novel hardware accelerator architectures for the most computationally demanding algorithms of the MPEG-4 video compression standard-motion estimation, binary motion estimation (for shape coding), and the forward/inverse discrete cosine transforms (incorporating shape adaptive modes). These accelerators have been designed using general low-energy design philosophies at the algorithmic/architectural abstraction levels. The themes of these philosophies are avoiding waste and trading area/performance for power and energy gains. Each core has been synthesised targeting TSMC 0.09 Îźm TCBN90LP technology, and the experimental results presented in this paper show that the proposed cores improve upon the prior art

    MPEG-4 Software Video Encoding

    Get PDF
    A Thesis submitted in fulfillment of the requirements of the degree of doctor of Philosophy in the University of LondonThis thesis presents a software model that allows a parallel decomposition of the MPEG-4 video encoder onto shared memory architectures, in order to reduce its total video encoding time. Since a video sequence consists of video objects each of which is likely to have different encoding requirements, the model incorporates a scheduler which (a) always selects the most appropriate video object for encoding and, (b) employs a mechanism for dynamically allocating video objects allocation onto the system processors, based on video object size information. Further spatial video object parallelism is exploited by applying the single program multiple data (SPMD) paradigm within the different modules of the MPEG-4 video encoder. Due to the fact that not all macroblocks have the same processing requirements, the model also introduces a data partition scheme that generates tiles with identical processing requirements. Since, macroblock data dependencies preclude data parallelism at the shape encoder the model also introduces a new mechanism that allows parallelism using a circular pipeline macroblock technique The encoding time depends partly on an encoder’s computational complexity. This thesis also addresses the problem of the motion estimation, as its complexity has a significant impact on the encoder’s complexity. In particular, two fast motion estimation algorithms have been developed for the model which reduce the computational complexity significantly. The thesis includes experimental results on a four processor shared memory platform, Origin200

    VHDL Modeling of an H.264/AVC Video Decoder

    Get PDF
    Transmission and storage of video data has necessitated the development of video com pression techniques. One of today\u27s most widely used video compression techniques is the MPEG-2 standard, which is over ten years old. A task force sponsored by the same groups that developed MPEG-2 has recently finished defining a new standard that is meant to replace MPEG-2 for future video compression applications. This standard, H.264/AVC, uses significantly improved compression techniques. It is capable of providing similar pic ture quality at bit rates of 30-70% less than MPEG-2, depending on the particular video sequence and application [20]. This thesis developed a complete VHDL behavioral model of a video decoder imple menting the Baseline Profile of the H.264/AVC standard. The decoder was verified using a testing environment for comparison with reference software results. Development of a synthesizable hardware description was also shown for two components of the video de coder: the transform unit and the deblocking filter. This demonstrated how a complete video decoder could be developed one module at a time with individual module verifica tion. Analysis was also done to estimate the performance and hardware requirements for a complete implementation on an FPGA device

    Low Power Architectures for MPEG-4 AVC/H.264 Video Compression

    Get PDF

    Architecture design of a scalable adaptive deblocking filter for H.264/AVC

    Get PDF
    Due to significant bit-rate savings and improved perceptual quality, H.264/AVC, the latest video compression standard from the Joint Video Team, is receiving widespread adoption. Greater coding efficiency relative to previous standards is a result of additional techniques and features. One important change is the inclusion of an in-loop deblocking filter for removal of blocking artifacts. Since the filter can easily account for one-third of the computational complexity of a decoder, its addition was a source of debate during the development of the H.264/AVC standard. Ample research on architecture design of the deblocking filter has been carried out, generally targeted toward high performance profiles. To the best of our knowledge no other research investigated designs that can be scaled from low-power extended profiles up to high performance profiles. This work investigated the design of a scalable architecture for the deblocking filter. Four different designs were implemented. The relative performance of the designs were then compared against each other and existing research through simulation. All designs were targeted towards a Xilinx Virtex 5 field programmable gate array (FPGA)

    A toolset for the analysis and optimization of motion estimation algorithms and processors

    Get PDF

    Motion estimation and CABAC VLSI co-processors for real-time high-quality H.264/AVC video coding

    Get PDF
    Real-time and high-quality video coding is gaining a wide interest in the research and industrial community for different applications. H.264/AVC, a recent standard for high performance video coding, can be successfully exploited in several scenarios including digital video broadcasting, high-definition TV and DVD-based systems, which require to sustain up to tens of Mbits/s. To that purpose this paper proposes optimized architectures for H.264/AVC most critical tasks, Motion estimation and context adaptive binary arithmetic coding. Post synthesis results on sub-micron CMOS standard-cells technologies show that the proposed architectures can actually process in real-time 720 × 480 video sequences at 30 frames/s and grant more than 50 Mbits/s. The achieved circuit complexity and power consumption budgets are suitable for their integration in complex VLSI multimedia systems based either on AHB bus centric on-chip communication system or on novel Network-on-Chip (NoC) infrastructures for MPSoC (Multi-Processor System on Chip
    • …
    corecore