539 research outputs found

    An efficient FPGA implementation of versatile video coding intra prediction

    Get PDF
    Versatile Video Coding (VVC) is a new international video compression standard offering much better compression efficiency than previous video compression standards at the expense of much higher computational complexity. In this paper, an efficient FPGA implementation of VVC intra prediction for angular prediction modes of 4x4, 8x8, 16x16 and 32x32 prediction unit sizes is proposed. In the proposed FPGA implementation, four constant multiplications used in one intra angular prediction equation are implemented using two DSP blocks and two adders in FPGA. The proposed FPGA implementation of VVC intra prediction, in the worst case, can process 34 full HD (1920x1080) frames per second

    An efficient hardware architecture for H.264 intra prediction algorithm

    Get PDF
    In this paper, we present an efficient hardware architecture for real-time implementation of intra prediction algorithm used in H.264 / MPEG4 Part 10 video coding standard. The hardware design is based on a novel organization of the intra prediction equations. This hardware is designed to be used as part of a complete H.264 video coding system for portable applications. The proposed architecture is implemented in Verilog HDL. The Verilog RTL code is verified to work at 90 MHz in a Xilinx Virtex II FPGA. The FPGA implementation can process 27 VGA frames (640x480) per second

    An efficient H.264 intra frame coder hardware design

    Get PDF
    H.264 / MPEG-4 Part 10, a recently developed international standard for video compression, offers significantly better video compression efficiency than previous international standards. Since it is impossible to implement a real-time H.264 video coder using a state-of-the-art embedded processor alone, in this thesis, we developed an efficient FPGA-based H.264 intra frame coder hardware for real-time portable applications targeting level 2.0 of baseline profile. We first designed a high performance and low cost hardware architecture for realtime implementation of entropy coding algorithms, context adaptive variable length coding and exp-golomb coding, used in H.264 video coding standard. The hardware is implemented in Verilog HDL and verified with RTL simulations using Mentor Graphics Modelsim. We then designed a high performance and low cost hardware architecture for real-time implementation of intra prediction algorithm used in H.264 video coding standard. This hardware is also implemented in Verilog HDL and verified with RTL simulations using Mentor Graphics Modelsim. We then designed and implemented the top-level H.264 intra frame coder hardware. The hardware is implemented by integrating intra prediction, mode decision, transform-quant and entropy coding modules. The H.264 intra frame coder hardware is verified to be compliant with H.264 standard and it can code 35 CIF (352x288) frames per second. The hardware is first verified with RTL simulations using Mentor Graphics Modelsim. It is then verified to work at 71 MHz on a Xilinx Virtex II FPGA on an ARM Versatile Platform development board. The bitstream generated by the H.264 intra frame coder hardware for an input frame is successfully decoded by H.264 Joint Model (JM) reference software decoder and the decoded frame is displayed using a YUV Player tool for visual verification

    An efficient hardware architecture for H.264 adaptive deblocking filter algorithm

    Get PDF
    This paper presents an efficient hardware architecture for real-time implementation of adaptive deblocking filter algorithm used in H.264 video coding standard. This hardware is designed to be used as part of a complete H.264 video coding system for portable applications. We use a novel edge filter ordering in a Macroblock to prevent the deblocking filter hardware from unnecessarily waiting for the pixels that will be filtered become available. The proposed architecture is implemented in Verilog HDL. The Verilog RTL code is verified to work at 72 MHz in a Xilinx Virtex II FPGA. The FPGA implementation can code 30 CIF frames (352x288) per second

    Low energy HEVC and VVC video compression hardware

    Get PDF
    Video compression standards compress a digital video by reducing and removing redundancy in the digital video using computationally complex algorithms. As spatial and temporal resolutions of videos increase, compression efficiencies of video compression algorithms are also increasing. However, increased compression efficiency comes with increased computational complexity. Therefore, it is necessary to reduce computational complexities of video compression algorithms without reducing their visual quality in order to reduce area and energy consumption of their hardware implementations. In this thesis, we propose a novel technique for reducing amount of computations performed by HEVC intra prediction algorithm. We designed low energy, reconfigurable HEVC intra prediction hardware using the proposed technique. We also designed a low energy FPGA implementation of HEVC intra prediction algorithm using the proposed technique and DSP blocks. We propose a reconfigurable VVC intra prediction hardware architecture. We also propose an efficient VVC intra prediction hardware architecture using DSP blocks. We designed low energy VVC fractional interpolation hardware. We propose a novel approximate absolute difference technique. We designed low energy approximate absolute difference hardware using the proposed technique. We propose a novel approximate constant multiplication technique. We designed approximate constant multiplication hardware using the proposed technique. We quantified computation reductions achieved by the proposed techniques and video quality loss caused by the proposed approximation techniques. The proposed approximate absolute difference technique and approximate constant multiplication technique cause very small PSNR loss. The other proposed techniques cause no PSNR loss. We implemented the proposed hardware architectures in Verilog HDL. We mapped the Verilog RTL codes to Xilinx Virtex 6 or Xilinx Virtex 7 FPGAs and estimated their power consumptions using Xilinx XPower Analyzer tool. The proposed techniques significantly reduced power and energy consumptions of these FPGA implementation

    Image and Video Coding Techniques for Ultra-low Latency

    Get PDF
    The next generation of wireless networks fosters the adoption of latency-critical applications such as XR, connected industry, or autonomous driving. This survey gathers implementation aspects of different image and video coding schemes and discusses their tradeoffs. Standardized video coding technologies such as HEVC or VVC provide a high compression ratio, but their enormous complexity sets the scene for alternative approaches like still image, mezzanine, or texture compression in scenarios with tight resource or latency constraints. Regardless of the coding scheme, we found inter-device memory transfers and the lack of sub-frame coding as limitations of current full-system and software-programmable implementations.publishedVersionPeer reviewe

    High-Level Synthesis Based VLSI Architectures for Video Coding

    Get PDF
    High Efficiency Video Coding (HEVC) is state-of-the-art video coding standard. Emerging applications like free-viewpoint video, 360degree video, augmented reality, 3D movies etc. require standardized extensions of HEVC. The standardized extensions of HEVC include HEVC Scalable Video Coding (SHVC), HEVC Multiview Video Coding (MV-HEVC), MV-HEVC+ Depth (3D-HEVC) and HEVC Screen Content Coding. 3D-HEVC is used for applications like view synthesis generation, free-viewpoint video. Coding and transmission of depth maps in 3D-HEVC is used for the virtual view synthesis by the algorithms like Depth Image Based Rendering (DIBR). As first step, we performed the profiling of the 3D-HEVC standard. Computational intensive parts of the standard are identified for the efficient hardware implementation. One of the computational intensive part of the 3D-HEVC, HEVC and H.264/AVC is the Interpolation Filtering used for Fractional Motion Estimation (FME). The hardware implementation of the interpolation filtering is carried out using High-Level Synthesis (HLS) tools. Xilinx Vivado Design Suite is used for the HLS implementation of the interpolation filters of HEVC and H.264/AVC. The complexity of the digital systems is greatly increased. High-Level Synthesis is the methodology which offers great benefits such as late architectural or functional changes without time consuming in rewriting of RTL-code, algorithms can be tested and evaluated early in the design cycle and development of accurate models against which the final hardware can be verified

    Low power H.264 video compression hardware designs

    Get PDF
    Video compression systems are used in many commercial products such as digital camcorders, cellular phones and video teleconferencing systems. H.264 / MPEG4 Part 10, the recently developed international standard for video compression, offers significantly better video compression efficiency than previous international standards. However, this coding gain comes with an increase in encoding complexity and therefore in power consumption. Since portable devices operate with battery, it is important to reduce power consumption so that the battery life can be increased. In addition, consuming excessive power degrades the performance of integrated circuits, increases packaging and cooling costs, reduces the reliability and may cause device failures. Therefore, power consumption is an important design metric for video compression hardware. In this thesis, we propose low power hardware designs for Deblocking Filter (DBF), intra prediction and intra mode decision parts of an H.264 video encoder. The proposed hardware architectures are implemented in Verilog HDL and mapped to Xilinx Virtex II FPGA. We performed detailed power consumption analysis of FPGA implementations of these hardware designs using Xilinx XPower tool. We also measured the power consumptions of DBF hardware implementations on a Xilinx Virtex II FPGA and there is a good match between estimated and measured power consumption results. We then worked on decreasing the power consumption of FPGA implementations of these H.264 video compression hardware designs by reducing switching activity using Register Transfer Level (RTL) low power techniques. We applied several RTL low power techniques such as clock gating and glitch reduction to these designs and quantified their impact on the power consumption of the FPGA implementations of these designs. We proposed novel computational complexity and power reduction techniques which avoid unnecessary calculations in DBF, intra prediction and intra mode decision parts of an H.264 video encoder. We quantified the computation reductions achieved by the proposed techniques using H.264 Joint Model software encoder. We applied these techniques to proposed hardware designs and quantified their impact on the power consumption of the FPGA implementations of these designs

    Methodology and optimizing of multiple frame format buffering within FPGA H.264/AVC decoder with FRExt.

    Get PDF
    Digital representation of video data is an inherently resource demanding problem that continues to necessitate the development and refinement of coding methods. The H.264/AVC standard, along with its recent Fidelity Range Extensions amendment (FRExt), is quickly being adopted as the standard codec for broadcast and distribution of high definition video. The FRExt amendment, while not necessarily affecting the overall decoder architecture, presents an added complexity of providing efficient memory management for buffering intermediate frames of various pixel color samplings and depths. This thesis evaluated the role of designing the frame buffer of a hardware video decoder, with integrated support for the H.264/AVC codec plus FRExt. With focus on organizing external memory data access, the frame buffer was designed to provide intermediate data storage for the decoder, while using an efficient store and load scheme that takes into consideration each frame pixel format of the video data. VHDL was used to model the frame buffer. Exploitation of reconfigurability and post-synthesis FPGA simulations were used to evaluate behavior, scalability and power consumption, while providing an analysis of approaches to adding FRExt to the memory management. Real-time buffer performance was achieved for two common frame formats at 1080 HD resolution; and an innovative pipeline design provides dynamic switching of formats between video sequences. As an additional consequence of verifying the model, a preexisting Baseline H.264/AVC decoder testbench was augmented to support testing of multiple frame formats
    corecore