46 research outputs found

    Decoder Hardware Architecture for HEVC

    Get PDF
    This chapter provides an overview of the design challenges faced in the implementation of hardware HEVC decoders. These challenges can be attributed to the larger and diverse coding block sizes and transform sizes, the larger interpolation filter for motion compensation, the increased number of steps in intra prediction and the introduction of a new in-loop filter. Several solutions to address these implementation challenges are discussed. As a reference, results for an HEVC decoder test chip are also presented.Texas Instruments Incorporate

    A 249-Mpixel/s HEVC Video-Decoder Chip for 4K Ultra-HD Applications

    Get PDF
    High Efficiency Video Coding, the latest video standard, uses larger and variable-sized coding units and longer interpolation filters than [H.264 over AVC] to better exploit redundancy in video signals. These algorithmic techniques enable a 50% decrease in bitrate at the cost of computational complexity, external memory bandwidth, and, for ASIC implementations, on-chip SRAM of the video codec. This paper describes architectural optimizations for an HEVC video decoder chip. The chip uses a two-stage subpipelining scheme to reduce on-chip SRAM by 56 kbytes-a 32% reduction. A high-throughput read-only cache combined with DRAM-latency-aware memory mapping reduces DRAM bandwidth by 67%. The chip is built for HEVC Working Draft 4 Low Complexity configuration and occupies 1.77 mm[superscript 2] in 40-nm CMOS. It performs 4K Ultra HD 30-fps video decoding at 200 MHz while consuming 1.19 [nJ over pixel] of normalized system power.Texas Instruments Incorporate

    Circuit implementations for high-efficiency video coding tools

    Get PDF
    Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2012.Cataloged from PDF version of thesis.Includes bibliographical references (p. 71-72).High-Efficiency Video Coding (HEVC) is planned to be the successor video standard to the popular Advanced Video Coding (H.264/AVC) with a targeted 2x improvement in compression at the same quality. This improvement comes at the cost of increased complexity through the addition of new coding tools and increased computation in existing tools. The ever-increasing demand for higher resolution video further adds to the computation cost. In this work, digital circuits for two HEVC tools - inverse transform and deblocking filter are implemented to support Quad-Full HD (4K x 2K) video decoding at 30fps. Techniques to reduce power and area cost are investigated and synthesis results in 40nm CMOS technology and Virtex-6 FPGA platform are presented.by Mehul Tikekar.S.M

    Dynamically Reconfigurable Architectures and Systems for Time-varying Image Constraints (DRASTIC) for Image and Video Compression

    Get PDF
    In the current information booming era, image and video consumption is ubiquitous. The associated image and video coding operations require significant computing resources for both small-scale computing systems as well as over larger network systems. For different scenarios, power, bitrate and image quality can impose significant time-varying constraints. For example, mobile devices (e.g., phones, tablets, laptops, UAVs) come with significant constraints on energy and power. Similarly, computer networks provide time-varying bandwidth that can depend on signal strength (e.g., wireless networks) or network traffic conditions. Alternatively, the users can impose different constraints on image quality based on their interests. Traditional image and video coding systems have focused on rate-distortion optimization. More recently, distortion measures (e.g., PSNR) are being replaced by more sophisticated image quality metrics. However, these systems are based on fixed hardware configurations that provide limited options over power consumption. The use of dynamic partial reconfiguration with Field Programmable Gate Arrays (FPGAs) provides an opportunity to effectively control dynamic power consumption by jointly considering software-hardware configurations. This dissertation extends traditional rate-distortion optimization to rate-quality-power/energy optimization and demonstrates a wide variety of applications in both image and video compression. In each application, a family of Pareto-optimal configurations are developed that allow fine control in the rate-quality-power/energy optimization space. The term Dynamically Reconfiguration Architecture Systems for Time-varying Image Constraints (DRASTIC) is used to describe the derived systems. DRASTIC covers both software-only as well as software-hardware configurations to achieve fine optimization over a set of general modes that include: (i) maximum image quality, (ii) minimum dynamic power/energy, (iii) minimum bitrate, and (iv) typical mode over a set of opposing constraints to guarantee satisfactory performance. In joint software-hardware configurations, DRASTIC provides an effective approach for dynamic power optimization. For software configurations, DRASTIC provides an effective method for energy consumption optimization by controlling processing times. The dissertation provides several applications. First, stochastic methods are given for computing quantization tables that are optimal in the rate-quality space and demonstrated on standard JPEG compression. Second, a DRASTIC implementation of the DCT is used to demonstrate the effectiveness of the approach on motion JPEG. Third, a reconfigurable deblocking filter system is investigated for use in the current H.264/AVC systems. Fourth, the dissertation develops DRASTIC for all 35 intra-prediction modes as well as intra-encoding for the emerging High Efficiency Video Coding standard (HEVC)

    HEVC video compression hardware designs

    Get PDF
    High Efficiency Video Coding (HEVC), a recently developed international standard for video compression, offers significantly better video compression efficiency than previous international standards. However, this coding gain comes with an increase in computational complexity. Therefore, in this thesis, we first designed a high performance hardware architecture for deblocking filter algorithm used in HEVC standard. Two parallel datapaths are used in the hardware to increase its performance. The proposed hardware is implemented in Verilog HDL. The Verilog RTL code is mapped to a Xilinx XC6VLX240T FPGA, and it is verified to work correctly on a Xilinx ML605 FPGA board which includes a Xilinx XC6VLX240T FPGA. The FPGA implementation can work at 108 MHz, and it can code 30 full HD (1920x1080) video frames per second. We then proposed an energy reduction technique for Sum of Absolute Transformed Difference (SATD) based HEVC intra mode decision algorithm. We designed an efficient hardware architecture for SATD based HEVC intra mode decision algorithm including the proposed technique. The proposed hardware is implemented in Verilog HDL. The Verilog RTL code is mapped to a Xilinx XC6VLX365T FPGA, and it is verified with post place & route simulations. The FPGA implementation can work at 116 MHz, and it can code 21 HD (1280x720) video frames per second. The proposed technique reduced its energy consumption up to 64.6% on this FPGA without any PSNR loss

    Parallel video decoding in the emerging HEVC standard

    Get PDF
    In this paper we propose and evaluate a parallelization strategy for the emerging HEVC video coding standard. The proposed strategy is based on entropy slices which allows exploiting parallelism in the entropy decoding stage while maintaining high coding efficiency. Our approach requires to encode videos with one entropy slice per LCU row in order to decode multiple LCU rows in a wavefront parallel manner. Evaluations performed on a PC with 12 Intel Xeon cores running at 3.3 GHz show that it is possible to achieve real-time performance for 1920×1080p50 (53.1 fps) and 2560×1600 (29.5fps) video resolutions with speedups of 5.2× and 6.3× compared to sequential execution, respectively

    SIMD acceleration for HEVC decoding

    Get PDF
    Single instruction multiple data (SIMD) instructions have been commonly used to accelerate video codecs. The recently introduced High Efficiency Video Coding (HEVC) codec like its predecessors is based on the hybrid video codec principle and, therefore, is also well suited to be accelerated with SIMD. In this paper we present the SIMD optimization for the entire HEVC decoder for all major SIMD instruction set architectures. Evaluation has been performed on 14 mobile and PC platforms covering most major architectures released in recent years. With SIMD, up to 5× speedup can be achieved over the entire HEVC decoder, resulting in up to 133 and 37.8 frames/s on average on a single core for Main profile 1080p and Main10 profile 2160p sequences, respectively.EC/FP7/288653/EU/Low-Power Parallel Computing on GPUs/LPGP

    An energy-aware system-on-chip architecture for intra prediction in HEVC standard

    Get PDF
    High resolution 4K and 8K are becoming the more used in video applications. Those resolutions are well supported in the new HEVC standard. Thus, embedded solutions such as development of dedicated ystems-On-Chips (SOC) to accelerate video processing on one chip instead of only software solutions are commendable. This paper proposes a novel parallel and high efficient hardware accelerator for the intra prediction block. This accelerator achieves a high-speed treatment due to pipelined processing units and parallel shaped architecture. The complexity of memory access is also reduced thanks to the proposed design with less increased power consumption. The implementation was performed on the 7 Series FPGA 28 nm technology resources on Zynq-7000 and results show, that the proposed architecture takes 16520 LUTs and can reach 143.65 MHz as a maximum frequency and it is able to support the throughput of 3840×2160 sequence at 90 frames per second

    Parallel algorithms and architectures for low power video decoding

    Get PDF
    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2010.Cataloged from PDF version of thesis.Includes bibliographical references (p. 197-204).Parallelism coupled with voltage scaling is an effective approach to achieve high processing performance with low power consumption. This thesis presents parallel architectures and algorithms designed to deliver the power and performance required for current and next generation video coding. Coding efficiency, area cost and scalability are also addressed. First, a low power video decoder is presented for the current state-of-the-art video coding standard H.264/AVC. Parallel architectures are used along with voltage scaling to deliver high definition (HD) decoding at low power levels. Additional architectural optimizations such as reducing memory accesses and multiple frequency/voltage domains are also described. An H.264/AVC Baseline decoder test chip was fabricated in 65-nm CMOS. It can operate at 0.7 V for HD (720p, 30 fps) video decoding and with a measured power of 1.8 mW. The highly scalable decoder can tradeoff power and performance across >100x range. Second, this thesis demonstrates how serial algorithms, such as Context-based Adaptive Binary Arithmetic Coding (CABAC), can be redesigned for parallel architectures to enable high throughput with low coding efficiency cost. A parallel algorithm called the Massively Parallel CABAC (MP-CABAC) is presented that uses syntax element partitions and interleaved entropy slices to achieve better throughput-coding efficiency and throughput-area tradeoffs than H.264/AVC. The parallel algorithm also improves scalability by providing a third dimension to tradeoff coding efficiency for power and performance. Finally, joint algorithm-architecture optimizations are used to increase performance and reduce area with almost no coding penalty. The MP-CABAC is mapped to a highly parallel architecture with 80 parallel engines, which together delivers >10x higher throughput than existing H.264/AVC CABAC implementations. A MP-CABAC test chip was fabricated in 65-nm CMOS to demonstrate the power-performance-coding efficiency tradeoff.by Vivienne. Sze.Ph.D
    corecore