20 research outputs found

    A Cost Shared Quantization Algorithm and its Implementation for Multi-Standard Video CODECS

    Get PDF
    The current trend of digital convergence creates the need for the video encoder and decoder system, known as codec in short, that should support multiple video standards on a single platform. In a modern video codec, quantization is a key unit used for video compression. In this thesis, a generalized quantization algorithm and hardware implementation is presented to compute quantized coefficient for six different video codecs including the new developing codec High Efficiency Video Coding (HEVC). HEVC, successor to H.264/MPEG-4 AVC, aims to substantially improve coding efficiency compared to AVC High Profile. The thesis presents a high performance circuit shared architecture that can perform the quantization operation for HEVC, H.264/AVC, AVS, VC-1, MPEG- 2/4 and Motion JPEG (MJPEG). Since HEVC is still in drafting stage, the architecture was designed in such a way that any final changes can be accommodated into the design. The proposed quantizer architecture is completely division free as the division operation is replaced by multiplication, shift and addition operations. The design was implemented on FPGA and later synthesized in CMOS 0.18 ÎŒm technology. The results show that the proposed design satisfies the requirement of all codecs with a maximum decoding capability of 60 fps at 187.3 MHz for Xilinx Virtex4 LX60 FPGA of a 1080p HD video. The scheme is also suitable for low-cost implementation in modern multi-codec systems

    Design and Implementation of IDCT/IDST-Specific Accelerators for HEVC Standard on Heterogeneous Accelerator-Rich Platform

    Get PDF
    Having High Efficiency Video Coding (HEVC) is important for image processing, reducing bandwidth, and increasing video quality. There are different methods that can be used to implement HEVC. This thesis focuses on design and implementation of application-specific accelerators for IDCT/IDST algorithms dedicated for HEVC standard. Those algorithms are parallel-in-nature tasks which makes them suitable to be executed by heterogeneous multicore platforms. This is done using accelerators which are required for power efficient processing. In this study, Coarse-Grained Reconfigurable Arrays (CGRAs) are used for making a template for an accelerator. CGRA has one of the major roles in a Heterogeneous Accelerator-Rich Platforms (HARP) as it is capable of accelerating non-parallel loops with lower loop counts. This thesis includes various algorithms for the use of IDCT and IDST with different designs and templates, reaching a unique final architecture. The final output intended is to reach 4 points IDST together with a 4/8 points IDCT. Another feature added to the hypothesis is the use of different dimensions for the CGRA template in order to have a different type of accelerator. The many CGRAs are combined together in successive arrangement with Reduced Instructions Set Computers (RISC) over the Network-on-Chip (NoC). The aim is to study the performance of the accelerator used for the IDCT and the IDST. This can be evaluated as the data movement through NoC network along with comparison of performance of accelerator with clock cycles in order to calculate the efficiency of the system. The results show that a four point IDST and IDCT can be computed in 56 clock cycles. In addition, the 8 point IDCT can be implemented in 64 cycles. One important factor to consider during the study is the power and energy consumption which is important in this century. The dynamic power dissipation usage for the routing of data has reached a value of 4.03 mW. Whereas, the energy consumption was 1.76 Ό\muJ for the 4 points system (IDCT and IDST) and 3.06 Ό\muJ for the 8 points (IDCT). Processing Elements (PEs) are used for implementing the transform algorithm and units were operated at 200 MHz. Finally, these results show that 1080P image at 30 frames per second can be attained by using FPGA

    Algoritmo de estimação de movimento e sua arquitetura de hardware para HEVC

    Get PDF
    Doutoramento em Engenharia EletrotĂ©cnicaVideo coding has been used in applications like video surveillance, video conferencing, video streaming, video broadcasting and video storage. In a typical video coding standard, many algorithms are combined to compress a video. However, one of those algorithms, the motion estimation is the most complex task. Hence, it is necessary to implement this task in real time by using appropriate VLSI architectures. This thesis proposes a new fast motion estimation algorithm and its implementation in real time. The results show that the proposed algorithm and its motion estimation hardware architecture out performs the state of the art. The proposed architecture operates at a maximum operating frequency of 241.6 MHz and is able to process 1080p@60Hz with all possible variables block sizes specified in HEVC standard as well as with motion vector search range of up to ±64 pixels.A codificação de vĂ­deo tem sido usada em aplicaçÔes tais como, vĂ­deovigilĂąncia, vĂ­deo-conferĂȘncia, video streaming e armazenamento de vĂ­deo. Numa norma de codificação de vĂ­deo, diversos algoritmos sĂŁo combinados para comprimir o vĂ­deo. Contudo, um desses algoritmos, a estimação de movimento Ă© a tarefa mais complexa. Por isso, Ă© necessĂĄrio implementar esta tarefa em tempo real usando arquiteturas de hardware apropriadas. Esta tese propĂ”e um algoritmo de estimação de movimento rĂĄpido bem como a sua implementação em tempo real. Os resultados mostram que o algoritmo e a arquitetura de hardware propostos tĂȘm melhor desempenho que os existentes. A arquitetura proposta opera a uma frequĂȘncia mĂĄxima de 241.6 MHz e Ă© capaz de processar imagens de resolução 1080p@60Hz, com todos os tamanhos de blocos especificados na norma HEVC, bem como um domĂ­nio de pesquisa de vetores de movimento atĂ© ±64 pixels

    Low energy video processing and compression hardware designs

    Get PDF
    Digital video processing and compression algorithms are used in many commercial products such as mobile devices, unmanned aerial vehicles, and autonomous cars. Increasing resolution of videos used in these commercial products increased computational complexities of digital video processing and compression algorithms. Therefore, it is necessary to reduce computational complexities of digital video processing and compression algorithms, and energy consumptions of digital video processing and compression hardware without reducing visual quality. In this thesis, we propose a novel adaptive 2D digital image processing algorithm for 2D median filter, Gaussian blur and image sharpening. We designed low energy 2D median filter, Gaussian blur and image sharpening hardware using the proposed algorithm. We propose approximate HEVC intra prediction and HEVC fractional interpolation algorithms. We designed low energy approximate HEVC intra prediction and HEVC fractional interpolation hardware. We also propose several HEVC fractional interpolation hardware architectures. We propose novel computational complexity and energy reduction techniques for HEVC DCT and inverse DCT/DST. We designed high performance and low energy hardware for HEVC DCT and inverse DCT/DST including the proposed techniques. VII We quantified computation reductions achieved and video quality loss caused by the proposed algorithms and techniques. We implemented the proposed hardware architectures in Verilog HDL. We mapped the Verilog RTL codes to Xilinx Virtex 6 and Xilinx ZYNQ FPGAs, and estimated their power consumptions using Xilinx XPower Analyzer tool. The proposed algorithms and techniques significantly reduced the power and energy consumptions of these FPGA implementations in some cases with no PSNR loss and in some cases with very small PSNR loss

    Video Stream Adaptation In Computer Vision Systems

    Get PDF
    Computer Vision (CV) has been deployed recently in a wide range of applications, including surveillance and automotive industries. According to a recent report, the market for CV technologies will grow to $33.3 billion by 2019. Surveillance and automotive industries share over 20% of this market. This dissertation considers the design of real-time CV systems with live video streaming, especially those over wireless and mobile networks. Such systems include video cameras/sensors and monitoring stations. The cameras should adapt their captured videos based on the events and/or available resources and time requirement. The monitoring station receives video streams from all cameras and run CV algorithms for decisions, warnings, control, and/or other actions. Real-time CV systems have constraints in power, computational, and communicational resources. Most video adaptation techniques considered the video distortion as the primary metric. In CV systems, however, the main objective is enhancing the event/object detection/recognition/tracking accuracy. The accuracy can essentially be thought of as the quality perceived by machines, as opposed to the human perceptual quality. High-Efficiency Video Coding (HEVC) is a recent encoding standard that seeks to address the limited communication bandwidth problem as a result of the popularity of High Definition (HD) videos. Unfortunately, HEVC adopts algorithms that greatly slow down the encoding process, and thus results in complications in real-time systems. This dissertation presents a method for adapting live video streams to limited and varying network bandwidth and energy resources. It analyzes and compares the rate-accuracy and rate-energy characteristics of various video streams adaptation techniques in CV systems. We model the video capturing, encoding, and transmission aspects and then provide an overall model of the power consumed by the video cameras and/or sensors. In addition to modeling the power consumption, we model the achieved bitrate of video encoding. We validate and analyze the power consumption models of each phase as well as the aggregate power consumption model through extensive experiments. The analysis includes examining individual parameters separately and examining the impacts of changing more than one parameter at a time. For HEVC, we develop an algorithm that predicts the size of the block without iterating through the exhaustive Rate Distortion Optimization (RDO) method. We demonstrate the effectiveness of the proposed algorithm in comparison with existing algorithms. The proposed algorithm achieves approximately 5 times the encoding speed of the RDO algorithm and 1.42 times the encoding speed of the fastest analyzed algorithm

    IMPLEMENTASI HEVC CODEC PADA PLATFORM BERBASIS FPGA

    Get PDF
    High Efficiency Video Coding (HEVC) telah di desain sebagai standar baru untuk beberapa aplikasi video dan memiliki peningkatan performa dibanding dengan standar sebelumnya. Meskipun HEVC mencapai efisiensi coding yang tinggi, namun HEVC memiliki kekurangan pada beban pemrosesan tinggi dan loading yang berat ketika melakukan proses encoding video. Untuk meningkatkan performa encoder, kami bertujuan untuk mengimplementasikan HEVC codec pada Zynq 7000 AP SoC. Kami mencoba mengimplementasikan HEVC menggunakan tiga desain sistem. Pertama, HEVC codec di implementasikan pada Zynq PS. Kedua, encoder HEVC di implementasikan dengan hardware/software co-design. Ketiga, mengimplementasikan sebagian dari encoder HEVC pada Zynq PL. Pada implementasi kami menggunakan Xilinx Vivado HLS untuk mengembangkan codec. Hasil menunjukkan bahwa HEVC codec dapat di implementasikan pada Zynq PS. Codec dapat mengurangi ukuran video dibanding ukuran asli video pada format H.264. Kualitas video hampir sama dengan format H.264. Sayangnya, kami tidak dapat menyelesaikan desain dengan hardware/software co-design karena kompleksitas coding untuk validasi kode C pada Vivado HLS. Hasil lain, sebagian dari encoder HEVC dapat di implementasikan pada Zynq PL, yaitu HEVC 2D IDCT. Dari implementasi kami dapat mengoptimalkan fungsi loop pada HEVC 2D dan 1D IDCT menggunakan pipelining. Perbandingan hasil antara pipelining inner-loop dan outer-loop menunjukkan bahwa pipelining di outer-loop dapat meningkatkan performa dilihat dari nilai latency

    A Systematic Hardware Sharing Method for Unified Architecture Design of H.264 Transforms

    Get PDF
    Multitransform techniques have been widely used in modern video coding and have better compression efficiency than the single transform technique that is used conventionally. However, every transform needs a corresponding hardware implementation, which results in a high hardware cost for multiple transforms. A novel method that includes a five-step operation sharing synthesis and architecture-unification techniques is proposed to systematically share the hardware and reduce the cost of multitransform coding. In order to demonstrate the effectiveness of the method, a unified architecture is designed using the method for all of the six transforms involved in the H.264 video codec: 2D 4 × 4 forward and inverse integer transforms, 2D 4 × 4 and 2 × 2 Hadamard transforms, and 1D 8 × 8 forward and inverse integer transforms. Firstly, the six H.264 transform architectures are designed at a low cost using the proposed five-step operation sharing synthesis technique. Secondly, the proposed architecture-unification technique further unifies these six transform architectures into a low cost hardware-unified architecture. The unified architecture requires only 28 adders, 16 subtractors, 40 shifters, and a proposed mux-based routing network, and the gate count is only 16308. The unified architecture processes 8 pixels/clock-cycle, up to 275 MHz, which is equal to 707 Full-HD 1080 p frames/second
    corecore