118 research outputs found

    Exploring the design space of HEVC inverse transforms with dataflow programming

    Get PDF
    This paper presents the design space exploration of the hardware-based inverse fixed-point integer transform for High Efficiency Video Coding (HEVC). The designs are specified at high-level using CAL dataflow language and automatically synthesized to HDL for FPGA implementation. Several parallel design alternatives are proposed with trade-off between performance and resource. The HEVC transform consists of several independent components from 4x4 to 32x32 discrete cosine transform and 4x4 discrete sine transform.This work explores the strategies to efficiently compute the transforms by applying data parallelism on the different components. Results show that an intermediate version of parallelism, whereby the 4x4 and 8x8 are merged together, and the 16x16 and 32x32 merged together gives the best trade-off between performance and resource. The results presented in this work also give an insight on how the HEVC transform can be designed efficiently in parallel for hardware implementation

    IMPLEMENTASI HEVC CODEC PADA PLATFORM BERBASIS FPGA

    Get PDF
    High Efficiency Video Coding (HEVC) telah di desain sebagai standar baru untuk beberapa aplikasi video dan memiliki peningkatan performa dibanding dengan standar sebelumnya. Meskipun HEVC mencapai efisiensi coding yang tinggi, namun HEVC memiliki kekurangan pada beban pemrosesan tinggi dan loading yang berat ketika melakukan proses encoding video. Untuk meningkatkan performa encoder, kami bertujuan untuk mengimplementasikan HEVC codec pada Zynq 7000 AP SoC. Kami mencoba mengimplementasikan HEVC menggunakan tiga desain sistem. Pertama, HEVC codec di implementasikan pada Zynq PS. Kedua, encoder HEVC di implementasikan dengan hardware/software co-design. Ketiga, mengimplementasikan sebagian dari encoder HEVC pada Zynq PL. Pada implementasi kami menggunakan Xilinx Vivado HLS untuk mengembangkan codec. Hasil menunjukkan bahwa HEVC codec dapat di implementasikan pada Zynq PS. Codec dapat mengurangi ukuran video dibanding ukuran asli video pada format H.264. Kualitas video hampir sama dengan format H.264. Sayangnya, kami tidak dapat menyelesaikan desain dengan hardware/software co-design karena kompleksitas coding untuk validasi kode C pada Vivado HLS. Hasil lain, sebagian dari encoder HEVC dapat di implementasikan pada Zynq PL, yaitu HEVC 2D IDCT. Dari implementasi kami dapat mengoptimalkan fungsi loop pada HEVC 2D dan 1D IDCT menggunakan pipelining. Perbandingan hasil antara pipelining inner-loop dan outer-loop menunjukkan bahwa pipelining di outer-loop dapat meningkatkan performa dilihat dari nilai latency

    The AV1 Constrained Directional Enhancement Filter (CDEF)

    Full text link
    This paper presents the constrained directional enhancement filter designed for the AV1 royalty-free video codec. The in-loop filter is based on a non-linear low-pass filter and is designed for vectorization efficiency. It takes into account the direction of edges and patterns being filtered. The filter works by identifying the direction of each block and then adaptively filtering with a high degree of control over the filter strength along the direction and across it. The proposed enhancement filter is shown to improve the quality of the Alliance for Open Media (AOM) AV1 and Thor video codecs in particular in low complexity configurations.Comment: 5 page

    A unified 4/8/16/32-point integer IDCT architecture for multiple video coding standards

    Get PDF
    (4096x2048) 30fps video sequence at 191MHz working frequency, with 93K gate count and 18944-bit SRAM. We suggest a normalized criterion called design efficiency to compare with previous works. It shows that this design is 31% more efficient than previous work

    Design and Implementation of IDCT/IDST-Specific Accelerators for HEVC Standard on Heterogeneous Accelerator-Rich Platform

    Get PDF
    Having High Efficiency Video Coding (HEVC) is important for image processing, reducing bandwidth, and increasing video quality. There are different methods that can be used to implement HEVC. This thesis focuses on design and implementation of application-specific accelerators for IDCT/IDST algorithms dedicated for HEVC standard. Those algorithms are parallel-in-nature tasks which makes them suitable to be executed by heterogeneous multicore platforms. This is done using accelerators which are required for power efficient processing. In this study, Coarse-Grained Reconfigurable Arrays (CGRAs) are used for making a template for an accelerator. CGRA has one of the major roles in a Heterogeneous Accelerator-Rich Platforms (HARP) as it is capable of accelerating non-parallel loops with lower loop counts. This thesis includes various algorithms for the use of IDCT and IDST with different designs and templates, reaching a unique final architecture. The final output intended is to reach 4 points IDST together with a 4/8 points IDCT. Another feature added to the hypothesis is the use of different dimensions for the CGRA template in order to have a different type of accelerator. The many CGRAs are combined together in successive arrangement with Reduced Instructions Set Computers (RISC) over the Network-on-Chip (NoC). The aim is to study the performance of the accelerator used for the IDCT and the IDST. This can be evaluated as the data movement through NoC network along with comparison of performance of accelerator with clock cycles in order to calculate the efficiency of the system. The results show that a four point IDST and IDCT can be computed in 56 clock cycles. In addition, the 8 point IDCT can be implemented in 64 cycles. One important factor to consider during the study is the power and energy consumption which is important in this century. The dynamic power dissipation usage for the routing of data has reached a value of 4.03 mW. Whereas, the energy consumption was 1.76 μ\muJ for the 4 points system (IDCT and IDST) and 3.06 μ\muJ for the 8 points (IDCT). Processing Elements (PEs) are used for implementing the transform algorithm and units were operated at 200 MHz. Finally, these results show that 1080P image at 30 frames per second can be attained by using FPGA
    corecore