Exploring the design space of HEVC inverse transforms with dataflow programming
This paper presents a design space exploration of the hardware-based inverse fixed-point integer transform for High Efficiency Video Coding (HEVC). The designs are specified at a high level using the CAL dataflow language and automatically synthesized to HDL for FPGA implementation. Several parallel design alternatives are proposed that trade off performance against resource usage. The HEVC transform consists of several independent components: discrete cosine transforms from 4x4 to 32x32 and a 4x4 discrete sine transform. This work explores strategies for computing the transforms efficiently by applying data parallelism to the different components. Results show that an intermediate degree of parallelism, in which the 4x4 and 8x8 transforms are merged together and the 16x16 and 32x32 transforms are merged together, gives the best trade-off between performance and resource usage. The results also give insight into how the HEVC transform can be designed efficiently in parallel for hardware implementation.
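To make the smallest of these components concrete, the following is a minimal Python sketch of the 4x4 inverse transform as two 1-D passes. The 4-point DCT and DST matrices and the stage shifts (7, then 20 minus the bit depth) follow the HEVC core transform as commonly documented; the function names `inverse_1d` and `inverse_2d` are illustrative assumptions, and the intermediate 16-bit clipping used by reference decoders is omitted. It is not the CAL dataflow design described in the abstract.

```python
# 4-point HEVC core transform matrices (integer approximations).
DCT4 = [
    [64,  64,  64,  64],
    [83,  36, -36, -83],
    [64, -64, -64,  64],
    [36, -83,  83, -36],
]
DST4 = [
    [29,  55,  74,  84],
    [74,  74,   0, -74],
    [84, -29, -74,  55],
    [55, -84,  74, -29],
]


def inverse_1d(matrix, coeffs, shift):
    """One 1-D inverse pass: out[i] = (sum_k matrix[k][i] * coeffs[k] + rnd) >> shift."""
    n = len(matrix)
    rnd = 1 << (shift - 1)  # rounding offset
    return [
        (sum(matrix[k][i] * coeffs[k] for k in range(n)) + rnd) >> shift
        for i in range(n)
    ]


def inverse_2d(matrix, block, bit_depth=8):
    """Separable 2-D inverse transform: columns first, then rows.

    Assumption: no intermediate clipping, unlike a conforming decoder.
    """
    n = len(matrix)
    # Stage 1: inverse-transform every column (shift of 7).
    cols = [inverse_1d(matrix, [block[r][c] for r in range(n)], 7) for c in range(n)]
    # Re-assemble the intermediate result in row-major order.
    tmp = [[cols[c][r] for c in range(n)] for r in range(n)]
    # Stage 2: inverse-transform every row (shift of 20 - bit_depth).
    return [inverse_1d(matrix, row, 20 - bit_depth) for row in tmp]


# A block with only a DC coefficient reconstructs to a flat residual.
dc_block = [[64, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
flat = inverse_2d(DCT4, dc_block)
```

Passing `DST4` instead of `DCT4` gives the 4x4 sine-transform path. Because each column (and then each row) is processed independently, the two list comprehensions in `inverse_2d` mark the places where a hardware design can apply data parallelism.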
Implementation of an HEVC Codec on an FPGA-Based Platform (IMPLEMENTASI HEVC CODEC PADA PLATFORM BERBASIS FPGA)
High Efficiency Video Coding (HEVC) was designed as a new standard for a
range of video applications and improves on the performance of earlier
standards. Although HEVC achieves high coding efficiency, it suffers from a
heavy processing load when encoding video. To improve encoder performance,
we aim to implement an HEVC codec on the Zynq 7000 AP SoC.
We attempted to implement HEVC using three system designs. First, the HEVC
codec is implemented on the Zynq PS. Second, the HEVC encoder is implemented
with hardware/software co-design. Third, part of the HEVC encoder is
implemented on the Zynq PL. We used Xilinx Vivado HLS to develop the codec.
The results show that the HEVC codec can be implemented on the Zynq PS. The
codec reduces the video size compared with the original video size in H.264
format, and the video quality is almost the same as that of the H.264
format. Unfortunately, we could not complete the hardware/software co-design
because of the coding complexity of validating the C code in Vivado HLS. As
a further result, part of the HEVC encoder, namely the HEVC 2D IDCT, can be
implemented on the Zynq PL. In our implementation we were able to optimize
the loops in the HEVC 2D and 1D IDCT using pipelining. A comparison between
inner-loop and outer-loop pipelining shows that outer-loop pipelining
improves performance as measured by latency
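The inner-loop versus outer-loop comparison can be illustrated with a rough latency model. The sketch below is an assumption-laden approximation, not a result from this work: it uses the textbook pipelined-loop latency `depth + II * (trip_count - 1)`, assumes the tool does not flatten the loop nest, assumes that pipelining the outer loop fully unrolls the inner loop, and uses made-up depths and a made-up per-iteration loop overhead.

```python
def pipelined_loop_latency(trip_count, depth, ii=1):
    """Cycles for a pipelined loop: fill the pipeline once, then one result every `ii` cycles."""
    return depth + ii * (trip_count - 1)


def inner_pipelined(outer_trips, inner_trips, depth, ii=1, loop_overhead=2):
    """Inner loop pipelined; the outer loop restarts the pipeline on every iteration.

    `loop_overhead` is a hypothetical per-iteration cost for entering and
    leaving the inner loop.
    """
    per_outer = pipelined_loop_latency(inner_trips, depth, ii) + loop_overhead
    return outer_trips * per_outer


def outer_pipelined(outer_trips, unrolled_depth, ii=1):
    """Outer loop pipelined; the inner loop is assumed fully unrolled into one deeper stage."""
    return pipelined_loop_latency(outer_trips, unrolled_depth, ii)


# Hypothetical 4x4 loop nest, e.g. one pass of a 4-point 1D IDCT over 4 rows.
inner_case = inner_pipelined(outer_trips=4, inner_trips=4, depth=3)
outer_case = outer_pipelined(outer_trips=4, unrolled_depth=6)
```

Under these assumed numbers the outer-pipelined variant needs fewer cycles because the pipeline is filled only once, at the cost of the extra hardware that unrolling the inner loop implies.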
The AV1 Constrained Directional Enhancement Filter (CDEF)
This paper presents the constrained directional enhancement filter designed
for the AV1 royalty-free video codec. The in-loop filter is based on a
non-linear low-pass filter and is designed for vectorization efficiency. It
takes into account the direction of edges and patterns being filtered. The
filter works by identifying the direction of each block and then adaptively
filtering with a high degree of control over the filter strength along the
direction and across it. The proposed enhancement filter is shown to improve
the quality of the Alliance for Open Media (AOM) AV1 and Thor video codecs in
particular in low complexity configurations.
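The non-linearity that gives this control over filter strength is usually described as a constraint function applied to the difference between a neighbouring pixel and the pixel being filtered. The Python sketch below follows that commonly published form, sign(d) * min(|d|, max(0, S - floor(|d| / 2^(D - floor(log2 S))))); the abstract above does not spell it out, so the function name `constrain` and the parameter names are assumptions made for illustration.

```python
def constrain(diff, strength, damping):
    """Limit the contribution of a neighbouring pixel difference.

    Small differences pass through unchanged, while large differences
    (likely real edges) are attenuated towards zero so they are not blurred.
    """
    if strength == 0:
        return 0
    # floor(log2(strength)) via bit_length; the shift is clamped at zero.
    shift = max(0, damping - (strength.bit_length() - 1))
    magnitude = min(abs(diff), max(0, strength - (abs(diff) >> shift)))
    return magnitude if diff >= 0 else -magnitude


small = constrain(1, strength=4, damping=6)    # passes through unchanged
edge = constrain(100, strength=4, damping=6)   # suppressed entirely
mid = constrain(-20, strength=4, damping=6)    # clipped, sign preserved
```

A filter tap would then add a weighted sum of `constrain(...)` values to the centre pixel, with separate strengths along and across the detected direction.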
A unified 4/8/16/32-point integer IDCT architecture for multiple video coding standards
(4096x2048) 30 fps video sequence at a 191 MHz working frequency, with a 93K gate count and 18944 bits of SRAM. We suggest a normalized criterion called design efficiency for comparison with previous works. It shows that this design is 31% more efficient than previous work.
Design and Implementation of IDCT/IDST-Specific Accelerators for HEVC Standard on Heterogeneous Accelerator-Rich Platform
High Efficiency Video Coding (HEVC) is important for image processing, bandwidth reduction, and improved video quality, and it can be implemented in several ways. This thesis focuses on the design and implementation of application-specific accelerators for the IDCT/IDST algorithms of the HEVC standard. These algorithms are inherently parallel, which makes them suitable for execution on heterogeneous multicore platforms, where accelerators are required for power-efficient processing. In this study, Coarse-Grained Reconfigurable Arrays (CGRAs) are used as a template for the accelerator. A CGRA plays a major role in a Heterogeneous Accelerator-Rich Platform (HARP), as it is capable of accelerating non-parallel loops with lower loop counts. The thesis examines various IDCT and IDST algorithms with different designs and templates and arrives at a single final architecture, intended to provide a 4-point IDST together with a 4/8-point IDCT. CGRA templates of different dimensions are also used in order to obtain different types of accelerator. The CGRAs are combined in a successive arrangement with Reduced Instruction Set Computer (RISC) cores over a Network-on-Chip (NoC). The aim is to study the performance of the IDCT and IDST accelerators, evaluated through the data movement across the NoC and the accelerator clock-cycle counts, from which the efficiency of the system is calculated. The results show that the 4-point IDST and IDCT can be computed in 56 clock cycles, and the 8-point IDCT in 64 cycles. Power and energy consumption were also considered. The dynamic power dissipated in routing data was 4.03 mW, and the energy consumption was 1.76 J for the 4-point system (IDCT and IDST) and 3.06 J for the 8-point IDCT. Processing Elements (PEs) implement the transform algorithm, and the units were operated at 200 MHz. Finally, these results show that 1080p at 30 frames per second can be attained using an FPGA.