27 research outputs found

    Network-on-Chip Based H.264 Video Decoder on a Field Programmable Gate Array

    Get PDF
    This thesis develops the first fully network-on-chip (NoC) based h.264 video decoder implemented in real hardware on a field programmable gate array (FPGA). This thesis starts with an overview of the h.264 video coding standard and an introduction to the NoC communication paradigm. Following this, a series of processing elements (PEs) are developed which implement the component algorithms making up the h.264 video decoder. These PEs, described primarily in VHDL with some Verilog and C, are then mapped to an NoC which is generated using the CONNECT NoC generation tool. To demonstrate the scalability of the proposed NoC based design, a second NoC based video decoder is implemented on a smaller FPGA using the same PEs on a more compact NoC topology. The performance of both decoders, as well as their component PEs, is evaluated on real hardware. An analysis of the performance results is conducted and recommendations for future work are made based on the results of this analysis. Aside from the development of the proposed decoder, a major contribution of this thesis is the release of all source materials for this design as open source hardware and software. The release of these materials will allow other researchers to more easily replicate this work, as well as create derivative works in the areas of NoC based designs for FPGA, video coding and decoding, and related areas

    An energy-aware system-on-chip architecture for intra prediction in HEVC standard

    Get PDF
    High resolution 4K and 8K are becoming the more used in video applications. Those resolutions are well supported in the new HEVC standard. Thus, embedded solutions such as development of dedicated ystems-On-Chips (SOC) to accelerate video processing on one chip instead of only software solutions are commendable. This paper proposes a novel parallel and high efficient hardware accelerator for the intra prediction block. This accelerator achieves a high-speed treatment due to pipelined processing units and parallel shaped architecture. The complexity of memory access is also reduced thanks to the proposed design with less increased power consumption. The implementation was performed on the 7 Series FPGA 28 nm technology resources on Zynq-7000 and results show, that the proposed architecture takes 16520 LUTs and can reach 143.65 MHz as a maximum frequency and it is able to support the throughput of 3840×2160 sequence at 90 frames per second

    Highly parallel HEVC decoding for heterogeneous systems with CPU and GPU

    Get PDF
    The High Efficiency Video Coding HEVC standard provides a higher compression efficiency than other video coding standards but at the cost of an increased computational load, which makes hard to achieve real-time encoding/decoding for ultra high-resolution and high-quality video sequences. Graphics Processing Units GPU are known to provide massive processing capability for highly parallel and regular computing kernels, but not all HEVC decoding procedures are suited for GPU execution. Furthermore, if HEVC decoding is accelerated by GPUs, energy efficiency is another concern for heterogeneous CPU+GPU decoding. In this paper, a highly parallel HEVC decoder for heterogeneous CPU+GPU system is proposed. It exploits available parallelism in HEVC decoding on the CPU, GPU, and between the CPU and GPU devices simultaneously. On top of that, different workload balancing schemes can be selected according to the devoted CPU and GPU computing resources. Furthermore, an energy optimized solution is proposed by tuning GPU clock rates. Results show that the proposed decoder achieves better performance than the state-of-the-art CPU decoder, and the best performance among the workload balancing schemes depends on the available CPU and GPU computing resources. In particular, with an NVIDIA Titan X Maxwell GPU and an Intel Xeon E5-2699v3 CPU, the proposed decoder delivers 167 frames per second (fps) for Ultra HD 4K videos, when four CPU cores are used. Compared to the state-of-the-art CPU decoder using four CPU cores, the proposed decoder gains a speedup factor of . When decoding performance is bounded by the CPU, a system wise energy reduction up to 36% is achieved by using fixed (and lower) GPU clocks, compared to the default dynamic clock settings on the GPU.EC/H2020/688759/EU/Low-Power Parallel Computing on GPUs 2/LPGPU

    High performance HEVC and FVC video compression hardware designs

    Get PDF
    High Efficiency Video Coding (HEVC) is the current state-of-the-art video compression standard developed by Joint collaborative team on video coding (JCT-VC). HEVC has 50% better compression efficiency than H.264 which is the previous video compression standard. HEVC achieves this video compression efficiency by significantly increasing the computational complexity. Therefore, in this thesis, we proposed a low complexity HEVC sub-pixel motion estimation (SPME) technique for SPME in HEVC encoder. We designed and implemented a high performance HEVC SPME hardware implementing the proposed technique. We also designed and implemented an HEVC fractional interpolation hardware using memory based constant multiplication technique for both HEVC encoder and decoder. Future Video Coding (FVC) is a new international video compression standard which is currently being developed by JCT-VC. FVC offers much better compression efficiency than the state-of-the-art HEVC video compression standard at the expense of much higher computational complexity. In this thesis, we designed and implemented three different high performance FVC 2D transform hardware. The proposed hardware is verified to work correctly on an FPGA board
    corecore