Abstract. This paper proposes a hardware design for intra prediction angular mode decision in an HEVC encoder. HEVC intra prediction provides a total of 35 modes and achieves better coding performance than H.264/AVC, but processing all 35 modes requires high computational complexity and long computation time. We therefore propose a hardware structure based on an efficient angular mode decision algorithm that uses only simple operations on the differences between original pixels and the positions of those differences. The architecture reduces computation time by processing all block sizes from 4x4 to 64x64 in parallel. Unlike existing structures, it determines the angular mode by predicting the direction with a simple operation, which keeps the arithmetic unit small. The proposed architecture was designed in Verilog HDL and synthesized with Synopsys Design Compiler in a 65nm technology. The synthesized gate count is 14.9K and the maximum operating frequency is 2GHz.
Introduction
HEVC is a video compression standard that was established as an international standard in April 2013. HEVC achieves more than twice the coding efficiency of H.264/AVC, but at the cost of high complexity [1]. HEVC intra prediction provides a total of 35 modes and achieves better coding performance than H.264/AVC. However, processing all 35 modes requires high computational complexity and long operation time. In this paper, we propose an intra prediction hardware design that uses an efficient angular mode decision algorithm.
Intra Prediction in HEVC
HEVC intra prediction predicts the current block by referring to the reconstructed samples around it, and is used to eliminate spatial redundancy. The previous standard, H.264/AVC, supports a total of 9 prediction modes, while HEVC supports a total of 35. In addition, HEVC uses the Coding Tree Block (CTB) structure and supports block sizes from 4x4 to 64x64. HEVC intra prediction is performed in three steps: reference sample generation, reference sample filtering, and intra sample prediction. Table 1 shows the prediction modes and names used in intra sample prediction.
Table 1. Intra Prediction modes and names
Intra prediction mode 0 is the planar mode, which uses the values and positions of the reference pixels; mode 1 is the DC mode, which uses the average value of the reference pixels; and modes 2 to 34 are the angular modes, which use the directionality of the reference pixels. Figure 1 shows the directionality of the angular modes.
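The mode numbering described above can be summarized in a few lines of Python. This is only an illustrative sketch; the function name `intra_mode_category` and the constant `NUM_INTRA_MODES` are hypothetical and do not come from the proposed design.

```python
NUM_INTRA_MODES = 35  # modes 0..34, as stated in the text


def intra_mode_category(mode: int) -> str:
    """Return the category of an HEVC intra prediction mode index.

    Mode 0 is planar, mode 1 is DC, and modes 2..34 are angular.
    """
    if not 0 <= mode < NUM_INTRA_MODES:
        raise ValueError(f"intra mode must be in 0..34, got {mode}")
    if mode == 0:
        return "planar"
    if mode == 1:
        return "DC"
    return "angular"


# Collect the angular mode indices (hypothetical helper list).
angular_modes = [m for m in range(NUM_INTRA_MODES)
                 if intra_mode_category(m) == "angular"]
```

Of the 35 modes, 33 are angular, which is why the angular mode decision dominates the search cost.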
Fast Angular Mode Decision Algorithm
The fast angular mode decision algorithm predicts the direction from the differences between original pixels and the positions of those differences, and selects a single angular mode. Figure 2 illustrates the calculation for vertical lines in a 5x5 pixel block. First, the 5x5 pixels are separated into vertical lines, and the differences between the original pixels in each line are calculated to find the position with the largest difference. Then, the directionality is predicted from the difference between the positions of the largest difference in each line. In Figure 2, the red pointer marks the pixel position with the maximum difference.
Fig. 2. Directional estimation of the vertical line
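The vertical-line procedure can be sketched in software as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes the difference is the absolute difference between vertically adjacent pixels within a line, that ties are resolved in favor of the first position, and that the directional cue is the change in position between neighboring lines. The function names are hypothetical, and the final mapping from these shifts to an angular mode index is not given in the text, so it is left out.

```python
from typing import List


def max_diff_positions(block: List[List[int]]) -> List[int]:
    """For each vertical line (column), return the row index r at which
    |block[r + 1][c] - block[r][c]| is largest (assumed definition)."""
    rows, cols = len(block), len(block[0])
    positions = []
    for c in range(cols):
        diffs = [abs(block[r + 1][c] - block[r][c]) for r in range(rows - 1)]
        positions.append(diffs.index(max(diffs)))  # first maximum wins
    return positions


def position_shifts(positions: List[int]) -> List[int]:
    """Difference between the max-difference positions of adjacent lines."""
    return [b - a for a, b in zip(positions, positions[1:])]


# A 5x5 block containing a diagonal edge (illustrative pixel values).
block = [
    [10, 10, 10, 10, 10],
    [90, 10, 10, 10, 10],
    [90, 90, 10, 10, 10],
    [90, 90, 90, 10, 10],
    [90, 90, 90, 90, 90],
]

positions = max_diff_positions(block)
shifts = position_shifts(positions)
```

For this sample block the edge moves down by one row per line across the first four lines, so the shifts are mostly constant, which is the kind of cue from which a direction could be predicted. Only subtractions, comparisons, and index differences are needed.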

Proposed Angular Mode Decision Hardware Architecture
The proposed intra prediction hardware architecture is divided into a memory part and a mode decision part. The memory block stores the original pixels and is organized into horizontal and vertical directions; keeping a separate memory for each direction makes memory management more efficient. The original pixels from the memory block are the input to the mode decision block, which determines the mode using the differences and indices of the original pixels. The mode decision is performed in parallel for all block sizes from 4x4 to 64x64. Fig. 3 shows a block diagram of the proposed intra prediction angular mode decision hardware architecture.
Fig. 3. Proposed intra prediction angular mode decision block diagram
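To give a sense of the workload that the parallel mode decision covers, the following sketch counts the blocks of each size inside one CTB. It is a software illustration only and says nothing about how the hardware schedules the work. It assumes a 64x64 CTB and square block sizes of 4, 8, 16, 32, and 64; the text states only the 4x4 to 64x64 range, and all names are hypothetical.

```python
from typing import List, Tuple

CTB_SIZE = 64                      # assumed CTB size
BLOCK_SIZES = [4, 8, 16, 32, 64]   # assumed sizes within the stated range


def block_origins(block_size: int,
                  ctb_size: int = CTB_SIZE) -> List[Tuple[int, int]]:
    """Return the (row, column) origin of every block of the given size
    when the CTB is tiled without overlap."""
    if ctb_size % block_size != 0:
        raise ValueError("block size must divide the CTB size")
    return [(y, x)
            for y in range(0, ctb_size, block_size)
            for x in range(0, ctb_size, block_size)]


# Number of blocks of each size in one CTB.
blocks_per_size = {size: len(block_origins(size)) for size in BLOCK_SIZES}
total_blocks = sum(blocks_per_size.values())
```

Under these assumptions, a sequential design would have to visit every one of these blocks in turn, whereas handling the block sizes concurrently lets the decisions for different sizes overlap in time.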
Implementation Result
The proposed hardware was designed in Verilog HDL and synthesized in a 65nm technology using CAD tools provided by IDEC. Its gate count is 14.9K and its maximum operating frequency is 2GHz. Compared with the best-performing existing hardware structure, Lu [2], the gate count increased by 75%, while the maximum operating frequency increased from 622MHz to 2GHz, roughly 3.2 times higher. Table 2 compares the hardware implementation results with other structures.
Conclusion
The proposed hardware architecture uses a fast angular mode decision algorithm that determines the angular mode by predicting the direction from the differences and positions of the original pixels. Because the algorithm selects the mode by predicting the directionality, it can select a mode faster than the existing algorithm. In addition, because the mode is determined by a simple operation, the hardware implementation needs only a minimal arithmetic unit, which keeps the hardware area small. A further advantage of the proposed architecture is that the mode decisions for all block sizes, from 64x64 down to 4x4, proceed in parallel, which reduces the computation time. The architecture was designed in Verilog HDL and synthesized in a 65nm process. The synthesis result shows a gate count of 14.9K and a maximum operating frequency of 2GHz. Compared with the existing hardware structure of Lu [2], the gate count increased by 75% and the maximum operating frequency increased from 622MHz to 2GHz.
