Abstract. This paper proposes an efficient binary arithmetic encoder (BAE) hardware architecture for CABAC (Context-based Adaptive Binary Arithmetic Coding). CABAC is the entropy coding method used in the HEVC standard. Entropy coding removes statistical redundancy and thereby supports a high image compression ratio. However, the binary arithmetic encoder introduces delay in real-time processing, and it is difficult to parallelize because of the strong dependency between data. The proposed CABAC BAE hardware separates renormalization from the rest of the encoding and processes the conventionally iterative algorithm in parallel. The new scheme is designed as a four-stage pipeline that shortens the critical path. The architecture was designed in Verilog HDL and implemented in 65 nm technology. Its gate count is 5.68K and its maximum clock frequency is 1.11 GHz. It processes 2 bins per clock cycle. Compared with existing hardware architectures, maximum processing speed is increased by 22% and gate count is reduced by 31%.
Introduction
The HEVC standard emerged from the development of better coding schemes for media [1]. Compared with H.264/AVC, HEVC improves the compression ratio by about 50%, but it does so at the cost of higher complexity and a larger amount of computation, which makes real-time processing difficult. In this paper, we propose a high-throughput hardware design of the CABAC binary arithmetic encoder. The CABAC encoder performs adaptive binary arithmetic coding using context-based modeling, in which a context model is selected for each syntax element to be encoded [2]. The CABAC encoder consists of a binarizer, a context modeler and a binary arithmetic encoder (BAE). The binarizer converts each syntax element into binary values (bins). The context modeler estimates the probability of the context model using context information from around the block being encoded. The binary arithmetic encoder encodes each bin using the probability supplied by the context modeler. The remainder of this paper is organized as follows. Chapter 2 describes the CABAC binary arithmetic encoder, Chapter 3 describes the hardware implementation, and Chapter 4 presents the results of this study.
Proposed Binary Arithmetic Encoder
The proposed CABAC BAE hardware separates renormalization from the rest of the encoding and processes the conventionally iterative algorithm in parallel. It is designed as a four-stage pipeline that shortens the critical path. The existing structure outputs the bitstream through a memory. The proposed structure instead outputs the bitstream together with the number of valid bits in it, which reduces hardware area because no memory is needed. Fig. 1 shows the architecture of the proposed BAE. The proposed BAE generates the information bits needed for bitstream output while performing renormalization. Using these information bits, the bitstream generator can output the bitstream directly. However, generating the information bits creates a critical path of up to 7 iterative comparisons, depending on the number of variables used for renormalization. To shorten this critical path, a dedicated LUT can be used to reduce the operation time. Also, applying the structure as seen in 
Range Update
Stage 2 performs renormalization when the range of binary arithmetic coding falls below a threshold, and outputs the renormalization count (Cnt_RenormE) and the MPS range (rMPS) required to calculate the low value. In the regular mode of binary arithmetic coding, existing algorithms create a critical path by repeatedly performing up to 6 left-shift operations until ivlCurrRange becomes 256 or more. In the proposed scheme, to remove this variable-length operation, the renormalization count is calculated by finding the position of the first '1' from the MSB (Most Significant Bit) of ivlCurrRange, and renormalization is then performed by left-shifting by that count. Fig. 2 shows the renormalization flowchart of the range.
Advanced Science and Technology Letters, Vol.141 (GST 2016)
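The leading-one detection described for the range update can be illustrated in software. The following is a minimal Python sketch, not the authors' Verilog design: it assumes a 9-bit ivlCurrRange (as in HEVC CABAC) and a nonzero input, and the function names are hypothetical. It compares the conventional shift loop with a count derived from the position of the most significant '1'.

```python
RANGE_BITS = 9          # assumption: ivlCurrRange is a 9-bit value, as in HEVC CABAC
RENORM_THRESHOLD = 256  # renormalization stops once ivlCurrRange >= 256


def renorm_iterative(ivl_curr_range):
    """Conventional approach: shift left one bit at a time, counting shifts."""
    cnt = 0
    while ivl_curr_range < RENORM_THRESHOLD:
        ivl_curr_range <<= 1
        cnt += 1
    return ivl_curr_range, cnt


def renorm_leading_one(ivl_curr_range):
    """Single-step approach: derive the shift count from the MSB position.

    bit_length() gives the position of the most significant '1', so the
    number of leading zeros in a 9-bit word is RANGE_BITS - bit_length().
    """
    cnt = RANGE_BITS - ivl_curr_range.bit_length()
    return ivl_curr_range << cnt, cnt


if __name__ == "__main__":
    for r in (6, 100, 255, 256, 510):
        print(r, renorm_iterative(r), renorm_leading_one(r))
```

In hardware, the count would typically come from a priority encoder feeding a barrel shifter, which replaces the data-dependent loop with a fixed-depth combinational path.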
Low Update
The CABAC encoder generates a bitstream according to the renormalization count. In the proposed structure, renormalization (B) left-shifts ivlLow by the renormalization count, as shown in Table 1, and sets the most significant bit to 0, whereas renormalization (A) is performed while keeping the MSB unchanged. The generation of information bits for bitstream output determines bit_cnt (the number of bitstream bits to be output) and bos_cnt (the number of bits whose bitstream value has not yet been determined) according to the renormalization count and ivlLow. Table 2 shows the information bits output for each renormalization count. ...
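For comparison, the conventional iterative low update that the proposed structure replaces can be sketched as follows. This Python sketch follows the bit-by-bit renormalization and bit-output procedure of the HEVC reference encoding process as we understand it, not the table-driven scheme of Tables 1 and 2. The class and method names are hypothetical. The counter bits_outstanding appears to play the role that bos_cnt plays in the proposed design, and the number of bits appended per call appears to correspond to bit_cnt; both mappings are assumptions.

```python
class IterativeRenormE:
    """Bit-serial renormalization of ivlLow with outstanding-bit handling."""

    def __init__(self):
        self.bits = []              # bitstream bits written so far
        self.bits_outstanding = 0   # bits whose value is not yet determined
        self.first_bit_flag = True  # the very first bit is suppressed

    def put_bit(self, b):
        """Write bit b, then resolve outstanding bits as the inverse of b."""
        if self.first_bit_flag:
            self.first_bit_flag = False
        else:
            self.bits.append(b)
        while self.bits_outstanding > 0:
            self.bits.append(1 - b)
            self.bits_outstanding -= 1

    def renorm(self, ivl_low, ivl_curr_range):
        """One loop iteration per renormalization shift (the critical path)."""
        while ivl_curr_range < 256:
            if ivl_low < 256:
                self.put_bit(0)
            elif ivl_low >= 512:
                ivl_low -= 512
                self.put_bit(1)
            else:
                # Value depends on a later carry: defer the decision.
                ivl_low -= 256
                self.bits_outstanding += 1
            ivl_curr_range <<= 1
            ivl_low <<= 1
        return ivl_low, ivl_curr_range


if __name__ == "__main__":
    enc = IterativeRenormE()
    enc.first_bit_flag = False  # skip the first-bit special case for the demo
    print(enc.renorm(300, 128), enc.bits, enc.bits_outstanding)
    print(enc.renorm(88, 128), enc.bits, enc.bits_outstanding)
```

Each pass through the loop depends on the ivlLow produced by the previous pass, which is why a design that derives the output bits from the renormalization count in one step can shorten the critical path.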
Bitstream Generation
Stage 4 outputs the bitstream for the current bin using the information bits. The bit generator receives Low_data (the upper 7 bits of ivlLow), bos_cnt (the number of bits whose bitstream value has not yet been determined) and bit_cnt (the number of bitstream bits to be output), and outputs a bitstream. The number of bits output per encoded bin is not constant. The proposed architecture reduces hardware area by generating an output signal (valid_bit_cnt) that indicates the variable number of valid bits, so the bitstream is output without using a memory. Fig. 3 shows the structure of the Bitstream Generator and Table 3 shows the table for bitstream 
Conclusion
The proposed CABAC BAE is a four-stage pipeline structure that shortens the critical path caused by the renormalization process. In addition, applying the two-bin BAE architecture improves the maximum processing speed by about 45%. Furthermore, outputting the number of valid bitstream bits as a signal, rather than using a memory, substantially reduces hardware area. Compared with existing hardware architectures, maximum processing speed is increased by 22% and gate count is reduced by about 31%.
