Abstract. This paper presents a hardware implementation of a rate control system with scene change detection for constant bit rate traffic of Motion JPEG2000. The input frames are divided into groups, each containing one key frame. When a scene change is detected at a frame that is not originally a key frame, that frame becomes the key frame of a new group. The frames are then encoded by Tier1 coding. The bit rate of the key frame in each group is allocated independently, and the bit rates of the other frames in the group are allocated by rate distortion slope estimation. The bit streams are then truncated and output. The Verilog HDL modules for the architecture are designed, simulated, and synthesized for an Altera FPGA. The results show that the proposed architecture operates correctly.
Introduction
JPEG2000 is a new image coding standard. It achieves high compression performance and supports a rich set of novel functions [1]. In 2002, the JPEG committee extended the JPEG2000 standard to video coding, resulting in the Motion JPEG2000 standard [2]. In Motion JPEG2000, the frames of a sequence are encoded independently using JPEG2000. The rate control algorithm known as post-compression rate-distortion (PCRD) optimization [3] can be used to allocate the bit rate for the frames. To reduce the computational redundancy of the PCRD algorithm at lower bit rates, several rate control algorithms have been proposed for JPEG2000 [4] [5].
The wavelet transform and Tier1 coding in JPEG2000 are computationally heavy, so dedicated hardware should be designed to implement them. The rate control of JPEG2000 can also be implemented in hardware, so that the whole JPEG2000 encoding procedure is implemented in one chip, as described in references [6] and [7]. The rate control can alternatively be implemented in software, but then an additional microprocessor or MCU must be available and the software must be designed to cooperate with the hardware, which complicates the coding system. In order to implement the whole Motion JPEG2000 coding procedure in one chip, this paper focuses on the hardware implementation of the rate control system for Motion JPEG2000.
To improve the quality of encoded video near scene changes, scene change detection has been introduced into rate control systems for MPEG2 [8] and H.264 [9], where the group of pictures (GOP) is regrouped according to the result of the scene change detection. In Ref. [10], a rate control algorithm with scene change detection for Motion JPEG2000 is also proposed. In reference [11], a rate control algorithm based on interframe correlation for constant bit rate traffic of Motion JPEG2000 is proposed. The input frames are divided into fixed groups, and the first frame of each group is a key frame. The key frames are encoded independently, and the remaining frames in each group, named common frames, are encoded by rate distortion slope estimation: the rate distortion slopes of the previous frame are used to estimate the rate distortion slopes of the current frame. To improve the accuracy of the rate distortion slope estimation when a scene change occurs among the common frames, we present a hardware implementation of a rate control system with scene change detection for constant bit rate traffic of Motion JPEG2000. The input frames are divided into groups as in reference [11], and the first frame in each group is the key frame. During the coding procedure, once a scene change is detected at a frame that is not a key frame, the group is broken: the frame where the scene change is detected becomes a key frame, and a new group starts from this frame. The bit rate of the key frame is allocated by the minimal slope discarding method [12], and the rest of the frames are encoded by rate distortion slope estimation. The Verilog HDL modules for the rate control system are designed, simulated, and synthesized for an Altera FPGA. Altera's embedded logic analyzer is used to debug the system, and the results show that the designed architecture operates correctly. This paper is organized as follows.
The architecture of the rate control system is presented in section 2. The architecture for scene change detection is designed in section 3. In section 4, the Tier1 encoding with rate distortion estimation is proposed. The rate control and bit stream truncation are given in section 5. The rate control system is then implemented on an FPGA in section 6. A conclusion is presented in section 7.
Architecture of rate control for Motion JPEG2000
The architecture of the rate control system for Motion JPEG2000 is shown in Fig. 1. The input frames are divided into groups; each group contains eight frames, and the first frame in each group is the key frame. Under the control of the controller module, each frame is input to the scene change detection module, which determines whether it is a scene change frame. Once a scene change is detected at a frame that is not originally a key frame, the group is broken: that frame becomes a key frame, and a new group starts from it. Each frame of the group is then transformed by the discrete wavelet transform (DWT) module, the wavelet coefficients are encoded by the Tier1 coding module, and the bit stream produced by Tier1 coding is collected by the bit stream buffer. The rate control module allocates the bit rate for each frame of the group, and the bit stream truncation module then truncates the bit stream to produce the final bit stream.
Fig. 1. The architecture for rate control system
The controller is implemented as a state machine. The architectures for the scene change detection module, Tier1 coding with rate distortion estimation, and rate control and bit stream truncation are given in the following sections. The DWT module is not covered in this paper; for a hardware implementation of the DWT, please refer to [13].
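The regrouping rule described above can be illustrated with a short software model. The following Python sketch is not the Verilog controller; the function name assign_key_frames is hypothetical, and it assumes that a group opened by a scene change again holds up to eight frames, which the text does not state explicitly.

```python
def assign_key_frames(scene_change, group_size=8):
    """Return one boolean per frame: True where the frame is a key frame.

    scene_change[i] is True when a scene change was detected at frame i.
    Assumption: a group opened by a scene change is again limited to
    group_size frames.
    """
    is_key = []
    frames_in_group = 0  # frames already placed in the current group
    for changed in scene_change:
        start_new_group = (
            frames_in_group == 0              # very first frame
            or frames_in_group == group_size  # current group is full
            or changed                        # scene change breaks the group
        )
        if start_new_group:
            frames_in_group = 0
        is_key.append(start_new_group)
        frames_in_group += 1
    return is_key


# 16 frames with a single scene change detected at frame 5.
flags = [False] * 16
flags[5] = True
key_positions = [i for i, k in enumerate(assign_key_frames(flags)) if k]
```

In this example the key frames fall at positions 0, 5, and 13: the scene change at frame 5 breaks the first group, and the group that starts there is full after frame 12.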
Architecture for scene change detection
The scene change detection is based on the fact that when two adjacent frames belong to the same scene, the correlation between them is high and the difference between their average luminance values is below a threshold; otherwise the two frames belong to different scenes [8]. The averages of the Y signals of the frames are used in scene change detection. The scene change detection procedure is divided into 3 steps.
1) Set AvgY(i) to the average value of the Y signals of frame i.
2) Set Diff(i) to the absolute value of the difference between AvgY(i) and AvgY(i-1). If i is equal to zero, Diff(i) is set to AvgY(i).
3) If Diff(i) is greater than the threshold Diff_th, a new scene change is detected. The threshold value Diff_th is an empirical value and is set to 5.
To evaluate the scene change detection procedure described above, as in references [8] and [9], six standard sequences, akiyo (20 frames), container (30 frames), foreman (20 frames), hall (40 frames), news (10 frames) and speedway (20 frames), are combined to create a test sequence. The test sequence is in CIF format, and there are 6 scene changes in the total of 140 frames. The scene change detection is evaluated with a C program, and all 6 scene changes are correctly detected, as shown in Fig. 2.
Fig. 2. The scene change detection result
The architecture for scene change detection is shown in Fig. 3. Under the control of the scene_chg controller, the Addr_gen module generates the addresses of the frame data, and the data is then read from the frame buffer. At the same time, the Accum_Y module accumulates the values of the Y signals. Once a whole frame has been read, the average value of the Y signals, AvgY(i), is calculated and stored in a register, and the frame counter in the frame_cnt module is increased by one. Then the average value AvgY(i-1) of the previous frame is read from the register, and the difference Diff(i) between the two adjacent frames is output from the Accum_diff module. The threshold value Diff_th is read from the threshold register, and Diff(i) and Diff_th are sent to the scene_chg module. If Diff(i) is greater than Diff_th, a new scene change is detected and the signal Scene_chg goes high.
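The three-step procedure can be sketched in a few lines of Python. This is a software illustration only, not the C program or the Verilog modules used in the paper; the function names are hypothetical, frames are assumed to be lists of rows of Y samples, and floating-point averages are used where the hardware presumably uses fixed-width arithmetic.

```python
DIFF_TH = 5  # empirical threshold Diff_th from the text


def average_luma(frame):
    """Step 1: AvgY(i), the mean of all Y samples of one frame."""
    total = sum(sum(row) for row in frame)
    count = sum(len(row) for row in frame)
    return total / count


def detect_scene_changes(frames, threshold=DIFF_TH):
    """Return one boolean per frame: True where a scene change is detected."""
    flags = []
    prev_avg = None
    for frame in frames:
        avg = average_luma(frame)
        # Step 2: Diff(i) = |AvgY(i) - AvgY(i-1)|, and Diff(0) = AvgY(0).
        diff = avg if prev_avg is None else abs(avg - prev_avg)
        # Step 3: compare with the threshold.
        flags.append(diff > threshold)
        prev_avg = avg
    return flags


# Four tiny 2x2 "frames": the scene changes between the second and third.
frames = [
    [[100, 100], [100, 100]],
    [[102, 102], [102, 102]],
    [[150, 150], [150, 150]],
    [[151, 151], [151, 151]],
]
flags = detect_scene_changes(frames)
```

Because Diff(0) is set to AvgY(0), the first frame is flagged whenever its average luminance exceeds the threshold, so in this example frames 0 and 2 are flagged.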
Tier1 encoding with rate distortion estimation
In [14] we proposed a hardware architecture for Tier1 coding that contains two coding blocks: bit plane coding and arithmetic coding. The bit planes of the code blocks are encoded with three coding passes, i.e. the significance propagation pass (SPP), the magnitude refinement pass (MRP), and the clean-up pass (CUP). The symbols produced by the bit plane coding are then coded by arithmetic coding. Please refer to [14] for detailed information about the structure for Tier1 coding.
To implement rate control, two variables must be available for calculating the rate distortion slopes: the increase in the number of code bytes ΔR and the decrease in distortion ΔD. In this paper, we add two functions to the Tier1 coding architecture, one that counts the increase in the number of code bytes and one that estimates the distortion.
A counter is used to count the number of code bytes generated by the coding passes. The number of code bytes is then stored in the rate distortion table, and the increase in the number of code bytes ΔR for each coding pass is calculated and stored in a buffer.
The distortion information for the three coding passes is provided by the distortion estimation module. The structure of this module is shown in Fig. 4. According to the bit plane information and the type of the coding pass, the distortion information is read from one of two distortion tables. The significant coding distortion table is used for SPP and CUP, and the magnitude coding distortion table is used for MRP. The distortion values are accumulated by the adder and then registered in the D flip-flop. The distortion FD output from the D flip-flop is used to calculate the decrease in distortion ΔD, which is also stored in the buffer. Once ΔR and ΔD are ready, the rate distortion slope is calculated and stored in the rate distortion table as well.
Fig. 4. The structure for distortion estimation
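The slope calculation described in this section can be illustrated in software. The following Python sketch is not the Verilog design: the function name pass_slopes is hypothetical, and it assumes that the byte counter and the accumulated distortion FD are both sampled at the end of every coding pass, so that ΔR and ΔD are differences between consecutive samples and the slope is ΔD/ΔR. The handling of a pass that produces no new bytes is likewise an assumption made only for this illustration.

```python
def pass_slopes(cum_bytes, cum_distortion):
    """Rate distortion slope of each coding pass of one code block.

    cum_bytes[k]      : code bytes produced up to and including pass k
    cum_distortion[k] : accumulated distortion decrease up to pass k
    Both lists are assumed to be sampled at the end of every pass.
    """
    slopes = []
    prev_r, prev_d = 0, 0
    for r, d in zip(cum_bytes, cum_distortion):
        delta_r = r - prev_r  # increase of the number of code bytes
        delta_d = d - prev_d  # decrease of distortion
        if delta_r > 0:
            slopes.append(delta_d / delta_r)
        else:
            # Assumption: a pass that adds no bytes gets an infinite slope
            # if it still reduces distortion, and a zero slope otherwise.
            slopes.append(float("inf") if delta_d > 0 else 0.0)
        prev_r, prev_d = r, d
    return slopes


# Four passes; the third pass adds neither bytes nor distortion decrease.
slopes = pass_slopes([10, 30, 30, 70], [400, 800, 800, 1000])
```

For these made-up values, ΔR is 10, 20, 0, 40 and ΔD is 400, 400, 0, 200, so the slopes are 40, 20, 0, and 5.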

Rate control and bit stream truncation
In this paper, we present a rate control system with scene change detection for constant bit rate traffic of Motion JPEG2000. The input frames are divided into groups; each group contains eight frames, and the first frame in each group is the key frame. During the coding procedure, once a scene change is detected at a frame that is not a key frame of the group, the group is broken: the frame where the scene change is detected becomes a key frame, and a new group starts from this frame. According to the target bit rate, the bit rate for the different subbands and coding passes of the key frames is allocated independently by the minimal slope discarding method [12]. The minimal slope discarding method is a fast and efficient rate distortion optimization method that is suitable for hardware implementation, and its compression performance is comparable to that of the PCRD algorithm. The remaining frames in each group are encoded by rate distortion slope estimation: the optimal truncation point of the previous frame is used to estimate the truncation point of the current frame. Once the frame is coded, the actual value of the truncation point is used to update the estimated truncation point of the frame.
The test sequence created in section 3 is used to evaluate the performance of the rate control system. The test sequence is encoded at a constant bit rate of 1.2 bpp. For a frame that is originally not a key frame of a group and becomes the key frame of a new group as a result of scene change detection, the PSNR is improved by 0.083 dB, and the PSNR of the frame estimated from this frame is also improved.
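To give a sense of scale for the constant bit rate used here, the byte budget of one frame can be worked out directly. The Python sketch below assumes that the 1.2 bpp rate is counted over the 352 x 288 pixels of a CIF frame; how the chroma components enter this count is not stated in the text, so the figure is indicative only, and the function name is hypothetical.

```python
CIF_WIDTH, CIF_HEIGHT = 352, 288  # CIF frame size in pixels


def frame_budget_bytes(width, height, bpp):
    """Whole bytes available for one frame at a target rate in bits per pixel."""
    total_bits = width * height * bpp
    return int(total_bits // 8)


budget = frame_budget_bytes(CIF_WIDTH, CIF_HEIGHT, 1.2)
```

Under this assumption, 352 x 288 x 1.2 / 8 gives a budget of 15206 bytes per frame.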
During the coding of the key frames, the rate distortion table stores the number of code bytes and the rate distortion slopes, as described in the previous section. The rate distortion slopes in this table are stored in descending order. Other information, such as the subband index and the code block index, is also stored in this table. According to the minimal slope discarding method, the optimal truncation point S_opt is obtained, and the bit streams between start_addr and stop_addr are truncated and output from the bit stream buffer, as shown in Fig. 5. The rest of the frames in each group are encoded by rate distortion slope estimation as described above: the optimal truncation point S_opt of the previous frame is used to estimate the truncation point of the current frame. Once the frame is coded, the actual value of the truncation point is used to update the estimated truncation point of the frame. The bit stream truncation for those frames is similar to that shown in Fig. 5.
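The role of the slope-ordered table can be illustrated with a generic slope-threshold truncation. The Python sketch below is not the minimal slope discarding method of [12], whose details are not reproduced in this paper, and it is not the Verilog design; the function names, the (slope, bytes) entry layout, and the way the previous frame's truncation point is reused are assumptions made only for illustration.

```python
def select_truncation(entries, budget_bytes):
    """Walk a table sorted by descending slope until the byte budget is hit.

    entries: list of (slope, num_bytes) pairs in descending slope order.
    Returns (s_opt, kept, used): the slope of the last entry kept, the
    number of entries kept, and the bytes they occupy.
    """
    used = 0
    kept = 0
    s_opt = None
    for slope, num_bytes in entries:
        if used + num_bytes > budget_bytes:
            break
        used += num_bytes
        kept += 1
        s_opt = slope
    return s_opt, kept, used


def apply_estimated_threshold(entries, s_est):
    """Common frame (assumed behavior): keep every entry whose slope is at
    least the truncation point estimated from the previous frame."""
    kept = [(slope, n) for slope, n in entries if slope >= s_est]
    return len(kept), sum(n for _, n in kept)


table = [(40.0, 10), (20.0, 20), (5.0, 40), (1.0, 50)]
s_opt, kept, used = select_truncation(table, budget_bytes=75)
```

With a budget of 75 bytes, the first three entries fit (70 bytes) and the truncation point is the slope 5.0. Reusing that value as the estimate for a following frame with the same table keeps the same three entries.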
FPGA implementation of rate control system
The Verilog HDL modules for the architecture are designed and simulated. The simulation results show that the bit streams output by the architecture are identical to those output by the C programs. The architecture is then synthesized for an Altera FPGA; the Cyclone II device EP2C35F672C8 is selected, and the synthesis results show that the system clock can run at up to 60.5 MHz. After place and route in Quartus II, the architecture is tested in system, as shown in Fig. 6.
Fig. 6. FPGA implementation of rate control system
Altera's embedded logic analyzer SignalTap II is used to debug the system. The waveforms captured by SignalTap II are compared with the simulation results, and the debugging results show that the rate control system runs correctly. Part of the waveform captured by SignalTap II is shown in Fig. 7.
Fig. 7. Waveform captured by SignalTap II
Conclusion
In this paper, we present a hardware implementation of a rate control system with scene change detection for constant bit rate traffic of Motion JPEG2000. The input frames are divided into groups, each containing one key frame. When a scene change is detected at a frame that is not originally a key frame, that frame becomes the key frame of a new group. The frames are then encoded by Tier1 coding. The bit rate of the key frame in each group is allocated independently, and the bit rates of the other frames in the group are allocated by rate distortion slope estimation. The bit streams are then truncated and output. The Verilog HDL modules for the rate control system are designed, simulated, and synthesized for an Altera FPGA. Altera's embedded logic analyzer is used to debug the system, and the results show that the designed architecture is efficient for rate control.
