I. INTRODUCTION
Compared with traditional analog broadcasting, digital audio broadcasting (DAB) offers superior reception quality, stronger resistance to interference, wider coverage and higher spectrum efficiency, which makes it more suitable for high-speed mobile reception [1]. Currently, there are three main technical schemes for DAB systems: Eureka-147 DAB, DRM/DRM+ and the HD (High Definition) Radio IBOC system [2]. Each of the three has its own advantages, disadvantages and application scenarios. In 2006, the State Administration of Radio, Film and Television (SARFT) released criteria for a DAB system in China. However, this system has not been widely promoted or applied in China because of policy, patent and compatibility problems. In 2013, the SARFT officially promulgated the latest DAB standard in the Frequency Modulation (FM) band, Part 1, which covers framing structure, channel coding and modulation. This standard, called Chinese Digital Radio (CDR), is the national DAB standard of China; it implements digital-analog simulcast with a more flexible spectrum mode and efficient coding algorithms [3]-[5]. The CDR is expected to be deployed across the nation. Therefore, designing and optimizing systems for the CDR standard is of great practical significance.
Channel coding is a key technology of the CDR baseband transmission system. As an advanced channel coding technique, low-density parity-check (LDPC) codes have a strong ability to correct burst errors and a very low error floor, and they do not require sub-carrier interleaving. LDPC codes have been studied extensively in this century in terms of theory [1]-[3], simulation [4], [5] and implementation [6]-[8]. LDPC codes are much simpler than conventional concatenated channel codes in terms of hardware implementation. Moreover, an LDPC encoder can be efficient at low hardware cost. Thus, LDPC codes are better suited than Turbo concatenated codes as the channel coding scheme for DAB systems [6]-[10].
Accordingly, the CDR standard adopts LDPC codes as its channel coding scheme. The standard supports LDPC codes of four code rates with a fixed block length of 9216 bits. However, because the non-zero elements of the parity-check matrix are randomly distributed, existing encoder designs demand large storage resources and high power consumption, which poses a considerable challenge to applying the LDPC encoder in the CDR standard. References [11] and [12] studied LDPC encoding methods in which a general structure of the LDPC code was designed, but the code length was required to be short. In [13], a block quasi-cyclic LDPC decoder based on FPGA was optimized to obtain high throughput. Reference [14] proposed a memory control strategy that reuses one LDPC decoder for different code rates, which economizes decoder hardware. To facilitate the hardware implementation of the encoder for CDR, we analyze the structure of the generator matrix and the parity-check matrix of the LDPC codes in CDR by exploiting the relationship between the two matrices. Our main results are as follows:
1) The generator matrix is transformed into blocks of quasi-cyclic structure by parallel row and column operations, which greatly improves the encoder throughput. An appropriate memory control strategy for CDR is then proposed, which allows the encoder/decoder to be reused for the four codes of different rates.
2) The LDPC encoder is implemented on a Xilinx FPGA platform and verified with the ModelSim simulation software. The proposed encoder consumes few hardware resources and reaches a throughput of up to 400 Mbps, which meets the requirement of the CDR standard.
The article is structured as follows. Section II describes the algorithm of LDPC encoding in the CDR standard. Section III introduces the implementation of the LDPC encoder for CDR. Section IV presents the results of the FPGA simulation and their analysis. Section V concludes the paper.
II. ALGORITHM OF LDPC ENCODING IN CDR STANDARD
A. LDPC ENCODER FOR CDR
The CDR standard employs LDPC codes with four code rates, namely 3/4, 1/2, 1/3 and 1/4. All the codes have the same codeword length of 9216 bits. The parameters of the four codes are given in Tab. 1 [5].
The parity-check matrices of the four codes are similar in structure. Starting from the parity-check matrices, we can use the block Gaussian elimination method [15] to obtain the corresponding generator matrix of each code. The generator matrix is then transformed into blocks of quasi-cyclic matrices by parallel row and column operations. The resulting matrix is a systematic matrix with the following structure:

$$G_{qc} = \begin{bmatrix} I & O & \cdots & O & G_{0,0} & G_{0,1} & \cdots & G_{0,c-1} \\ O & I & \cdots & O & G_{1,0} & G_{1,1} & \cdots & G_{1,c-1} \\ \vdots & \vdots & \ddots & \vdots & \vdots & \vdots & \ddots & \vdots \\ O & O & \cdots & I & G_{k-1,0} & G_{k-1,1} & \cdots & G_{k-1,c-1} \end{bmatrix}$$

where $I$ and $O$ are the 256 × 256 identity and zero matrices, and $G_{i,j}$ is a 256 × 256 circulant sub-matrix: each row is the cyclic right shift of the row above it, and each column is the cyclic downward shift of the column to its left. The first row is the cyclic right shift of the last row, and the first column is the cyclic downward shift of the last column. Therefore, it suffices to represent $G_{i,j}$ by a single row of elements $g_{i,j}$, where $g_{i,j}$ is the generator polynomial of $G_{i,j}$, and only $k \times c$ polynomials are needed to constitute $G_{qc}$. According to the LDPC configuration in the CDR standard, the rate-3/4 code has an information length of 6912 bits; since a sub-matrix is of size 256 × 256, we deduce that $k = 6912 \div 256 = 27$ and $c = (9216 - 6912) \div 256 = 9$. The values of $k$ and $c$ for the other rates can be computed in the same way and are shown in Tab. 2.
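The calculation of $k$ and $c$ above can be repeated for all four rates. The following is a minimal sketch that uses only the codeword length, the sub-matrix size and the code rates given in the text; the function and variable names are illustrative and do not come from the standard.

```python
from fractions import Fraction

N = 9216  # codeword length in bits (from the text)
B = 256   # size of one circulant sub-matrix (from the text)

def block_dims(rate):
    """Return (info_bits, k, c) for a code rate given as a Fraction."""
    info_bits = int(N * rate)     # information bits per codeword
    k = info_bits // B            # block rows of the generator matrix
    c = (N - info_bits) // B      # block columns of the parity part
    return info_bits, k, c

RATES = [Fraction(3, 4), Fraction(1, 2), Fraction(1, 3), Fraction(1, 4)]
DIMS = {str(r): block_dims(r) for r in RATES}

for name, (info, k, c) in DIMS.items():
    print(f"rate {name}: info={info} bits, k={k}, c={c}, polynomials={k * c}")
```

For the rate-3/4 code this reproduces $k = 27$ and $c = 9$ as derived in the text.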
B. LDPC ENCODING ALGORITHM
The generator matrix of the LDPC codes in the CDR standard can be divided into blocks of quasi-cyclic matrices. In [16], parallel decoding is used for a high-rate LDPC code to obtain high decoding throughput; based on this idea, we present a similar parallel processing scheme to construct the encoder of the LDPC code for all the code rates in the CDR standard.
That is, the information sequence $m$ is divided into $k$ groups, $m = (m_0, m_1, \ldots, m_{k-1})$, each of length $b$ bits. Let $s$ denote the encoded codeword, generated by:

$$s = (m, p_0, p_1, \ldots, p_{c-1})$$

Here, $p_j$ denotes the parity-check bits:

$$p_j = (p_{j,0}, p_{j,1}, \ldots, p_{j,b-1})$$

computed by

$$p_j = m_0 G_{0,j} + m_1 G_{1,j} + \cdots + m_{k-1} G_{k-1,j}, \quad 0 \le j < c \qquad (3)$$

VOLUME 5, 2017

Let $g^{(l)}_{i,j}$ denote the result of rotating the generator polynomial $g_{i,j}$ by $l$ bits to the right. Then we have

$$m_i G_{i,j} = m_{i,0}\, g^{(0)}_{i,j} + m_{i,1}\, g^{(1)}_{i,j} + \cdots + m_{i,b-1}\, g^{(b-1)}_{i,j}$$

As stated above, when the information sequence $m = (m_0, m_1, \ldots, m_{k-1})$ is input serially, we can calculate $p_j$ by (3) with the constructed generator matrix. Specifically, $m_i G_{i,j}$ can be implemented by a feedback shift register. The entire coded sequence $s$, including the parity bit vectors $p_j$ computed by (3), is available once all the information bits have entered the encoder. This is because the $c$ parity vectors are generated in parallel and are complete as soon as the last information bit has been input. Therefore, only $k \times c$ generator polynomials need to be stored, instead of all the rows of every sub-matrix. The required storage in this design is thus only 1/256 of that of a design storing the full generator matrix, which greatly reduces the use of storage resources.
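The encoding rule for the parity vectors can be checked numerically. The sketch below is not the authors' implementation: it uses toy sizes (b = 8, k = 3, c = 2) and randomly generated first rows, both of which are assumptions made for illustration, and it compares encoding from the stored first rows against a plain vector-matrix product with the fully expanded circulants.

```python
import random

def rotr(row, l):
    """Cyclically shift a list of bits l positions to the right."""
    l %= len(row)
    return row[-l:] + row[:-l]

def encode_qc(m, g, b):
    """Return s = (m, p_0, ..., p_{c-1}) using only the first rows g[i][j]."""
    k, c = len(g), len(g[0])
    parity = []
    for j in range(c):
        p_j = [0] * b
        for i in range(k):
            for l in range(b):
                if m[i * b + l]:  # information bit m_{i,l} is 1
                    p_j = [x ^ y for x, y in zip(p_j, rotr(g[i][j], l))]
        parity += p_j
    return m + parity

def encode_dense(m, g, b):
    """Reference: expand every circulant, then do a vector-matrix product."""
    k, c = len(g), len(g[0])
    parity = []
    for j in range(c):
        blocks = [[rotr(g[i][j], r) for r in range(b)] for i in range(k)]
        for col in range(b):
            bit = 0
            for i in range(k):
                for r in range(b):
                    bit ^= m[i * b + r] & blocks[i][r][col]
            parity.append(bit)
    return m + parity

random.seed(1)
b, k, c = 8, 3, 2  # toy sizes, not the CDR parameters
g = [[[random.randint(0, 1) for _ in range(b)] for _ in range(c)]
     for _ in range(k)]
m = [random.randint(0, 1) for _ in range(k * b)]

s = encode_qc(m, g, b)
stored_bits = k * c * b      # one row per circulant
dense_bits = k * c * b * b   # every row of every circulant
```

With b = 256, the ratio `stored_bits / dense_bits` is the 1/256 storage factor stated in the text.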
III. IMPLEMENTATION OF LDPC ENCODER FOR CDR
A. LDPC ENCODER SYSTEM
The LDPC codes are used in the main service data channel of CDR to correct data errors. The generator matrix can be computed by block Gaussian elimination on the given parity-check matrix. As mentioned before, the resulting generator matrix can be divided into sub-blocks of size 256 × 256, and the circulant structure of each sub-matrix improves the coding efficiency. Fig. 1 shows the overall design diagram of the LDPC encoder for CDR. In this paper, the LDPC encoder employs dual-port Random Access Memory (RAM) for reading and writing. The RAMs share a common controller that performs address initialization and the read/write operations according to the code rate. Fig. 2 describes the detailed encoding process for a rate-1/2 LDPC code with an information length of 4608 bits. First, the information bits are written into the RAM buffer, and the state of the encoding module is checked. When the encoding module is idle, the information bits are passed to the control module, which starts to load the generator polynomials and sets the encoder state to busy. Note that the encoder operates 18 Shift Register Adder Accumulator (SRAA) circuit modules in parallel. After one frame of information bits has been received, the parity bits are generated and stored in memory, followed by parallel-to-serial conversion. The encoder outputs the information bits first and the parity bits afterwards. The encoder state changes from busy to idle when the encoding of a frame is completed. We introduce all the modules in detail as follows.
B. GENERATOR MATRIX STORAGE MODULE
The generator matrix can be decomposed and then stored in Read Only Memory (ROM). Given a code rate, the encoder issues storage requests to control the memory and to find the address from which the corresponding generator polynomial is loaded [17]. The CDR standard adopts four LDPC codes of different rates, for which the numbers of bits to be stored are 2304, 3072, 4608 and 6912, respectively. Exploiting the characteristics of the generator matrix in CDR, the generator matrix can be represented by several bit subsequences of 256 bits each. Thus the generator matrix can be stored in 9 to 27 ROMs, each 256 bits wide. In this part, we employ a parallel structure to load the generator matrix of the LDPC code and to generate the parity information in the SRAA module. Because each sub-matrix is circulant, it suffices to store its first row. The first row is then cyclically shifted in the SRAA module to obtain the whole generator matrix. The storage scheme is illustrated in Fig. 3.
As shown in Fig. 3, we take the rate-1/4 code as an example. The generator polynomials of the first block row, i.e., $\{G_{0,0}, G_{0,1}, \cdots, G_{0,26}\}$, are stored at address 0 of ROM0 to ROM26, and the generator polynomials of the second block row, i.e., $\{G_{1,0}, G_{1,1}, \cdots, G_{1,26}\}$, are stored at address 1 of ROM0 to ROM26. In general, the generator polynomials of block row $i$, i.e., $\{G_{i,0}, G_{i,1}, \cdots, G_{i,26}\}$, are stored at address $i$ of ROM0 to ROM26. In this way, all the generator polynomials of the rate-1/4 code are placed at addresses 0 to 8 of ROM0 to ROM26. The storage pattern of the generator polynomials for the codes of the other rates is obtained in the same way.
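The storage pattern for the rate-1/4 code can be sketched as a simple address mapping. In the sketch below, text labels stand in for the 256-bit generator polynomials, and the `base_addr` parameter is a hypothetical way of placing other rates in the same ROMs; the text does not give those addresses.

```python
def build_roms(g, n_roms, base_addr=0):
    """Place generator polynomial g[i][j] at address base_addr + i of ROM j."""
    k, c = len(g), len(g[0])
    if c > n_roms:
        raise ValueError("not enough ROMs for this code rate")
    roms = [dict() for _ in range(n_roms)]
    for i in range(k):
        for j in range(c):
            roms[j][base_addr + i] = g[i][j]
    return roms

def read_row(roms, addr, c):
    """Read one address from ROM0..ROM(c-1), i.e. one block row in parallel."""
    return [roms[j][addr] for j in range(c)]

# Rate-1/4 code: k = 9 block rows, c = 27 block columns.
g_quarter = [[f"g[{i},{j}]" for j in range(27)] for i in range(9)]
roms = build_roms(g_quarter, n_roms=27)
first_row = read_row(roms, 0, 27)
```

A single address read across all ROMs returns the generator polynomials that the parallel SRAA modules need for one block of 256 information bits.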
C. SRAA MODULE
The main module of the LDPC encoder is the SRAA module, which takes up the most resources. Among the code rates supported by the CDR standard, the rate-1/4 code has the largest number of parity bits, 6912, and its stored generator matrix is the largest as well. In our hardware design, each SRAA module outputs 256 bits. Since the maximum number of parity bits is 6912, we require 6912 ÷ 256 = 27 SRAA modules working in parallel. Thus, 27 SRAA modules with serial input and parallel output are sufficient for the operation on the check matrix. The detailed SRAA circuit diagram is shown in Fig. 4.
As discussed above, the hardware of the SRAA consists of two groups of cyclic shift registers, A and B, each of 256 bits. Taking the SRAA that computes $p_0$ as an example, the implementation procedure of the SRAA circuit is:
• Step 1. Initialization. The contents of registers A and B are cleared.
• Step 2. The generator polynomial of the first sub-matrix, $g_{0,0}$, is loaded into the feedback shift register B. With the first information bit $m_{0,0}$ shifted into the SRAA circuit, the product $m_{0,0}\, g^{(0)}_{0,0}$ is generated at the output of the AND gates and is then input to the XOR gates. That is, the value of register A is:

$$A = m_{0,0}\, g^{(0)}_{0,0}$$

• Step 3. The next information bit $m_{0,1}$ is shifted into the SRAA circuit. Register B is shifted to the right once, so that its content is updated to $g^{(1)}_{0,0}$. By repeating Step 2, the value of register A is:

$$A = m_{0,0}\, g^{(0)}_{0,0} + m_{0,1}\, g^{(1)}_{0,0}$$

• Step 4. The information vector $m_0$ is input into the SRAA circuit bit by bit. At the end of 256 clocks, the value of register A is:

$$A = \sum_{l=0}^{255} m_{0,l}\, g^{(l)}_{0,0} = m_0 G_{0,0}$$

• Step 5. For the next information vector $m_1$, the SRAA loads the generator polynomial of the next sub-matrix $G_{1,0}$ into register B. Repeating Steps 2 to 4 for another 256 clock cycles, we obtain the value of register A as:

$$A = m_0 G_{0,0} + m_1 G_{1,0}$$

• Step 6. Repeat Step 5 for the whole generator matrix. Finally we obtain the value of register A as:

$$A = \sum_{i=0}^{k-1} m_i G_{i,0} = p_0$$

Applying the same procedure in the other SRAA circuits with the sub-matrices $G_{i,j}$, we obtain all the parity vectors $p_j$.
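The steps above can be modelled at bit level. The following is a minimal behavioural sketch, not the Verilog design of this paper: the class and function names are illustrative, the register width is reduced to 8 bits, and the generator polynomials are random placeholders. It checks the accumulator content against a direct computation of the sum of $m_i G_{i,j}$ for one block column.

```python
import random

class SRAA:
    """Behavioural model of one shift-register-adder-accumulator."""

    def __init__(self, b):
        self.A = [0] * b  # accumulator register (Step 1: cleared)
        self.B = [0] * b  # feedback shift register (Step 1: cleared)

    def load(self, g):
        """Load the first row of the next circulant into register B."""
        self.B = list(g)

    def clock(self, bit):
        """One clock: AND the input bit with B, XOR into A, rotate B right."""
        self.A = [a ^ (bit & x) for a, x in zip(self.A, self.B)]
        self.B = self.B[-1:] + self.B[:-1]

def parity_block(m_blocks, g_column, b):
    """Feed all information blocks through one SRAA and return register A."""
    sraa = SRAA(b)
    for m_i, g_i in zip(m_blocks, g_column):
        sraa.load(g_i)        # Steps 2 and 5: load the generator polynomial
        for bit in m_i:       # Steps 2 to 4: b clocks per block
            sraa.clock(bit)
    return sraa.A

def reference(m_blocks, g_column, b):
    """Direct sum of m_i * G_i, where row r of G_i is g_i rotated right by r."""
    p = [0] * b
    for m_i, g_i in zip(m_blocks, g_column):
        for col in range(b):
            for r in range(b):
                p[col] ^= m_i[r] & g_i[(col - r) % b]
    return p

random.seed(3)
b, k = 8, 4  # toy sizes, not the CDR parameters
g_column = [[random.randint(0, 1) for _ in range(b)] for _ in range(k)]
m_blocks = [[random.randint(0, 1) for _ in range(b)] for _ in range(k)]
p = parity_block(m_blocks, g_column, b)
```

In hardware, one such unit is instantiated per block column, so all parity vectors are completed on the same clock.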
D. OUTPUT BUFFER
According to the control signal, the output module outputs the input information bits and the check bits serially. In the output sequence of the LDPC encoder, the information bits come first and the parity bits follow. Thus, the information bits can be output directly without any caching. In this way, we save the hardware storage resources for the information bits and avoid the delay of caching them. The structure of the output module is shown in Fig. 5. The information bits are input to the control module and to the SRAA module for encoding at the same time. The control module then outputs the information bits directly through the MUX selector. When the output control signal req is at high level, which indicates that the encoding of the frame is completed, the parity bits generated in parallel by the SRAA modules are stored in the register temp. After parallel-to-serial conversion, the MUX selector outputs the converted parity bits.
E. CONTROL MODULE
We use the control module to control the data stream for the various code rates. The ports of the control module are given in Tab. 3. The working procedure of the control module circuit is:
• Count the input information bits. When the count equals the number of bits in a frame, the counter is cleared. The count value is used to generate the relevant control signals.
• Generate the ROM read address G_addr. At the beginning of the input, the read address is initialized according to the code rate. G_addr is incremented by one each time the input of 256 bits is completed. Once the input of a frame is completed, the address returns to its initial value.
• Generate the read flag signal store_flag. When G_addr is generated, the control module outputs the store_flag signal, which indicates that the generator matrix module can read the ROM.
• Generate the initialization load signal for the SRAA module. When the generator matrix memory module reads the data at the ROM address, the load signal is set to 1. The shift register in the SRAA module then stores the read data as its initial value.
• Generate the enable signal req for the parity bit output. When the counter reaches the number of information bits in one frame, the req signal is set to 1. This indicates that the encoding of a frame is completed and that the coded parity bits are stored.
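The counting and address generation described in this list can be sketched as a small generator. This is a behavioural illustration only: the signal names follow the text, while the function name, the per-bit tuple format and the `base_addr` value are assumptions, since the initial address for each code rate is not given here.

```python
def control_signals(info_bits, base_addr, b=256):
    """Yield (count, g_addr, load, req) for each input bit of one frame."""
    g_addr = base_addr                      # initialized from the code rate
    for count in range(info_bits):
        load = 1 if count % b == 0 else 0   # a new generator row is needed
        if load and count > 0:
            g_addr += 1                     # previous 256-bit block is done
        req = 1 if count == info_bits - 1 else 0  # frame complete
        yield count, g_addr, load, req

# Rate-1/2 code: 4608 information bits per frame, hypothetical base address 0.
trace = list(control_signals(4608, base_addr=0))
n_loads = sum(load for _, _, load, _ in trace)
last_addr = trace[-1][1]
n_req = sum(req for _, _, _, req in trace)
```

Calling the generator again for the next frame restarts from `base_addr`, which mirrors the address returning to its initial value after each frame.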
IV. FPGA IMPLEMENTATION AND SIMULATION ANALYSIS
A. IMPLEMENTATION ANALYSIS
In this paper, we implement the encoder design in Verilog HDL. The Xilinx xc6slx150t-3fgg484 FPGA chip, used with the Xilinx ISE platform, meets the requirements of the design and features low price, high integration, low power consumption and high flexibility. The proposed design is synthesized and implemented on this chip for the LDPC encoder of the four rates in the CDR standard. Taking the rate-1/2 LDPC encoder as an example, Fig. 6 shows the I/O interface of the top encoder module LdpcEncode. The I/O ports are described in Tab. 4, including the port name, port type and port bit width. The simulated waveform in the ISE platform for the (4608, 9216) LDPC encoder is shown in Fig. 7. In Fig. 7, the encoder reads the generator vectors from the ROM sequentially and feeds them to the SRAA circuit to calculate the parity bits. The encoding of a frame is completed after 18 generator vectors have been input to the SRAA circuit. Then all the parity bits are output. The computation for each generator vector needs 256 clocks, so the total number of clocks for computing the parity bits is 256 × 18 = 4608. Supposing that the maximum operating frequency of the system clock is 200 MHz and the code length is 9216 bits, the encoder produces one 9216-bit codeword every 4608 clock cycles, so the coding throughput can reach about 400 Mbps, which meets the requirements of the CDR standard. The resource usage report is given in Fig. 8. The designed encoder occupies 17% of the logic slices and 13% of the on-chip RAM resources, and uses no DSP blocks. The proposed encoder is therefore feasible for hardware implementation.
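The throughput figure follows from the clock count. The sketch below reproduces the arithmetic for the rate-1/2 example, under the assumption implied by the text that a new frame can be accepted every k × 256 clocks; the function name and arguments are illustrative.

```python
def throughput_bps(f_clk_hz, code_len, k, b=256):
    """Coded bits per second when one codeword is produced every k*b clocks."""
    cycles_per_frame = k * b  # clocks needed to take in one frame
    return code_len * f_clk_hz / cycles_per_frame

# Rate-1/2 example from the text: k = 18, 200 MHz clock, 9216-bit codeword.
cycles = 18 * 256
rate_half = throughput_bps(200_000_000, 9216, k=18)
print(f"{cycles} clocks per frame, {rate_half / 1e6:.0f} Mbps")
```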
B. SIMULATION RESULT
Based on the quasi-cyclic characteristic of the LDPC encoder in the CDR standard, we use ModelSim and MATLAB together as a joint platform to verify the proposed encoder design. The block diagram is presented in Fig. 9. We first implement the encoder in MATLAB, where the encoding result is verified by checking that $C \times H^{T}$ is zero, with $C$ being the output codeword. Second, the random bits generated on the ModelSim platform are input both to MATLAB and to our proposed encoder hardware module for encoding. Finally, comparing the results from the ModelSim and MATLAB simulations, we find that the two results are the same. Therefore, the LDPC encoder module correctly implements the encoding principle. Fig. 10 shows the BER curves of the LDPC codes with the four rates in CDR. For comparison, Fig. 11 plots the BER curves of the standard convolutional codes in the channels of DAB, including OFDM channels without channel coding, the Fast Information Channel (FIC) and the Main Service Channel (MSC). The rate of the convolutional code is 3/8 and the code length is 1152 bits. We can see that the decoding performance in CDR is superior to that in DAB. In particular, for CDR the BER can reach $10^{-6}$ for an SNR below 3 dB, while the BER in DAB is $10^{-3}$ even at an SNR of 5 dB. This indicates that the channel coding of the CDR standard has higher coding efficiency and better performance than that of the DAB standard.
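The codeword check used in the first verification step can be illustrated on a toy systematic code. The sketch below does not use the CDR parity-check matrices, which are not reproduced here; it assumes a small random parity part P, a generator matrix of the form [I | P] and the matching parity-check matrix [P^T | I], and all names are illustrative.

```python
import random

def encode(m, P):
    """Systematic encoding c = (m, m*P) over GF(2)."""
    r = len(P[0])
    parity = [0] * r
    for m_i, row in zip(m, P):
        if m_i:
            parity = [a ^ b for a, b in zip(parity, row)]
    return m + parity

def syndrome(cw, P):
    """Compute c * H^T for H = [P^T | I]; all zeros for a valid codeword."""
    k, r = len(P), len(P[0])
    return [(sum(cw[i] & P[i][j] for i in range(k)) + cw[k + j]) % 2
            for j in range(r)]

random.seed(7)
k, r = 6, 4  # toy sizes, not the CDR parameters
P = [[random.randint(0, 1) for _ in range(r)] for _ in range(k)]
m = [random.randint(0, 1) for _ in range(k)]

cw = encode(m, P)
corrupted = cw[:]
corrupted[-1] ^= 1  # flip the last parity bit
```

A zero syndrome confirms a valid codeword, while the single flipped bit in `corrupted` produces a non-zero syndrome.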
The main function of the CDR baseband system is the transmission of high-quality audio data. The transmission quality of audio files is difficult to present intuitively in a paper. Therefore, we use image data instead of audio data as the input source to illustrate the decoding performance of the system. To observe the performance of the main service data channel under different conditions, we send a 300 × 300 binary Lena image as the input data source through channels with different SNRs. Here, SNR = 5 dB, SNR = 10 dB and SNR = 20 dB are taken to represent strong, ordinary and weak channel noise, respectively, since a lower SNR corresponds to stronger noise. The received image is decoded and then recovered, as shown in Fig. 12. Because the naked eye cannot clearly distinguish the resulting images, we also express the result in terms of error rates:
• when SNR = 5 dB, the image error rate is at the $10^{-3}$ level;
• when SNR = 10 dB, the image error rate is at the $10^{-4}$ level;
• when SNR = 20 dB, the image error rate is at the $10^{-5}$ level.
Thus, the case of image transmission demonstrates that the LDPC-coded CDR system ensures normal communication with an error rate of less than $10^{-3}$ [18], [19]. Based on this result, it can be concluded that the CDR system can meet the decoding requirement of image transmission. Furthermore, since the quality of audio transmission is generally superior to that of image transmission, the CDR baseband system can also meet the requirements of digital audio transmission.
V. CONCLUSIONS
This paper analyzes the encoding process of the LDPC codes adopted by the CDR standard, which was enacted in 2013. An encoding algorithm that facilitates hardware implementation is designed based on the structures of the parity-check matrix and the generator matrix. We then employ a memory control method that greatly saves memory when realizing the LDPC codes with four rates in the CDR standard. The whole encoder and decoder design is synthesized on the Xilinx xc6slx150t-3fgg484 with the ISE platform in Verilog HDL. With the help of analysis in ModelSim and MATLAB, it is verified that the proposed LDPC encoder is accurate and efficient, with a throughput of up to 400 Mbps. In particular, we show that the decoding BER can be as low as $10^{-5}$ at an SNR below 2.5 dB. The comparison of CDR and DAB further demonstrates the superiority of the CDR standard, which is of great significance for future CDR research and commercial applications.
