Abstract-We have designed and implemented an LDPC decoder chip that uses a memory-reduction method to achieve high throughput and a practical chip size. The decoder decodes (3,6)-regular LDPC codes of length 2304 bits using a modified min-sum algorithm. The decoder achieves a throughput of 530Mb/s at an operating frequency of 147MHz. The chip has been fabricated in a 0.18µm, 6 metal-layer CMOS technology. The chip size is 36mm².
I. Introduction
Low-Density Parity-Check (LDPC) codes achieve good error-correcting performance.
In the last few years, some work has been done on the design of LDPC decoders [1] [2]. The decoding algorithm exchanges messages between the check-nodes and bit-nodes on the Tanner graph by performing row operations and column operations iteratively. Future applications that require a high throughput of over a hundred Mb/s need the row operations and column operations to run concurrently. However, running them concurrently requires twice as much memory as the serial approach. We have proposed a method to mitigate this problem. In this method, a row operation outputs the minimum absolute value, the second minimum absolute value, flags that select between them, and signs instead of the actual output values. We designed and implemented a high-throughput LDPC decoder chip using this method.
II. Parity Check Matrix
Fig.1 shows the block-structured parity check matrix of the LDPC decoder that we have designed. This matrix enables the decoder to perform row and column operations partially in parallel [1]. The block length is 2304 and the code rate is 1/2. The diagonal lines represent entries of 1 in the matrix; all other entries are 0. The matrix is composed of 3 × 6 sub-blocks of size 384 × 384. Each sub-block is an identity matrix that has been shifted to the right. The shift values have been determined by a heuristic search for the best performance. Fig.2 shows the Tanner graph corresponding to the parity check matrix. Each bit-node corresponds to a column of the parity check matrix, and each check-node corresponds to a row. The 2304 bit-nodes are divided into 6 groups of 384 nodes. The 1152 check-nodes are divided into 3 groups of 384 nodes.
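The block structure of the parity check matrix can be sketched in a few lines of Python. This is only an illustration: the shift values in `SHIFTS` are hypothetical placeholders (the values found by the heuristic search are not listed in the text), the shift is assumed to be cyclic, and the names `SHIFTS`, `check_row_columns` and `H_ROWS` are introduced here for the example.

```python
Z = 384             # sub-block size, from the text
ROWS, COLS = 3, 6   # 3 x 6 grid of sub-blocks, from the text

# Hypothetical shift values, one per sub-block (NOT the values used on the chip).
SHIFTS = [[(37 * r + 101 * c + 5) % Z for c in range(COLS)] for r in range(ROWS)]

def check_row_columns(shifts, z):
    """For each row of H, list the column indices that hold a 1."""
    rows = []
    for shift_row in shifts:
        for i in range(z):
            # Row i of an identity matrix shifted right by s (assumed cyclic)
            # has its single 1 in column (i + s) mod z of that sub-block.
            rows.append([c * z + (i + s) % z for c, s in enumerate(shift_row)])
    return rows

H_ROWS = check_row_columns(SHIFTS, Z)

# Count how many check-nodes each bit-node (column) takes part in.
col_weight = [0] * (COLS * Z)
for cols in H_ROWS:
    for j in cols:
        col_weight[j] += 1

print(len(H_ROWS), len(col_weight))                      # 1152 2304
print({len(cols) for cols in H_ROWS}, set(col_weight))   # {6} {3}
```

Under the cyclic-shift assumption, every row has weight 6 and every column has weight 3 regardless of the shift values chosen, which matches the (3,6)-regular structure stated in the abstract.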

III. Hardware Architecture
Fig.3 shows the block diagram of the LDPC decoder. We adopted the modified min-sum algorithm to implement the decoder [2]. A 6-bit symbol Log-Likelihood Ratio (LLR) is fed to the decoder as each input value. The symbol LLRs are stored in the input SRAMs. The decoder contains 6 SRAMs (lram1-6), each 72 bits wide and 64 words deep, for these input values. Each SRAM stores the input values of one group of bit-nodes. Each address (word) holds 12 input values. The input SRAMs store 2 code blocks: one block is stored for the next decoding while the other block is being decoded.
The decoder contains 12 × 3 = 36 Check-node Function Units (CFUs) so that 12 row operations of each group of check-nodes can be performed concurrently. Fig.4 shows the block diagram of the CFU. A CFU outputs a 5-bit minimum absolute value, a 5-bit second minimum absolute value, 6 flag-bits that select between the minimum values, and 6 sign-bits instead of the actual alpha values. The flag-bits and sign-bits are stored in the alpha SRAMs. The LDPC decoder contains 18 SRAMs (aram11-16, aram21-26 and aram31-36), each 24 bits wide and 32 words deep, for the flag-bits and sign-bits. The minimum absolute value and second minimum absolute value are stored in shift-registers with a bit-width of 12 × (5 + 5) = 120.
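The compressed CFU output can be illustrated with a short Python sketch. It uses the plain min-sum row operation, because the details of the modified min-sum algorithm are not given in this text, and the function names `cfu` and `expand` are hypothetical. The point is only that the minimum, the second minimum, the flag-bits and the sign-bits are enough to rebuild all six alpha values.

```python
def cfu(betas):
    """Row operation for one check node, returning the compressed output."""
    mags = [abs(b) for b in betas]
    signs = [b < 0 for b in betas]                  # one sign-bit per edge
    min1 = min(mags)                                # minimum absolute value
    idx = mags.index(min1)                          # edge that holds the minimum
    min2 = min(m for i, m in enumerate(mags) if i != idx)  # second minimum
    flags = [i == idx for i in range(len(betas))]   # one flag-bit per edge
    return min1, min2, flags, signs

def expand(min1, min2, flags, signs):
    """Rebuild the per-edge alpha values from the compressed form."""
    parity = sum(signs) % 2                         # sign product over all edges
    alphas = []
    for flag, sign in zip(flags, signs):
        mag = min2 if flag else min1                # the minimum edge gets the second minimum
        negative = bool(parity ^ sign)              # remove the edge's own sign
        alphas.append(-mag if negative else mag)
    return alphas

print(expand(*cfu([5, -3, 7, -2, 9, 4])))   # [2, -2, 2, -3, 2, 2]
```

With the widths given in the text, the compressed form takes 5 + 5 + 6 + 6 = 22 bits per check node.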
The decoder also contains 12 × 6 = 72 Bit-node Function Units (BFUs) so that 12 column operations of each group of bit-nodes can be performed concurrently. Fig.5 shows the block diagram of the BFU. A BFU outputs 6-bit beta values, which are stored in the beta SRAMs. The decoder contains 18 SRAMs (bram11-16, bram21-26 and bram31-36), each 72 bits wide and 32 words deep, for these values.
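A column operation of the kind a BFU performs can be sketched as follows. This is the standard min-sum bit-node update, not a description of the circuit in Fig.5. The symmetric 6-bit saturation range, the hard-decision convention (a negative sum decodes to 1), and the names `saturate` and `bfu` are assumptions made for the example.

```python
def saturate(x, bits=6):
    """Clip x to a symmetric signed range (assumed: +/-31 for 6 bits)."""
    limit = (1 << (bits - 1)) - 1
    return max(-limit, min(limit, x))

def bfu(llr, alphas):
    """Column operation for one bit node with three incoming alpha values."""
    total = llr + sum(alphas)
    # Each outgoing beta excludes the alpha that arrived on the same edge.
    betas = [saturate(total - a) for a in alphas]
    decoded_bit = 1 if total < 0 else 0   # assumed sign convention
    return betas, decoded_bit

print(bfu(10, [2, -3, 30]))   # ([31, 31, 9], 0)
```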
The decoder contains 6 SRAMs (cram1-6), each 12 bits wide and 32 words deep, for decoded values. The parity check module checks the decoded values to determine whether the decoding result is correct. Parity check and decoding operate independently. All of the SRAMs are dual-port memories so that code blocks can be decoded consecutively. The LDPC decoder takes 384 / 12 = 32 cycles to complete one decoding iteration.
IV. Chip Fabrications
Fig.6 shows a microphotograph of the LDPC decoder. The LDPC decoder chip has been fabricated in a 0.18µm, 6 metal-layer CMOS technology. The chip die size is 6.0mm × 6.0mm = 36mm². The 48 dual-port SRAMs occupy 78.4% of the total synthesized area. The gate count of the decoder core is 206,343 gates. The decoder achieves a throughput of 530Mb/s and consumes 3.6W at an operating frequency of 147MHz with 10 decoding iterations. The 48 SRAMs consume 87.8% of the total power. For comparison, the decoder reported in [2] achieves a throughput of 127Mb/s at an operating frequency of 121MHz. Table I shows the specifications of the LDPC decoder chip.
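The reported throughput can be cross-checked from the figures given in the text. The short calculation below is one reading that reproduces the number, under two assumptions that the text does not state explicitly: throughput counts information bits (block length times code rate), and one block is completed every 32 × 10 cycles with no additional overhead. The variable names are introduced here for illustration.

```python
BLOCK_LENGTH = 2304          # bits per code block
INFO_BITS = BLOCK_LENGTH // 2  # code rate 1/2 (assumed basis for throughput)
CYCLES_PER_ITERATION = 384 // 12
ITERATIONS = 10
CLOCK_HZ = 147_000_000

cycles_per_block = CYCLES_PER_ITERATION * ITERATIONS   # 320 cycles
throughput_bps = CLOCK_HZ * INFO_BITS // cycles_per_block

print(cycles_per_block)        # 320
print(throughput_bps / 1e6)    # 529.2
```

The result of about 529Mb/s agrees with the reported 530Mb/s.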
V. Summary and Conclusions
We have designed and implemented an LDPC decoder chip with a memory-reduction method. The decoder achieves a throughput of 530Mb/s. This architecture can be applied to future applications that require high throughput.
