We introduce a class of structured LDPC codes, turbo-structured LDPC (TS-LDPC) codes, composed of two subtrees connected by an interleaver. TS-LDPC codes with good girth properties are easy to design: careful design of the interleaver component prevents short cycles in their Tanner graphs. We present a methodology to design TS-LDPC codes with arbitrary column weight $j \ge 2$ and arbitrary girth. In addition, we describe a reduced-complexity decoding algorithm. Simulation results demonstrate the good performance of TS-LDPC codes compared to random LDPC codes of similar size and rate.
INTRODUCTION
Low-density parity-check (LDPC) codes [1] can be applied in numerous areas, e.g., communication systems and magnetic recording channels. With iterative decoding, their performance is close to the Shannon limit [2].
An LDPC code can be described by a bipartite graph called the Tanner graph [3]. The length of the shortest cycle in a Tanner graph is referred to as its girth g. Short cycles in Tanner graphs increase the computational effort of the decoding algorithm and prevent it from converging to the optimal decoding result. Furthermore, [3] derives a lower bound on the minimum distance $d_{\min}$ that increases exponentially with the girth. Therefore, LDPC codes with good girth properties are particularly desirable.
Recently, cyclic and quasi-cyclic LDPC codes have drawn much attention because they facilitate low-complexity encoder and decoder designs. However, their girth is limited. Tanner [4] proved that the girth of such codes with column weight $j \ge 3$ is less than or equal to 6. This prevents the girth of $(n, j, k)$ cyclic and quasi-cyclic LDPC codes from growing as $\log n / \log[(j-1)(k-1)]$ [1], so these codes perform poorly at very long block lengths.
We present a new type of structured LDPC codes with arbitrary girth, inspired by turbo designs, which we refer to as turbo-structured LDPC (TS-LDPC) codes. The Tanner graph of such codes is composed of two subtrees that are connected in a turbo-like manner by an interleaver. This structure facilitates the systematic design of LDPC codes with large girth and flexible code rates. Our TS-LDPC codes are distinct from the concatenated tree codes of [5], which are also inspired by turbo codes: TS-LDPC codes are turbo-like on the decoder side, whereas the codes of [5] are turbo-like on the encoder side. The resulting TS-LDPC codes and the concatenated tree codes are very different; see [6] for comments.
This work was supported by the Data Storage Systems Center (DSSC) at Carnegie Mellon University, and by NSF grant # ECS-0225449.
TS-LDPC CODES
The Tanner graph of a TS-LDPC code is composed of three components: two height-balanced trees, an upper tree $T_U$ and a lower tree $T_L$, and an interleaver that connects $T_U$ and $T_L$. The leaf nodes of $T_U$ are bit nodes, whereas the leaf nodes of $T_L$ are check nodes. Both trees $T_U$ and $T_L$ contain h tiers (or layers). The first tier of $T_U$ contains only one check node, the root $C^*$, as shown in Figure 1. To match $T_U$, we let the root of $T_L$ be a bit node $V^*$ and connect $V^*$ to $C^*$. The two trees are coupled in a turbo-like manner: many edges join the leaf nodes of $T_U$ to the leaf nodes of $T_L$, see Figure 1. The structure formed by the edges connecting the leaf nodes of $T_U$ and the leaf nodes of $T_L$ is called the interleaver I. Since we are interested in regular LDPC codes, all bit nodes have uniform degree j and all check nodes have uniform degree k. For example, a TS-LDPC code with h = 4, j = 3, and k = 4 is depicted in Figure 1. Ignoring possible dependency among a few rows of the parity-check matrix, the code rate is $\rho = 1 - j/k$. We can easily choose the values of j and k to create a TS-LDPC code with the desired code rate.
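The node counts of the two trees follow directly from the degrees: every check node has k − 1 children and every bit node has j − 1 children, because each non-root node spends one edge on its parent and the roots $C^*$ and $V^*$ spend one edge on each other. The minimal Python sketch below counts bit nodes, check nodes, and the interleaver edges leaving each leaf tier, assuming h is even so that the leaves of $T_U$ are bit nodes; the function name `ts_ldpc_sizes` is hypothetical.

```python
def ts_ldpc_sizes(h, j, k):
    """Return (bit nodes, check nodes, interleaver edges from T_U, interleaver edges into T_L).

    Assumes h is even, so the leaves of T_U are bit nodes and the leaves of T_L are check nodes.
    Every check node has k - 1 children and every bit node has j - 1 children: non-root nodes
    spend one edge on their parent, and the roots C* and V* spend one edge on each other.
    """
    if h < 2 or h % 2:
        raise ValueError("h must be a positive even number")
    p, q = k - 1, j - 1
    bits = checks = 0
    leaf_edges = []
    for root_is_check in (True, False):          # T_U (root C*), then T_L (root V*)
        size, is_check = 1, root_is_check
        for _ in range(h):                       # walk down the h tiers
            if is_check:
                checks += size
                size *= p
            else:
                bits += size
                size *= q
            is_check = not is_check
        leaf_edges.append(size)                  # edges leaving the leaf tier = interleaver edges
    return bits, checks, leaf_edges[0], leaf_edges[1]


n, m, e_u, e_l = ts_ldpc_sizes(4, 3, 4)          # the h = 4, j = 3, k = 4 example of Figure 1
print(n, m, e_u == e_l, 1 - m / n)               # 28 21 True 0.25
print(ts_ldpc_sizes(8, 3, 6)[:2])                # (6666, 3333)
```

Because every node reaches full degree, the number of edges is both $nj$ and $mk$, so $m/n = j/k$ exactly and the design rate is $1 - j/k$. With h = 8, j = 3, and k = 6 the count gives 6666 bit nodes and 3333 check nodes, the dimensions of the (6666, 3, 6) example discussed later.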
INTERLEAVER DESIGN
By construction, each leaf node in $T_U$ is connected to $q = j - 1$ leaf nodes in $T_L$. This is a one-to-q mapping, whereas a standard interleaver is a one-to-one mapping between the elements of two sets of the same size. To obtain a standard interleaver, we introduce auxiliary nodes (solid triangles), as shown in Figure 2, to facilitate the code design. For each leaf node in $T_U$, we add $j - 1$ auxiliary nodes as its children. Similarly, each leaf node in $T_L$ has $k - 1$ auxiliary nodes as its children.
With the auxiliary nodes introduced, every cycle in the code must contain at least four auxiliary nodes: two auxiliary nodes of $T_U$ and two auxiliary nodes of $T_L$. We classify cycles into two disjoint categories: type-I cycles contain exactly four auxiliary nodes; type-II cycles contain more than four auxiliary nodes. We treat the two types separately.
There are $[(j-1)(k-1)]^{h/2}$ auxiliary nodes in $T_U$. We address the interleaver design problem algebraically by indexing all the auxiliary nodes of $T_U$ from 0 to $[(j-1)(k-1)]^{h/2} - 1$ in a format, explained next, that we refer to as p-q-alternate-decimal, where $p = k - 1$ and $q = j - 1$. A p-q-alternate-decimal index has h digits, numbered from 1 to h starting from the rightmost one. The odd digits take values 0 to $q - 1$ and the even digits take values 0 to $p - 1$. Similarly, we index all the auxiliary nodes of $T_L$ from 0 to $[(j-1)(k-1)]^{h/2} - 1$ and represent these indices in q-p-alternate-decimal format, in which the odd digits take values 0 to $p - 1$ and the even digits take values 0 to $q - 1$. We provide an example. With reference to the index shown in Figure 2, its index $X_{p-q}$ in p-q-alternate-decimal form is:
Type-I Cycles: Digit-wise Reversal
We first consider how to avoid short type-I cycles, starting with a simple interleaver design: digit-wise reversal. For an index $X_{p-q}$ in p-q-alternate-decimal form with h digits, its digit-wise reversal interchanges the $i$th digit and the $(h+1-i)$th digit. We denote the digit-wise reversal operator by $\pi_d(\cdot)$. For the index $X_{p-q}$ in equation (1), its digit-wise reversal is:
We state the advantage of the digit-wise reversal interleaver in the following theorem and omit the proof; a detailed proof is given in [7].
Theorem 1 Connecting the auxiliary nodes indexed by $X_{p-q}$ in $T_U$ to the auxiliary nodes indexed by $\pi_d(X_{p-q})$ in $T_L$ guarantees that every resulting type-I cycle has length at least $2h$, where h denotes the number of tiers in $T_U$.
By Theorem 1, the length of type-I cycles can be increased simply by increasing the number of tiers in the subtrees.
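The alternate-decimal indexing and the digit-wise reversal can be made concrete in a few lines. The following is a minimal Python sketch under our own conventions: digit lists are stored rightmost digit first, and the helper names `to_digits`, `from_digits`, and `digit_reverse` are hypothetical. For even h, digit i and digit $h + 1 - i$ have opposite parity, so reversing a p-q-alternate-decimal index yields a q-p-alternate-decimal index and the odd and even radices swap roles.

```python
def to_digits(x, h, odd_radix, even_radix):
    """Digits of x, rightmost first: digits[i - 1] is digit i (digit 1 is the rightmost)."""
    digits = []
    for i in range(1, h + 1):
        radix = odd_radix if i % 2 == 1 else even_radix
        digits.append(x % radix)
        x //= radix
    if x:
        raise ValueError("index does not fit in h digits")
    return digits


def from_digits(digits, odd_radix, even_radix):
    """Inverse of to_digits, evaluated from the leftmost digit (Horner's rule)."""
    x = 0
    for i in range(len(digits), 0, -1):
        radix = odd_radix if i % 2 == 1 else even_radix
        x = x * radix + digits[i - 1]
    return x


def digit_reverse(x, h, odd_radix, even_radix):
    """pi_d: swap digit i with digit h + 1 - i; for even h the odd/even radices swap roles."""
    return from_digits(to_digits(x, h, odd_radix, even_radix)[::-1], even_radix, odd_radix)


h, j, k = 4, 3, 4
p, q = k - 1, j - 1
N = (p * q) ** (h // 2)                                 # auxiliary nodes per tree
perm = [digit_reverse(x, h, q, p) for x in range(N)]    # p-q index (odd < q, even < p) -> q-p index
assert sorted(perm) == list(range(N))                   # a one-to-one interleaver
assert all(digit_reverse(perm[x], h, p, q) == x for x in range(N))  # pi_d(pi_d(X)) = X
print(to_digits(5, h, q, p), perm[5])                   # [1, 2, 0, 0] 30
```

With h = 4, j = 3, and k = 4 (p = 3, q = 2), each tree has 36 auxiliary nodes. Index 5 has p-q digits 0021 (written leftmost first); its reversal 1200, read in q-p form, is index 30.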
Type-II 4-Cycles
Theorem 1 prevents short type-I cycles; however, digit-wise reversal alone may lead to many short type-II cycles, even type-II 4-cycles.
To avoid short type-II cycles, we propose a method called grouping and shifting. The shift S is defined to be a constant in q-p-alternate-decimal format that is added to the original digit-wise reversal, so that the auxiliary node indexed by $X_{p-q}$ in $T_U$ is connected to the auxiliary node indexed by $\pi_d(X_{p-q}) \dotplus S$ in $T_L$. (2)
In equation (2), $\dotplus$ denotes digit-wise addition without carries, in which the $i$th digit of the sum is reduced modulo $\mathrm{div}_i$, where $\mathrm{div}_i = p$ if i is even and $\mathrm{div}_i = q$ if i is odd. Similarly, we denote digit-wise subtraction by $\dotminus$. Next, we divide all the auxiliary nodes of $T_U$ into $k - 1$ groups, grouping together those nodes with the same leftmost digit in their p-q-alternate-decimal indices. Likewise, the auxiliary nodes of $T_L$ are divided into $j - 1$ groups according to the leftmost digit of their q-p-alternate-decimal indices. We further require the same shift for all connections from the auxiliary nodes of one group of $T_U$ to the auxiliary nodes of one group of $T_L$. Denote by $S_{y,z}$ the shift used when connecting the auxiliary nodes of the $y$th group of $T_L$ to the auxiliary nodes of the $z$th group of $T_U$. For different y and z, the shifts $S_{y,z}$ may be equal or different. In addition, since $S_{y,z}$ must not change the assignment of nodes to groups, its leftmost and rightmost digits are always set to 0.
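Grouping and shifting can be sketched as follows, assuming that the digit-wise addition of equation (2) is carry-free with each digit reduced modulo the radix of its position in the q-p-alternate-decimal format of $T_L$ indices, that groups are numbered from 0 by the value of the leftmost digit, and that shifts are supplied as integers in q-p-alternate-decimal form. The function name `shifted_interleaver` and the dictionary layout of `shifts` (keyed by (y, z), zero when absent) are hypothetical.

```python
def to_digits(x, radices):
    """Digits of x, rightmost first, for per-position radices (radices[i - 1] belongs to digit i)."""
    digits = []
    for r in radices:
        digits.append(x % r)
        x //= r
    if x:
        raise ValueError("index does not fit in the given number of digits")
    return digits


def from_digits(digits, radices):
    """Inverse of to_digits (Horner's rule from the leftmost digit)."""
    x = 0
    for d, r in zip(reversed(digits), reversed(radices)):
        x = x * r + d
    return x


def shifted_interleaver(h, j, k, shifts):
    """perm[X] = T_L auxiliary index joined to T_U auxiliary index X, i.e. pi_d(X) (+) S_{y,z}."""
    p, q = k - 1, j - 1
    r_pq = [q if i % 2 else p for i in range(1, h + 1)]   # T_U: odd digits < q, even digits < p
    r_qp = [p if i % 2 else q for i in range(1, h + 1)]   # T_L: odd digits < p, even digits < q
    perm = []
    for x in range((p * q) ** (h // 2)):
        rev = to_digits(x, r_pq)[::-1]           # digit-wise reversal pi_d(X), now in q-p form
        z, y = rev[0], rev[-1]                   # T_U group (leftmost digit of X), T_L group
        s = to_digits(shifts.get((y, z), 0), r_qp)
        if s[0] or s[-1]:
            raise ValueError("leftmost and rightmost digits of a shift must be 0")
        summed = [(a + b) % r for a, b, r in zip(rev, s, r_qp)]   # carry-free digit-wise addition
        perm.append(from_digits(summed, r_qp))
    return perm


plain = shifted_interleaver(4, 3, 4, {})                  # no shifts: plain digit-wise reversal
shifted = shifted_interleaver(4, 3, 4, {(1, 0): 15})      # 15 is written 0210 in q-p form (p = 3, q = 2)
print(plain[5], shifted[5], sorted(shifted) == list(range(36)))   # 30 27 True
```

Because a shift never changes the leftmost or rightmost digit, each block of connections between one group of $T_U$ and one group of $T_L$ is permuted within itself, so the shifted map is still a one-to-one interleaver and the group assignment is preserved.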
The following theorem, proved elsewhere, relates type-I cycles to the shifts.
Theorem 2 (TYPE-I 2h-CYCLES)
Hence, under Theorem 2, we have $(j-1)(k-1)$ free parameters $S_{y,z}$ that can be adjusted to prevent short type-II cycles while all short type-I cycles remain excluded.
Theorem 3 provides a sufficient condition on the shifts $S_{y,z}$ that prevents type-II 4-cycles.
Theorem 3 (NO TYPE-II 4-CYCLES)
Proof: Refer to Figure 3(a) and assume that the index of the auxiliary node $a_1$ is $X_{a_1}$. According to the connection rule of Theorem 2, the index of the auxiliary node $b_1$ is $X_{b_1} = \pi_d(X_{a_1}) \dotplus S_{i,m}$. Let $\delta_b$ denote the difference between $X_{b_2}$ and $X_{b_1}$, i.e., $\delta_b = X_{b_2} \dotminus X_{b_1}$. Since the auxiliary nodes $b_1$ and $b_2$ are connected to the same check node B, only the rightmost digit of the q-p-alternate-decimal form of $\delta_b$ is non-zero; all its other digits are zero. The index of the auxiliary node $c_1$ is
Then the p-q-alternate-decimal form of $\delta_c$ has only one non-zero digit, its rightmost digit. The index of the auxiliary node $d_1$ is
The relationship between $X_{a_1}$ and $X_{a_2}$ is $X_{a_1} = X_{a_2} \dotplus \delta_a$. Expanding the definition of $X_{a_2}$, we have
As $\pi_d(\pi_d(X)) = X$, this leads to
In the q-p-alternate-decimal form of a shift, the leftmost and rightmost digits are always zero. On the other hand, all digits of $\delta_c$, $\delta_a$, $\pi_d(\delta_b)$, and $\pi_d(\delta_d)$ other than the leftmost and rightmost digits are zero. Therefore, equation (4) can be split into two sub-equations:
It then follows from equation (5)
This completes the proof.
Excluding Type-II Cycles of Arbitrary Length
We observe that there are many different classes of type-II cycles. For example, all type-II cycles of length six can be divided into three classes, as shown in Figure 3. For the shifts $S_{y_1,z_1}, S_{y_2,z_2}, \dots, S_{y_{2k},z_{2k}}$ involved, $y_{2t-1} = y_{2t}$ for $t = 1, 2, \dots, k$ and $z_{2t} = z_{2t+1}$ for $t = 1, 2, \dots, k-1$.
Define the shift matrix $S = [S_{y,z}]$ as the matrix that collects all the shifts. From the point of view of graph theory, $S_{y_1,z_1}, S_{y_2,z_2}, \dots, S_{y_{2k},z_{2k}}$ are adjacent vertices of a closed path in the matrix S; Figure 4 shows two examples.
Theorem 4 For the equivalence set with $N_I = 2k$, let $\Delta S = \sum_{u=1}^{k} S_{y_{2u},z_{2u}} - \sum_{v=1}^{k} S_{y_{2v-1},z_{2v-1}}$. Each $\Delta S$ having h digits is free of any cycle having fewer than $N_T = 2l$ edges in the trees (both $T_U$ and $T_L$) if it contains at most
The proof is lengthy and is given elsewhere.
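The quantities in Theorem 4 can be computed mechanically for a candidate set of shifts. The sketch below checks whether a list of shift-matrix positions $(y_1, z_1), \dots, (y_{2K}, z_{2K})$ forms a closed path (rows shared by vertices $2t-1$ and $2t$, columns shared by vertices $2t$ and $2t+1$; we additionally require $z_{2K} = z_1$ so that the path closes) and evaluates $\Delta S$ digit by digit in q-p-alternate-decimal form. The carry-free reading of the sums, the 0-based group indices, the name K (used instead of k to avoid a clash with the check-node degree), and the helper names are our assumptions.

```python
def to_digits(x, radices):
    """Digits of x, rightmost first, for per-position radices (radices[i - 1] belongs to digit i)."""
    digits = []
    for r in radices:
        digits.append(x % r)
        x //= r
    return digits


def is_closed_path(path):
    """path = [(y_1, z_1), ..., (y_2K, z_2K)] of shift-matrix positions, group indices counted from 0."""
    n = len(path)
    if n < 4 or n % 2:
        return False
    # y_{2t-1} = y_{2t}: vertices 2t-1 and 2t share a row of S
    same_row = all(path[2 * t][0] == path[2 * t + 1][0] for t in range(n // 2))
    # z_{2t} = z_{2t+1}, plus z_{2K} = z_1 to close the path
    same_col = all(path[2 * t + 1][1] == path[(2 * t + 2) % n][1] for t in range(n // 2))
    return same_row and same_col


def delta_s(path, shifts, h, j, k):
    """Carry-free Delta S = sum_u S_{y_2u,z_2u} - sum_v S_{y_(2v-1),z_(2v-1)}, q-p digits rightmost first."""
    p, q = k - 1, j - 1
    radices = [p if i % 2 else q for i in range(1, h + 1)]   # q-p form: odd digits < p, even digits < q
    acc = [0] * h
    for idx, yz in enumerate(path):
        sign = 1 if idx % 2 else -1          # list index 0 is vertex 1 (odd): subtract; vertex 2: add
        digits = to_digits(shifts.get(yz, 0), radices)
        acc = [a + sign * d for a, d in zip(acc, digits)]
    return [a % r for a, r in zip(acc, radices)]


rectangle = [(0, 0), (0, 1), (1, 1), (1, 0)]
print(is_closed_path(rectangle), delta_s(rectangle, {(0, 1): 3}, 4, 3, 4))   # True [0, 1, 0, 0]
```

In the example, with h = 4, j = 3, k = 4, only $S_{0,1} = 3$ (q-p digits 0010) is non-zero; it sits at an even-numbered vertex of the rectangle, so it enters $\Delta S$ with a plus sign.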
By choosing suitable shifts $S_{y,z}$ according to Theorems 2 to 4, we can avoid all type-I and type-II cycles up to the desired length $g - 2$.
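Theorems 2 to 4 give sufficient conditions; a direct way to check a candidate interleaver is to build the Tanner graph and measure its girth. The brute-force Python sketch below does this for small parameters. It assumes a labelling in which the tier-t ancestor of the auxiliary node with index X is obtained by dropping the $h + 1 - t$ rightmost digits of X, so sibling auxiliary nodes differ only in their rightmost digit; all function names are hypothetical. The girth search runs a BFS from every vertex, costing O(V·E), so it is meant for toy sizes rather than the 6666-bit code.

```python
import math
from collections import deque


def to_digits(x, radices):
    digits = []
    for r in radices:                      # rightmost digit first
        digits.append(x % r)
        x //= r
    return digits


def from_digits(digits, radices):
    x = 0
    for d, r in zip(reversed(digits), reversed(radices)):
        x = x * r + d
    return x


def radix_lists(h, j, k):
    p, q = k - 1, j - 1
    r_pq = [q if i % 2 else p for i in range(1, h + 1)]   # T_U indices: odd digits < q, even < p
    r_qp = [p if i % 2 else q for i in range(1, h + 1)]   # T_L indices: odd digits < p, even < q
    return r_pq, r_qp


def reversal_perm(h, j, k):
    """Plain digit-wise reversal interleaver (no shifts)."""
    r_pq, r_qp = radix_lists(h, j, k)
    return [from_digits(to_digits(x, r_pq)[::-1], r_qp) for x in range(math.prod(r_pq))]


def build_tanner_graph(h, j, k, perm):
    """Adjacency lists and node kinds; perm[X] is the T_L auxiliary index joined to T_U index X."""
    r_pq, r_qp = radix_lists(h, j, k)
    adj, kind = [], []

    def add_node(node_kind):
        adj.append([])
        kind.append(node_kind)
        return len(adj) - 1

    def link(a, b):
        adj[a].append(b)
        adj[b].append(a)

    def tree(radices, root_kind):
        tiers, tier_kind = [[add_node(root_kind)]], root_kind
        for t in range(1, h):
            rad = radices[h - t]                          # radix of digit h + 1 - t
            tier_kind = "bit" if tier_kind == "check" else "check"
            tiers.append([add_node(tier_kind) for _ in range(len(tiers[-1]) * rad)])
            for i, parent in enumerate(tiers[-2]):
                for c in range(rad):
                    link(parent, tiers[-1][i * rad + c])
        return tiers

    upper = tree(r_pq, "check")                           # T_U with root C*
    lower = tree(r_qp, "bit")                             # T_L with root V*
    link(upper[0][0], lower[0][0])                        # the edge C* -- V*
    for x, y in enumerate(perm):                          # interleaver: leaf owning x -- leaf owning y
        link(upper[-1][x // r_pq[0]], lower[-1][y // r_qp[0]])
    return adj, kind


def girth(adj):
    """Length of the shortest cycle (math.inf if none), via BFS from every vertex."""
    best = math.inf
    for s in range(len(adj)):
        dist, parent = [-1] * len(adj), [-1] * len(adj)
        dist[s] = 0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if dist[v] < 0:
                    dist[v], parent[v] = dist[u] + 1, u
                    queue.append(v)
                elif v != parent[u]:                      # non-tree edge closes a cycle
                    best = min(best, dist[u] + dist[v] + 1)
    return best


adj, kind = build_tanner_graph(4, 3, 4, reversal_perm(4, 3, 4))
print(kind.count("bit"), kind.count("check"), girth(adj))   # 28 21 4
```

With h = 4, j = 3, k = 4 and plain digit-wise reversal, the sketch reports girth 4: two $T_U$ leaves whose indices differ only in their leftmost digit send their auxiliary nodes to the same pair of $T_L$ leaves, an instance of the type-II 4-cycles noted for plain reversal. Passing a shifted permutation instead of `reversal_perm(...)` lets candidate shifts be screened the same way.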
Fig. 4. Two closed paths in the shift matrix S
As an illustration, we applied the above methods to construct a (6666, 3, 6) regular LDPC code of rate 0.5 and girth g = 10. Its structure is given by the 3333 × 6666 parity-check matrix H shown in Figure 5. The components $T_U$, $T_L$, and the interleaver I can be clearly identified in the constructed matrix, as labelled in Figure 5. In this matrix, there is a single 1 in each row along the solid lines and there are five 1's in each row along the thicker dashed diagonals, so that each row contains six 1's.
PERFORMANCE EVALUATION
We compare by simulation the bit error rate (BER) of TS-LDPC codes with the BER of randomly constructed LDPC codes that are free of 4-cycles [2] over additive white Gaussian noise (AWGN) channels. The codes are decoded with the sum-product algorithm [9], and we adopt the signal-to-noise ratio (SNR) defined in [2]: $\mathrm{SNR} = 10 \log_{10} \frac{E_b}{2r\sigma^2}$, where r denotes the code rate and $\sigma^2$ the noise variance. The plot in Figure 6 shows the BER performance of a TS-LDPC code with column weight j = 3 and girth 10 (solid line). For comparison, we also show the BER performance of a randomly constructed LDPC code (no 4-cycles) with column weight j = 3 (dashed line). Both codes have block length 6666 and code rate 1/2.
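In a BER simulation, the SNR definition fixes the noise level. The minimal sketch below inverts it for σ and generates channel log-likelihood ratios, assuming unit-amplitude BPSK (bit 0 sent as +1, bit 1 as −1) with $E_b$ normalized to 1 as in the formula; the function names and the 2 dB example point are hypothetical.

```python
import math
import random


def noise_sigma(snr_db, rate, eb=1.0):
    """Solve SNR = 10 log10(E_b / (2 r sigma^2)) for the noise standard deviation sigma."""
    return math.sqrt(eb / (2.0 * rate * 10.0 ** (snr_db / 10.0)))


def bpsk_awgn_llrs(codeword, snr_db, rate, rng):
    """Send bit 0 as +1 and bit 1 as -1, add Gaussian noise, return channel LLRs 2y / sigma^2."""
    sigma = noise_sigma(snr_db, rate)
    llrs = []
    for bit in codeword:
        y = (1.0 - 2.0 * bit) + rng.gauss(0.0, sigma)
        llrs.append(2.0 * y / sigma ** 2)        # log p(y | 0) / p(y | 1) for BPSK in AWGN
    return llrs


rng = random.Random(1)                           # fixed seed for a reproducible run
llrs = bpsk_awgn_llrs([0] * 6666, snr_db=2.0, rate=0.5, rng=rng)
print(round(noise_sigma(0.0, 0.5), 6), len(llrs))
```

For rate 1/2 at 0 dB this gives σ = 1, i.e., $\sigma^2 = 1/(2r)$ when the SNR is 0 dB and $E_b = 1$. Transmitting the all-zero codeword is a common shortcut for linear codes with symmetric decoders.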
From Figure 6, the BER performance of the TS-LDPC code is 0.12 dB better than that of the random LDPC code at BER = $10^{-5}$, while at low SNR both codes have similar error-correcting performance. According to the bound of [3], the (6666, 3, 6) TS-LDPC code, which has girth 10, has minimum distance $d_{\min} \ge 10$. Since the lower bound on $d_{\min}$ derived in [3] is not tight, the actual $d_{\min}$ of the (6666, 3, 6) TS-LDPC code may be much larger than 10. In the high-SNR region, $d_{\min}$ is a dominant factor in determining the BER performance of a code, which explains why the TS-LDPC code with girth 10 performs well in this region.
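The bound from [3] depends only on the column weight j and the girth g. The sketch below encodes the tree-based bound in the form in which it is commonly stated; treat the exact form as an assumption rather than a quotation of [3]: for g/2 odd, $d_{\min} \ge 1 + j \sum_{i=0}^{(g-6)/4} (j-1)^i$; for g/2 even, $d_{\min} \ge 1 + j \sum_{i=0}^{(g-8)/4} (j-1)^i + (j-1)^{(g-4)/4}$. The function name is hypothetical.

```python
def tanner_dmin_bound(j, g):
    """Tree-based lower bound on d_min for column weight j and (even) girth g >= 4."""
    if g % 2 or g < 4:
        raise ValueError("girth must be an even number >= 4")
    if (g // 2) % 2 == 1:                                   # g/2 odd
        return 1 + j * sum((j - 1) ** i for i in range((g - 6) // 4 + 1))
    # g/2 even: one extra partial tier of (j - 1)^((g - 4) / 4) nodes
    return 1 + j * sum((j - 1) ** i for i in range((g - 8) // 4 + 1)) + (j - 1) ** ((g - 4) // 4)


for g in (6, 8, 10, 12):
    print(g, tanner_dmin_bound(3, g))                       # 6 4, 8 6, 10 10, 12 14
```

For j = 3 and g = 10 it gives 10, the value quoted above; the familiar special cases $d_{\min} \ge j + 1$ for girth 6 and $d_{\min} \ge 2j$ for girth 8 also follow.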
FAST DECODING
TS-LDPC codes can be decoded efficiently by what we refer to as the turbo-like decoding algorithm (TLDA) [8]. The TLDA proceeds as follows:
Step 1: Initialization.
Step 2: Decode $T_U$ using the updated information provided by I and, after decoding, pass the new probabilistic information to $T_L$ through I.
Step 3: Decode $T_L$ using the updated information from I, and then pass the new probabilistic information back to $T_U$ through I.
Step 4: Compute the temporary decoding outputs. If decoding succeeds, go to Step 5; otherwise, go back to Step 2.
Step 5: End.
The message updates of the TLDA are the same as those of the sum-product decoding algorithm [1]. We comment briefly on the advantages of the TLDA. It is well known [9] that on a cycle-free Tanner graph the sum-product algorithm terminates in a finite number of steps and yields the minimum symbol error probability. Therefore, in isolation, the local decoding of each cycle-free component is optimal. The TLDA is still iterative: each component passes its a posteriori probability (APP) information to the other through the interleaver, and each component in turn uses these APPs as a priori information to start its own decoding process.
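The sum-product updates and the alternation between components can be illustrated with a small LLR-domain decoder whose check nodes are updated group by group. Passing `check_groups=[checks of T_U, checks of T_L]` loosely mirrors Steps 2 and 3, and stopping on a zero syndrome is one way to implement the success test of Step 4. This is a simplification, not the TLDA of [8]: each phase performs a single parallel sweep over its group of checks instead of decoding the corresponding tree to completion. The function name, its parameters, and the toy (7, 4) Hamming parity-check matrix in the example are hypothetical choices for illustration.

```python
import math


def sum_product_decode(H, llr, check_groups=None, max_iter=50):
    """LLR-domain sum-product decoding (positive LLR favours bit 0).

    check_groups: lists of row indices updated one group after another; checks within a group
    are updated in parallel. None means one group of all checks (the flooding schedule).
    Returns (hard decisions, iterations used, success flag).
    """
    m, n = len(H), len(H[0])
    rows = [[v for v in range(n) if H[c][v]] for c in range(m)]
    cols = [[c for c in range(m) if H[c][v]] for v in range(n)]
    groups = check_groups or [list(range(m))]
    c2v = {(c, v): 0.0 for c in range(m) for v in rows[c]}
    hard = [0 if x >= 0 else 1 for x in llr]
    for it in range(1, max_iter + 1):
        for group in groups:
            new = {}
            for c in group:
                # extrinsic bit-to-check messages from the latest check-to-bit messages
                t = {v: math.tanh((llr[v] + sum(c2v[(d, v)] for d in cols[v] if d != c)) / 2.0)
                     for v in rows[c]}
                for v in rows[c]:
                    prod = 1.0
                    for w in rows[c]:
                        if w != v:
                            prod *= t[w]
                    prod = max(min(prod, 1.0 - 1e-12), -1.0 + 1e-12)   # keep atanh finite
                    new[(c, v)] = 2.0 * math.atanh(prod)
            c2v.update(new)                       # commit the whole group at once
        post = [llr[v] + sum(c2v[(c, v)] for c in cols[v]) for v in range(n)]
        hard = [0 if x >= 0 else 1 for x in post]
        if all(sum(hard[v] for v in rows[c]) % 2 == 0 for c in range(m)):   # zero syndrome
            return hard, it, True
    return hard, max_iter, False


H = [[1, 1, 0, 1, 1, 0, 0],
     [1, 0, 1, 1, 0, 1, 0],
     [0, 1, 1, 1, 0, 0, 1]]
llr = [2.0] * 6 + [-0.5]                          # all-zero codeword, last bit received unreliably
print(sum_product_decode(H, llr))                 # ([0, 0, 0, 0, 0, 0, 0], 1, True)
print(sum_product_decode(H, llr, check_groups=[[0, 1], [2]]))
```

In the example the channel hard decision flips the last bit, and one iteration of either schedule corrects it. In the group schedule, the second group already sees the messages produced by the first group in the same iteration, which is the mechanism by which serial schedules tend to need fewer iterations.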
We present a simulation study of 20000 Monte Carlo runs comparing the performance of TLDA and standard sum-product decoding on a (6666, 3, 6) TS-LDPC code with uniform column weight j = 3 and code rate 0.5. The TLDA converges about 50% faster (it needs fewer iterations) than the sum-product decoding algorithm.
CONCLUSION
In this paper, we proposed a new class of well-structured LDPC codes, the TS-LDPC codes. We presented designs with flexible code rate, arbitrary column weight, and arbitrary girth. TS-LDPC codes can be decoded efficiently. These characteristics make them attractive for applications such as communication systems and data storage systems.
