Abstract: The paper investigates the application of low-density parity-check (LDPC) codes to digital subscriber-line (DSL) transmission systems that employ discrete multitone modulation. A family of linear-time encodable binary LDPC codes that are well suited for DSL transmission is introduced. Encoding and symbol mapping for multilevel modulation are described. Simulation results show that good net coding gains can be achieved even under tight latency constraints. Implementation complexity is analyzed and compared with that of trellis-coded modulation as employed in current asymmetric DSL transceivers. The incorporation of powerful LDPC coding techniques into next-generation DSL modems appears to be possible with a reasonable increase in transceiver complexity.
I. INTRODUCTION
Low-density parity-check (LDPC) codes [1, 2] have mainly been considered for data transmission systems employing binary modulation. In many communication systems, however, multilevel modulation with more than two levels is employed to maximize the rate of information transfer under strict constraints on transmit signal bandwidth. An example is multicarrier digital-subscriber-line (DSL) transmission [3], where symbol constellations of possibly different sizes are used for quadrature amplitude modulation (QAM) on each subcarrier. The study of LDPC coding schemes that are suitable for bandwidth-efficient modulation is therefore a topic of considerable practical interest.
In this paper, we describe an LDPC-coded multilevel modulation technique and investigate its application to DSL transmission. Binary LDPC codes are employed together with multilevel-symbol mapping based on set partitioning and so-called "double Gray-code labeling." Our approach differs from that in [4], where multilevel coding with binary LDPC component codes is proposed. We introduce a family of binary LDPC codes that offer good performance, are encodable in linear time, and do not suffer from error floors at significantly low bit-error rates. These LDPC codes can be constructed efficiently for any code rate and block size of interest for DSLs.
In current asymmetric DSL (ADSL) specifications [5], coding is achieved by a concatenated scheme that includes an outer Reed-Solomon (RS) code and an inner trellis code. Depending on the choice of code parameters and interleaving depth, this scheme can provide a net coding gain of up to approximately 5.5 dB with respect to uncoded modulation. LDPC coding, as described in this paper, is intended as a replacement for the inner trellis code, with the objective of operating the ADSL link closer to its capacity limits than is currently possible. Our approach is applicable to both ADSL [6] and very-high-speed DSL (VDSL) systems.
In DSL transmission, overall delay, or latency, is a critical issue. "Voice" applications are known to demand rather low latency, whereas other applications, such as video streaming, tolerate larger delays but need stronger error-correction capability. Thus, in studying new coding techniques for DSLs, trade-offs between coding gain and latency have to be well characterized. Another important issue is transceiver complexity. It is a critical parameter especially at the central-office access multiplexers or at remote terminals because it directly affects equipment cost and power consumption. LDPC coding is attractive for DSL transmission because it permits a wide range of trade-offs between latency, complexity, and system performance.
II. MULTICARRIER ADSL TRANSMISSION
The block diagram of Fig. 1 shows the components of a discrete-multitone (DMT)-based ADSL system that are relevant for the discussion in this paper.
Information bits, representing user data and control messages, are encoded by an outer RS code with code symbols from $\mathrm{GF}(2^8)$, convolutionally interleaved, and further encoded by an inner-coding stage. In the current ADSL standard, the inner code is a four-dimensional 16-state trellis code. Here we investigate replacing the inner trellis code by an LDPC coding scheme.¹ In either case, the encoded data are mapped into frequency-domain modulation symbols and then transformed by an inverse discrete Fourier transform (IDFT) operation to yield a frame of time-domain signals. These signals are converted from parallel to serial (P/S) form and sent over the communication channel. At the receiver, the inverse of the transmit operations takes place to recover the information bits. The "soft demapper" block shown in Fig. 1, which is not needed in the case of trellis decoding by the Viterbi algorithm, computes soft information on LDPC code bits for subsequent soft iterative decoding.
The telephone twisted-pair channel introduces frequency-dependent signal distortion as well as several other forms of disturbance, of which crosstalk is the most important. If each DMT subchannel has sufficiently narrow bandwidth, then each one independently approximates an additive white Gaussian noise (AWGN) channel with a particular signal-to-noise ratio (SNR) [3]. Impulse noise represents a further source of disturbance that an ADSL system must be able to cope with. Finally, we note that narrowband interference of various origins, e.g., AM radio signals, also affects the reliability of communications in ADSL.
¹ Outer RS coding is included in the above description because this function is mandatory in current ADSL specifications. We focus on the inner coding scheme.
III. LDPC PARITY-CHECK MATRIX CONSTRUCTION
For ADSL transmission, LDPC codes with high code rates are desirable. Besides achieving high spectral efficiency in a bandwidth-constrained transmission situation, such codes involve fewer parity checks than low-rate codes do, resulting in more tractable decoder implementations at the envisaged multi-megabit-per-second data rates. It is also desirable that generating the parity-check matrix require only a few preprocessing operations, so that "on-the-fly" construction of LDPC codes becomes practical. Furthermore, linear-time encodable LDPC codes are attractive because they keep implementation complexity low at the transmitter as well. Finally, the ability to specify the LDPC codes via a small number of parameters is critical because it minimizes the overhead during initialization, when the receiver must indicate to the transmitter which LDPC code to use for encoding. Codes that can be described by a small number of parameters are also well suited for standardization purposes.
The deterministic parity-check matrix construction presented in this section meets the above objectives. The construction is based on "array codes," which are two-dimensional codes that have been proposed for detecting and correcting burst errors [7]. When array codes are viewed as binary codes, their parity-check matrices exhibit sparseness, which can be exploited for decoding them as LDPC codes using the sum-product algorithm [8]. Array codes therefore provide the framework for defining a family of LDPC codes that lend themselves to deterministic constructions.
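The array-code parity-check matrix is specified by three parameters: a prime number $p$ and two integers $k$ and $j$ such that $k, j \le p$. It has dimensions $jp \times kp$ and is given by [7]
$$H_A = \begin{bmatrix} I & I & I & \cdots & I \\ I & \alpha & \alpha^{2} & \cdots & \alpha^{k-1} \\ I & \alpha^{2} & \alpha^{4} & \cdots & \alpha^{2(k-1)} \\ \vdots & \vdots & \vdots & & \vdots \\ I & \alpha^{j-1} & \alpha^{2(j-1)} & \cdots & \alpha^{(j-1)(k-1)} \end{bmatrix}, \tag{1}$$
where $I$ is the $p \times p$ identity matrix and $\alpha$ is a $p \times p$ permutation matrix representing a single cyclic shift, e.g.,
$$\alpha = \begin{bmatrix} 0 & 0 & \cdots & 0 & 1 \\ 1 & 0 & \cdots & 0 & 0 \\ 0 & 1 & \cdots & 0 & 0 \\ \vdots & & \ddots & & \vdots \\ 0 & 0 & \cdots & 1 & 0 \end{bmatrix}. \tag{2}$$
The parameters $j$ and $k$ provide the column and row weight of $H_A$, respectively. By construction, the matrix $H_A$ is 4-cycle free because no two rows have overlapping "1"s in more than one position.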
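The structural claims made for the array-code matrix can be checked numerically on a small instance. The following is a minimal sketch, assuming the usual array-code layout in which block $(i, m)$ is the $(i \cdot m)$-th power of a single cyclic-shift permutation matrix; the function names and the toy parameters $p = 5$, $j = 3$, $k = 5$ are illustrative choices, and the direction of the cyclic shift is arbitrary.

```python
from itertools import combinations


def shift_power(p, e):
    """Return the e-th power of a p x p single-cyclic-shift permutation matrix."""
    e %= p
    return [[1 if c == (r + e) % p else 0 for c in range(p)] for r in range(p)]


def array_code_matrix(p, j, k):
    """Assemble the jp x kp binary matrix whose block (i, m) is shift**(i*m)."""
    rows = []
    for i in range(j):
        blocks = [shift_power(p, i * m) for m in range(k)]
        for r in range(p):
            rows.append([bit for blk in blocks for bit in blk[r]])
    return rows


def max_row_overlap(H):
    """Largest number of columns in which two distinct rows both hold a 1."""
    supports = [{c for c, bit in enumerate(row) if bit} for row in H]
    return max(len(a & b) for a, b in combinations(supports, 2))


H_A = array_code_matrix(p=5, j=3, k=5)
col_weights = {sum(col) for col in zip(*H_A)}
row_weights = {sum(row) for row in H_A}

print(len(H_A), len(H_A[0]))      # 15 25
print(col_weights, row_weights)   # {3} {5}
print(max_row_overlap(H_A))       # 1, i.e. no 4-cycles
```

A maximum row overlap of 1 is exactly the 4-cycle-free condition: a 4-cycle would require two rows sharing "1"s in two columns.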
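To achieve efficient encoding, a parity-check matrix in triangular form is desirable; see, e.g., [9]. Although Gaussian elimination could be used to this end, the resulting increase in processing complexity makes this approach unattractive. Instead, we define a new matrix $H_S$ by cyclically shifting the rows of the matrix $H_A$ in a blockwise manner. The amount of cyclic shift for each block row is such that the $jp \times jp$ leftmost subblock of $H_S$ contains the identity matrix $I$ along its diagonal. With $\alpha$ denoting the $p \times p$ cyclic-shift permutation matrix that generates the blocks of $H_A$, this gives
$$H_S = \begin{bmatrix} I & I & I & \cdots & I & I & \cdots & I \\ \alpha^{k-1} & I & \alpha & \cdots & \alpha^{j-2} & \alpha^{j-1} & \cdots & \alpha^{k-2} \\ \alpha^{2(k-2)} & \alpha^{2(k-1)} & I & \cdots & \alpha^{2(j-3)} & \alpha^{2(j-2)} & \cdots & \alpha^{2(k-3)} \\ \vdots & \vdots & \vdots & & \vdots & \vdots & & \vdots \\ \alpha^{(j-1)(k-j+1)} & \alpha^{(j-1)(k-j+2)} & \alpha^{(j-1)(k-j+3)} & \cdots & I & \alpha^{j-1} & \cdots & \alpha^{(j-1)(k-j)} \end{bmatrix}. \tag{3}$$
The matrix $H_S$ is 4-cycle free and has the same column and row weights as $H_A$.
To obtain the parity-check matrix in the desired form, the lower-triangular elements of the $jp \times jp$ leftmost subblock of $H_S$ are set to zero, yielding
$$H = \begin{bmatrix} I & I & I & \cdots & I & I & \cdots & I \\ O & I & \alpha & \cdots & \alpha^{j-2} & \alpha^{j-1} & \cdots & \alpha^{k-2} \\ O & O & I & \cdots & \alpha^{2(j-3)} & \alpha^{2(j-2)} & \cdots & \alpha^{2(k-3)} \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots & & \vdots \\ O & O & O & \cdots & I & \alpha^{j-1} & \cdots & \alpha^{(j-1)(k-j)} \end{bmatrix}, \tag{4}$$
where $O$ is the $p \times p$ null matrix.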
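Why a triangular form gives linear-time encoding can be illustrated with back-substitution. The sketch below is an illustration under stated assumptions, not the encoder of the paper: it assumes that block $(i, m)$ of the triangular matrix is the $i(m-i)$-th power of the cyclic shift for $m \ge i$ and null otherwise, that the first $jp$ codeword positions carry parity and the remaining $(k-j)p$ positions carry information bits, and it uses hypothetical function names.

```python
import random


def triangular_rows(p, j, k):
    """Sparse rows (lists of column indices) of the assumed triangular matrix.

    Block (i, m) is shift**(i*(m-i)) for m >= i and the null matrix for m < i.
    """
    rows = []
    for i in range(j):
        for r in range(p):
            rows.append([m * p + (r + i * (m - i)) % p for m in range(i, k)])
    return rows


def encode(info_bits, p, j, k):
    """Systematic encoding by back-substitution, parity bits placed first."""
    rows = triangular_rows(p, j, k)
    n_parity = j * p
    word = [0] * n_parity + list(info_bits)
    for r in range(n_parity - 1, -1, -1):
        # rows[r][0] is the diagonal position r; every other entry lies to its
        # right and is already known when rows are processed bottom-up.
        word[r] = sum(word[c] for c in rows[r][1:]) % 2
    return word, rows


random.seed(1)
p, j, k = 7, 3, 7
info = [random.randint(0, 1) for _ in range((k - j) * p)]
codeword, checks = encode(info, p, j, k)

# Every parity check must sum to zero modulo 2.
print(all(sum(codeword[c] for c in row) % 2 == 0 for row in checks))  # True
```

Each parity bit costs at most $k - 1$ XOR operations, so the total work grows linearly with the code length, with no matrix inversion or dense generator matrix required.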
By successive row and column permutations, $H_A$ and $H_S$ can be brought to a form similar to that defined in [10]. Therefore, using counting arguments similar to those in [10], it can be shown that $H_A$ and $H_S$ both lead to a minimum Hamming distance of $d_{\min} = 6$ for $j = 3$ and
for $j = 4$ (these are the values of $j$ of most practical interest). Furthermore, for $j = 3$, it can be shown that forcing $H_S$ to the triangular form $H$ does not decrease the minimum distance of the code. This property is conjectured to hold for $j = 4$ as well, but could be verified only by exhaustive search for the codes employed in the simulations.
and even. The block diagram of the multilevel encoding and symbol mapping functions is shown in Fig. 2.
and even, the two binary b/2-tuples
b/2 , representing the real and imaginary parts, respectively, of the complex QAM symbol to be transmitted. The L-ary symbols belong to the set
Each $2^b$-QAM symbol conveys $b_{cv}$ and $b_{cw}$ LDPC code bits on its real and imaginary parts, respectively; the remaining bits are uncoded. It is generally sufficient to allow up to six code bits per QAM symbol for the best trade-off among spectral efficiency, performance, and implementation complexity.
Symbol mapping relies on the partition of the set A into
We note that Gray-code labeling is optimum in an information-theoretic sense, as it leads to the largest capacity for bit-interleaved coded modulation [11, 12]. Intuitively, from the observation of a noisy symbol, the most reliable soft information on each underlying bit can be generated if Gray-code labeling is used, because the variation of a symbol value between two adjacent levels then corresponds to flipping a single bit only. Gray-code labeling is thus adopted for the LSBs, on which the soft demapper needs to generate reliability information. Furthermore, as the uncoded MSBs are obtained via simple thresholding at the receiver, labeling those bits with a (separate) Gray code within each subset lowers the bit-error rate on the MSBs.
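The single-bit-flip property that motivates the labeling can be checked directly. The following is a minimal sketch of one way to realize two separate Gray codes on the levels of one dimension. It assumes, hypothetically, that the subsets are formed by taking the level index modulo $2^{b_c}$ (with $b_c$ the number of coded bits per dimension) and that the reflected binary Gray code is used for both labels; the exact partition used in the paper is not reproduced here.

```python
def gray(n):
    """Reflected binary Gray code of the non-negative integer n."""
    return n ^ (n >> 1)


def double_gray_label(level_index, coded_bits):
    """Return (uncoded MSB label, coded LSB label) for one level index.

    Assumption: the subset index is level_index mod 2**coded_bits, so adjacent
    levels always fall into different subsets.
    """
    m = 1 << coded_bits
    return gray(level_index // m), gray(level_index % m)


def hamming(a, b):
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


L, coded_bits = 16, 2
m = 1 << coded_bits
labels = [double_gray_label(l, coded_bits) for l in range(L)]

# Adjacent levels differ in exactly one coded (LSB) bit.
lsb_ok = all(hamming(labels[l][1], labels[l + 1][1]) == 1 for l in range(L - 1))
# Neighboring points within the same subset differ in exactly one MSB bit.
msb_ok = all(hamming(labels[l][0], labels[l + m][0]) == 1 for l in range(L - m))
print(lsb_ok, msb_ok)  # True True
```

The LSB property holds across subset boundaries as well because the reflected Gray code is cyclic: its last and first codewords also differ in a single bit.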
Let now y denote the real part of a noisy received signal:
and n an AWGN sample with variance can be employed. For a practical implementation, it is not necessary to include all the terms in the summations in Eq. (7). Given a received signal $y$, it is usually sufficient to determine the two closest nominal symbols $A_l$ and to include only those in the summation terms. If the received signal does not fall within the constellation boundaries, then the APP is set to 1 or to 0, depending on the symbol found at the constellation edge. In this way, the computational effort for soft demapping is not only reduced but also made essentially independent of the constellation size $L$.
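The two-nearest-symbol simplification can be sketched as follows. Eq. (7) is not reproduced here, so the sketch assumes the textbook form of the a posteriori probability for equiprobable symbols in Gaussian noise, restricted to the nearest levels. The level values $2l - (L-1)$, the Gray labeling of the coded LSBs by level index modulo $2^{b_c}$, and all names are assumptions made for illustration.

```python
import math


def gray(n):
    """Reflected binary Gray code of the non-negative integer n."""
    return n ^ (n >> 1)


def app_bit_zero(y, L, coded_bits, sigma2):
    """Approximate P(bit = 0 | y) for each coded LSB of one L-ary dimension."""
    m = 1 << coded_bits
    levels = [2 * l - (L - 1) for l in range(L)]
    if y <= levels[0]:
        nearest = [0]                       # beyond the lower constellation edge
    elif y >= levels[-1]:
        nearest = [L - 1]                   # beyond the upper constellation edge
    else:
        below = int((y + (L - 1)) // 2)     # index of the level at or just below y
        nearest = [below, below + 1]
    exponents = [-(y - levels[l]) ** 2 / (2.0 * sigma2) for l in nearest]
    top = max(exponents)                    # subtracted to avoid underflow
    weights = [math.exp(e - top) for e in exponents]
    total = sum(weights)
    apps = []
    for i in range(coded_bits):
        zero = sum(w for l, w in zip(nearest, weights)
                   if not (gray(l % m) >> i) & 1)
        apps.append(zero / total)
    return apps


print(app_bit_zero(0.0, L=4, coded_bits=1, sigma2=0.5))   # [0.5]
print(app_bit_zero(5.0, L=4, coded_bits=1, sigma2=0.5))   # [0.0]
print(app_bit_zero(-5.0, L=4, coded_bits=1, sigma2=0.5))  # [1.0]
```

The work per received sample is constant: at most two exponentials are evaluated regardless of $L$, which is the independence from constellation size noted above. A bit on which both nearest levels agree receives a hard value of 0 or 1 under this truncation.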
The (approximate) channel APPs generated in this manner are finally used in the sum-product algorithm (SPA) [1, 2] for soft iterative decoding. Note that various simplifications of the SPA have been proposed in the literature. For example, the simplified algorithm presented in [13] operates entirely in the log-likelihood-ratio domain and offers a substantial reduction in complexity with essentially the same performance as the full SPA.
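To make the log-likelihood-ratio formulation concrete, the sketch below shows the check-node update in two standard textbook forms: the exact "tanh rule" and the min-sum approximation. These are generic illustrations and are not claimed to be the simplified algorithm of [13]; the function names are hypothetical.

```python
import math


def check_node_tanh(llrs):
    """Exact check-node update: outgoing LLR on each edge from all the others."""
    out = []
    for i in range(len(llrs)):
        prod = 1.0
        for idx, value in enumerate(llrs):
            if idx != i:
                prod *= math.tanh(value / 2.0)
        # Clamp so that atanh stays finite for very reliable inputs.
        prod = max(min(prod, 1.0 - 1e-12), -1.0 + 1e-12)
        out.append(2.0 * math.atanh(prod))
    return out


def check_node_min_sum(llrs):
    """Min-sum approximation: product of signs times the smallest magnitude."""
    out = []
    for i in range(len(llrs)):
        others = llrs[:i] + llrs[i + 1:]
        sign = -1.0 if sum(v < 0 for v in others) % 2 else 1.0
        out.append(sign * min(abs(v) for v in others))
    return out


incoming = [2.0, -1.0, 3.0]
exact = check_node_tanh(incoming)
approx = check_node_min_sum(incoming)
print(approx)  # [-1.0, 2.0, -1.0]
```

Both updates agree in sign, and the min-sum magnitude is never smaller than the exact one, which is why practical min-sum decoders often apply a scaling or offset correction.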
V. IMPLEMENTATION COMPLEXITY
In this section, we compare, for a particular example, the implementation complexity of the proposed LDPC coding scheme with that of trellis-coded modulation (TCM) as specified in [5] .
A. Encoding Complexity
Let us consider a DMT system with 200 tones (subchannels) and 16-QAM on each tone. For TCM, each trellis-encoding step requires 7 XOR operations, and each step of the four-dimensional trellis code covers a pair of tones, so that a total complexity of 100 × 7 = 700 XOR operations per DMT symbol is obtained. For LDPC coding, we need a code of length 200 × 4 = 800 bits in this case. An appropriate choice is the code with $j = 3$, $k = 25$ and rate $r = K/N \approx 0.8863$, resulting in a complexity of 2127 XOR operations per DMT symbol. Therefore, the complexity of LDPC encoding is about three times that of TCM encoding.
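The encoding comparison reduces to a few lines of arithmetic, reproduced here as a small sketch. The figures of 7 XOR operations per trellis step and 2127 XOR operations per LDPC codeword are taken from the text; the assumption that one four-dimensional trellis step spans two tones, and the variable names, are illustrative.

```python
tones = 200
bits_per_tone = 4                 # 16-QAM on every tone
tones_per_trellis_step = 2        # assumption: one 4-D trellis step per tone pair

tcm_steps = tones // tones_per_trellis_step
tcm_xors = tcm_steps * 7          # 7 XORs per trellis-encoding step (from the text)

ldpc_length = tones * bits_per_tone
ldpc_xors = 2127                  # quoted in the text for j = 3, k = 25

ratio = ldpc_xors / tcm_xors
print(tcm_xors, ldpc_length, round(ratio, 2))  # 700 800 3.04
```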
B. Decoding Complexity
Consider again the above example. The computational complexity of the trellis decoding approximately amounts to 119 additions and 4 multiplications per trellis step, not accounting for the complexity of subset decoding and the updating of survivor sequences (backtracing).
Using the algorithm in [13], the complexity for LDPC decoding amounts to $3(k-2) + 2j$ additions per iteration, assuming a block-parallel implementation. Furthermore, for soft demapping, 8 multiplications and 6 additions are required per QAM symbol. The complexity of decoding for uncoded bits, which we did not account for, can be assumed to be similar to that of subset decoding in TCM. Using the same LDPC code parameters as above, and assuming 20 iterations for soft decoding and a DMT-symbol rate of 4000 Hz, the results of Table II are obtained.
Note that if an LDPC code spanning more than one DMT symbol is used, complexity due to sum-product decoding will grow (which, fortunately, represents the less intensive part of the decoding process) whereas soft-demapping complexity will remain fixed because soft demapping is performed on a DMT-symbol basis.
C. Memory Requirements
For TCM, a memory size of about 20 × 2 × 16 = 640 words, where the factor 20 accounts for 5 constraint lengths, is needed to store the survivor sequences for 16-state Viterbi decoding.
For the above LDPC code example, the parity-check matrix has 800 × 3 = 2400 nonzero entries, the locations of which have to be stored for encoding purposes. Assume, for decoding with a fully parallel and pipelined structure, that each memory block is implemented as two buffers alternating between read and write. Then, the required memory for sum-product decoding is 4 × 2400 = 9600 words. Clearly, longer codes will lead to more stringent memory requirements.
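The memory figures above follow from simple products, and writing them as functions shows how the LDPC requirement scales with code length. This is a bookkeeping sketch only: the factors 20, 2, 16, and 4 are taken from the text, while the function names are hypothetical.

```python
def tcm_survivor_words(depth=20, states=16):
    """Survivor-sequence storage for Viterbi decoding, as counted in the text."""
    return depth * 2 * states


def ldpc_decoder_words(code_length, column_weight):
    """Sum-product decoder storage: 4 words per nonzero parity-check entry."""
    nonzeros = code_length * column_weight
    return 4 * nonzeros


tcm_words = tcm_survivor_words()
ldpc_words = ldpc_decoder_words(code_length=800, column_weight=3)
print(tcm_words, ldpc_words, ldpc_words / tcm_words)  # 640 9600 15.0

# Storage grows linearly with code length for a fixed column weight.
print(ldpc_decoder_words(code_length=1600, column_weight=3))  # 19200
```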
VI. SIMULATION RESULTS

A. Performance in AWGN
In the simulations, the full SPA is employed with the number of iterations limited to 20. We represent bit- and block-error rates as a function of $E_b/N_0$, the ratio of energy per bit to noise power spectral density, and symbol-error rates as a function of the normalized SNR. Recall that for a modulation and coding scheme transmitting $\eta$ bit/symbol, the normalized SNR is defined as [14]
$$\mathrm{SNR}_{\mathrm{norm}} = \frac{\mathrm{SNR}}{2^{\eta} - 1}.$$
For uncoded QAM, $\mathrm{SNR}_{\mathrm{norm}} \approx 9.8$ dB at a symbol-error rate of $10^{-7}$.
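A short numerical sketch of the normalized SNR may help in reading the symbol-error-rate curves. It assumes the usual definition, in which the SNR is divided by $2^{\eta} - 1$ for a scheme carrying $\eta$ bits per QAM symbol, and it treats the roughly 9.8 dB uncoded reference as the baseline from which a gain is measured; the function names and the sample operating point are illustrative, not results from the paper.

```python
import math

UNCODED_QAM_REFERENCE_DB = 9.8   # uncoded QAM at a symbol-error rate of 1e-7


def normalized_snr_db(snr_db, eta):
    """Normalized SNR in dB for a scheme carrying eta bits per QAM symbol."""
    snr = 10.0 ** (snr_db / 10.0)
    return 10.0 * math.log10(snr / (2.0 ** eta - 1.0))


def gain_over_uncoded_db(required_snr_db, eta):
    """Gap between the uncoded reference and the coded scheme's normalized SNR."""
    return UNCODED_QAM_REFERENCE_DB - normalized_snr_db(required_snr_db, eta)


# Hypothetical operating point: a scheme with eta = 3.5 bit/symbol that needs
# an SNR of 15 dB to reach the target symbol-error rate.
print(round(normalized_snr_db(15.0, 3.5), 2))
print(round(gain_over_uncoded_db(15.0, 3.5), 2))
```

Because the normalization removes the dependence on spectral efficiency, schemes with different constellation sizes and code rates can be compared on a single axis.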
Figs. 3 to 5 show the bit-error rate (BER) and block-error rate (BLER) performance of three LDPC codes for binary transmission over the AWGN channel. The codes have lengths N = 529, 2209, and 4489, and assume j = 3, j = 4, and j = 4, respectively. Uncoded performance and capacity are also plotted in these figures.
The performance achieved is as good as or better than that of randomly constructed LDPC codes [2] of comparable lengths and rates. Note also the absence of error floors at error rates of $10^{-7}$, which are of interest for ADSL. It is therefore expected that good performance will also be achieved for multilevel modulation. Figs. 6 to 8 show the symbol-error rate performance for 16-, 256-, and 4096-QAM over the AWGN channel using the three LDPC codes.
B. Performance in AWGN with Latency Constraints
To determine the net coding gain as a function of coding latency, we consider a DMT system with a total of 100 or 200 tones. DMT symbols are assumed to be sent at the nominal rate of 4000 Hz, as is the case for ADSL.
The results summarized in Table III show the net coding gains in dB at a symbol-error rate of $10^{-7}$ for different values of coding latency (no outer RS code). We did not run simulations for codes longer than 7200 bits; hence some entries in the table are not provided. The code rates were chosen in the range of 0.82 to 0.95. It can be seen that good coding gains can be achieved even for very tight latency constraints.
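The link between code length and latency can be sketched with a simple model. This is an assumption made for illustration, not the latency definition behind Table III: it counts only the time needed to collect one full codeword, taking every tone to carry the same number of code bits and each DMT symbol to last 1/4000 s. All names are hypothetical.

```python
import math

DMT_SYMBOL_RATE_HZ = 4000


def codeword_span_ms(code_length_bits, tones, code_bits_per_tone):
    """Time in ms spanned by the DMT symbols that carry one codeword."""
    bits_per_dmt_symbol = tones * code_bits_per_tone
    dmt_symbols = math.ceil(code_length_bits / bits_per_dmt_symbol)
    return 1000.0 * dmt_symbols / DMT_SYMBOL_RATE_HZ


# A 7200-bit code with 4 code bits per tone:
print(codeword_span_ms(7200, tones=200, code_bits_per_tone=4))  # 2.25
print(codeword_span_ms(7200, tones=100, code_bits_per_tone=4))  # 4.5
```

Under this model, halving the number of tones doubles the span of a given code, which is one reason a fixed latency budget limits the usable code length more severely on links with fewer tones.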
VII. CONCLUSIONS
LDPC codes are finding their way into a number of applications, e.g., for wireless communications and storage channels. They also offer unique advantages for DSL transmission.
The simulation results presented here show that, even under tight latency constraints, good net coding gains can be achieved by LDPC coding. Furthermore, LDPC codes do not exhibit "error floors" at the low bit-error rates of interest for DSL transmission. Another advantage is their low implementation complexity as compared, for example, to turbo codes. In fact, many implementation trade-offs are possible owing to the inherent parallelism in the sum-product algorithm, opening the way for very-low-power VLSI realizations.
Clearly, further study is needed to fully characterize the benefits of LDPC coding for DSLs, including VDSL, and to assess performance with actual loop and noise characteristics. However, the incorporation of powerful LDPC coding techniques into next-generation DSL modems appears to be attractive in terms of performance gains and also possible at only a reasonable increase in transceiver complexity.
