A heuristic optimal discrete bit allocation algorithm is proposed for solving the margin maximization problem in discrete multitone (DMT) systems. Starting from an initial equal power assignment bit distribution, the proposed algorithm employs a multistaged bit rate allocation scheme to meet the target rate. If the total bit rate is far from the target rate, a multiple-bits loading procedure is used to obtain a bit allocation close to the target rate. When the rate is close to the target, a parallel bit-loading procedure, which is computationally more efficient than the conventional greedy bit-loading algorithm, is used to achieve the target rate. Finally, the target-rate bit distribution is checked: if it is efficient, it is also the optimal solution; otherwise, the optimal bit distribution can be obtained with only a few bit swaps. Simulation results using the standard asymmetric digital subscriber line (ADSL) test loops show that the proposed algorithm is efficient for practical DMT transmissions.
INTRODUCTION
Discrete multitone (DMT) is a modulation technique that has been widely used in various digital subscriber lines (xDSL), such as asymmetric digital subscriber line (ADSL) and very-high-speed digital subscriber line (VDSL), permitting reliable high-rate data transmission over hostile frequency-selective channels [1, 2]. Recently, it has been proposed for broadband downstream power-line communications because of its high flexibility in resource management [3]. A crucial aspect in the design of a DMT system is to allocate bits and power to the subchannels optimally under various constraints. One problem of practical interest is margin maximization or transmission power minimization, also known as the margin-adaptive (MA) problem [4].
Many optimal or suboptimal discrete bit-loading algorithms have been proposed for solving this problem. Among the algorithms that consider the constraint of a target bit rate, the computational complexity of the Hughes-Hartogs algorithm [5] and Chow's algorithm [6] is relatively high. There are also many computationally efficient algorithms, including the algorithms proposed by Piazzo [7, 8], the algorithm of Krongold et al. [9], and the Levin-Campello (LC) algorithms [4, 10, 11]. Later work takes into account additional constraints, including the transmission power spectral density (PSD) mask and the maximum allowable size of the QAM constellations [12, 13]. A common feature of these algorithms is that they all use greedy bit-loading, either during the whole allocation process or after an initial allocation. To achieve the target rate, greedy bit-filling adds one bit at a time to the subchannel that requires the smallest additional power, while greedy bit-removal removes one bit at a time from the subchannel whose last bit requires the largest power. If the initial bit rate is far from the target rate, the computation load of these algorithms is heavy. In [14], a multiple-bits loading procedure is introduced that converges faster to the optimal solution. The algorithm first calculates two bit allocations, namely a loop-representative bit allocation and a maximum bit rate allocation, to obtain the initial bit distribution, and then performs multiple-bits loading to achieve the target rate. However, the extra cost paid in calculating the loop-representative bit allocation is not always helpful: when the target rate is high enough, the performance of the algorithm degrades compared to the greedy bit-removal algorithm [14].
In this paper, a heuristic optimal discrete bit allocation algorithm is proposed. The new algorithm starts from an initial equal power assignment bit distribution determined by the system PSD mask, and then employs a multistaged bit rate allocation scheme to meet the target rate. Specifically, if the total bit rate is far from the target rate, a multiple-bits loading procedure is used to obtain a bit allocation close to the target rate. When close to the target rate, a parallel bit-loading procedure is used to achieve the target rate. This parallel bit-loading step is computationally more efficient than the conventional greedy bit-loading algorithm. The resulting bit distribution is not guaranteed to be optimal, so a clean-up operation using the LC efficientizing (EF) algorithm [4] is performed to obtain the optimal solution. The algorithm achieves exactly the same optimal solutions as the algorithm in [14], but with a much lower average computation load, which can be attributed to the speedup from the parallel bit-loading step.
EURASIP Journal on Advances in Signal Processing
The new bit-loading algorithm is explained in detail in Section 2. Simulation results and analysis are given in Section 3. Finally, conclusions are drawn in Section 4.
THE NEW BIT-LOADING ALGORITHM
Assume a DMT system consisting of $M$ subcarriers. The transmission power and bit rate (in bits/symbol) of subchannel $n$ ($n = 1, 2, \ldots, M$) are $P_n$ and $b_n$, respectively. Assume that each subchannel $n$ has the pulse-response gain $H_n$ and that its noise, consisting of crosstalk and thermal noise, is modeled as additive white Gaussian noise (AWGN) with power $\sigma_n^2$. Then $P_n$ is related to $b_n$ by
$$P_n = \frac{\Gamma\,\left(2^{b_n} - 1\right)}{\mathrm{CNR}_n}, \tag{1}$$
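where $\mathrm{CNR}_n = |H_n|^2/\sigma_n^2$ is the subchannel gain-to-noise ratio (CNR) of subchannel $n$, and $\Gamma$ is the signal-to-noise ratio (SNR) gap [4] (used in linear scale in (1)), which, in dB, is given by
$$\Gamma = 10\log_{10}\!\left(\frac{\left[Q^{-1}(P_e/4)\right]^2}{3}\right) + \gamma_m - \gamma_c, \tag{2}$$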
where $P_e$ is the given target probability of symbol error (PSE), $\gamma_m$ and $\gamma_c$ are the SNR margin and the coding gain (in dB), respectively, and $Q^{-1}(x)$ is the inverse of the function $Q(x)$, which is given by
$$Q(x) = \frac{1}{\sqrt{2\pi}}\int_{x}^{\infty} e^{-t^2/2}\,dt. \tag{3}$$
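As a numerical check of the power model, the Python sketch below evaluates the gap and the per-subchannel power. It assumes the common uncoded-QAM gap approximation $\Gamma = [Q^{-1}(P_e/4)]^2/3$ (linear scale), shifted by the margin and coding gain in dB, and uses `statistics.NormalDist` (Python 3.8+) for $Q^{-1}$. The function names are illustrative only.

```python
import math
from statistics import NormalDist


def q_inv(x: float) -> float:
    """Inverse of the Gaussian tail function Q(x) = 1 - Phi(x)."""
    # Q(y) = x  <=>  Phi(-y) = x  <=>  y = -Phi^{-1}(x)
    return -NormalDist().inv_cdf(x)


def snr_gap_db(pe: float, margin_db: float, coding_gain_db: float) -> float:
    """SNR gap in dB, assuming the uncoded-QAM approximation [Q^{-1}(Pe/4)]^2 / 3."""
    return 10.0 * math.log10(q_inv(pe / 4.0) ** 2 / 3.0) + margin_db - coding_gain_db


def subchannel_power(b: int, cnr: float, gamma: float) -> float:
    """Power needed to carry b bits on one subchannel: gamma * (2**b - 1) / cnr."""
    return gamma * (2.0 ** b - 1.0) / cnr


# Settings used in the simulations: Pe = 1e-7, 4 dB margin, 4 dB coding gain.
gap_db = snr_gap_db(1e-7, 4.0, 4.0)
gamma = 10.0 ** (gap_db / 10.0)          # linear gap for the power formula
print(f"gap = {gap_db:.2f} dB, P(4 bits, CNR=100) = {subchannel_power(4, 100.0, gamma):.3f}")
```

With equal margin and coding gain the two terms cancel, so the gap is the uncoded value of roughly 10 dB.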
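The MA problem considered can be stated as follows:
$$\begin{aligned} \min_{b_1,\ldots,b_M}\ & \sum_{n=1}^{M} P_n \\ \text{subject to}\ & \sum_{n=1}^{M} b_n = B_T, \quad \sum_{n=1}^{M} P_n \le P_T, \\ & b_n \in \mathbb{Z}^{+}, \quad 0 \le b_n \le \bar{b}_n, \quad n = 1, \ldots, M, \end{aligned} \tag{4}$$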
where $B_T$ and $P_T$ are the target bit rate and the total power budget, respectively, $\bar{b}_n$ is the maximum bit rate of subchannel $n$, and $\mathbb{Z}^{+}$ represents the set of nonnegative integers. The maximum bit rate $\bar{b}_n$ is given by
$$\bar{b}_n = \min\{b_{\max}, \hat{b}_n\}, \tag{5}$$
where $b_{\max}$ is the maximum allowable size of the QAM constellations and $\hat{b}_n$ is the bit rate determined by the maximum allowable power $\hat{P}_n$ imposed by the system PSD mask. In practical systems, the maximum PSD is typically flat over the transmission band, so $\hat{P}_n$ is a constant given by
$$\hat{P}_n = \Phi F, \tag{6}$$
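where $\Phi$ is the maximum PSD of the system and $F$ is the subchannel bandwidth. The bit rate $\hat{b}_n$ is given by
$$\hat{b}_n = \left\lfloor \log_2\!\left(1 + \frac{\hat{P}_n\,\mathrm{CNR}_n}{\Gamma}\right) \right\rfloor, \tag{7}$$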
where $\lfloor x \rfloor$ denotes the greatest integer not greater than $x$.
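A minimal sketch of the maximum bit rate computation (equal-power bits, then capping at $b_{\max}$) is shown below. It assumes a flat −40 dBm/Hz mask, an ADSL tone spacing of 4312.5 Hz for $F$, CNRs expressed in 1/mW so that $\hat{P}_n$ is in mW, and a linear gap of 10; the CNR values and function names are hypothetical.

```python
import math


def equal_power_bits(cnr, p_hat, gamma):
    """b_hat_n = floor(log2(1 + P_hat * CNR_n / Gamma)) for every subcarrier."""
    return [math.floor(math.log2(1.0 + p_hat * c / gamma)) for c in cnr]


def max_bit_rate(b_hat, b_max):
    """b_bar_n = min(b_max, b_hat_n)."""
    return [min(b_max, b) for b in b_hat]


# Flat PSD mask of -40 dBm/Hz; F = 4312.5 Hz is an assumed ADSL tone spacing.
psd_mw_per_hz = 10.0 ** (-40.0 / 10.0)
p_hat = psd_mw_per_hz * 4312.5          # per-subchannel power cap in mW

# Hypothetical CNRs (1/mW) and a linear gap of 10 (i.e., 10 dB), for illustration.
cnr = [1e2, 1e4, 1e6, 1e8]
b_hat = equal_power_bits(cnr, p_hat, gamma=10.0)
b_bar = max_bit_rate(b_hat, b_max=15)
print(b_hat, b_bar)
```

The last subcarrier could carry more than 15 bits at the mask power, so the cap $b_{\max}$ is what limits it.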
The new bit-loading algorithm consists of four steps. Initially, the algorithm calculates the maximum rate bit-loading distribution. Then, based on this bit distribution, the difference between the total bit rate $B$ and the target bit rate $B_T$ is used to calculate a loading parameter $a$. If the difference $|B - B_T|$ is large, the loading parameter is used in a multiple-bits loading procedure that adds the same number of bits to, or removes it from, all the subcarriers in a designated set to accelerate the allocation. Next, when $|B - B_T|$ is small and nonzero, parallel bit-filling or bit-removal is used to meet the target rate. Specifically, parallel bit-filling compares the transmission power increments $\Delta P_n(b_n + 1)$ ($0 \le b_n < b_{\max}$) of all the subcarriers in a designated set and adds one bit to each of the $|B - B_T|$ least power-consumptive subcarriers, while parallel bit-removal compares the increments $\Delta P_n(b_n)$ ($0 < b_n \le b_{\max}$) of all the subcarriers in a designated set and removes one bit from each of the $|B - B_T|$ most power-consumptive subcarriers. The transmission power increment $\Delta P_n(b_n)$ of subcarrier $n$ is given by
$$\Delta P_n(b_n) = P_n(b_n) - P_n(b_n - 1) = \frac{\Gamma\, 2^{b_n - 1}}{\mathrm{CNR}_n}. \tag{8}$$
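Parallel bit-filling and bit-removal can be sketched in a few lines of Python. This sketch assumes the gap-approximation power model, so the increment is $\Gamma\,2^{b-1}/\mathrm{CNR}_n$, and takes the designated set to be the subcarriers that can still accept a bit (below their cap $\bar{b}_n$) or still release one. For brevity it uses `heapq.nsmallest`/`heapq.nlargest` (time $O(L \log k)$) instead of a linear-time selection; the function names are hypothetical.

```python
import heapq


def power_increment(b, cnr, gamma):
    """Delta P_n(b) = P_n(b) - P_n(b - 1) = gamma * 2**(b - 1) / cnr, for b >= 1."""
    return gamma * 2.0 ** (b - 1) / cnr


def parallel_bit_loading(bits, cnr, gamma, b_bar, k):
    """Add k bits (k > 0) or remove -k bits (k < 0), at most one bit per subcarrier."""
    bits = list(bits)
    if k >= 0:
        # Designated set: subcarriers that can still accept a bit.
        cand = [n for n in range(len(bits)) if bits[n] < b_bar[n]]
        chosen = heapq.nsmallest(k, cand, key=lambda n: power_increment(bits[n] + 1, cnr[n], gamma))
        step = 1
    else:
        # Designated set: subcarriers that carry at least one bit.
        cand = [n for n in range(len(bits)) if bits[n] > 0]
        chosen = heapq.nlargest(-k, cand, key=lambda n: power_increment(bits[n], cnr[n], gamma))
        step = -1
    if len(chosen) < abs(k):
        raise ValueError("designated set has fewer than |k| subcarriers")
    for n in chosen:
        bits[n] += step
    return bits


print(parallel_bit_loading([2, 2, 2], [1.0, 2.0, 4.0], 1.0, [15, 15, 15], 2))   # fill two bits
print(parallel_bit_loading([2, 2, 2], [1.0, 2.0, 4.0], 1.0, [15, 15, 15], -1))  # remove one bit
```

All $|B - B_T|$ adjustments are chosen from a single ranking of the increments, which is what distinguishes this step from greedy loading.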
Finally, since the resulting distribution is not guaranteed to be optimal, the last step uses the EF algorithm to check whether the target-rate bit distribution is efficient, that is, whether no movement of a bit from one subchannel to another reduces the total transmission power. If the target-rate bit distribution is efficient, it is also the optimal bit distribution; otherwise, the optimal bit distribution can be obtained with a few bit swaps. The detailed algorithm follows.
(A) Initial maximum bit rate allocation.
(1) Compute the equal power assignment discrete bit distribution $\hat{\mathbf{b}} = [\hat{b}_1, \ldots, \hat{b}_M]$, where $\hat{b}_n$ is calculated by (6) and (7).
(2) Let the bit rate $b_n$ be the maximum bit rate $\bar{b}_n$ calculated by (5).
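The total number of bits loaded in the maximum bit rate allocation is $B = \sum_{n=1}^{M} b_n$.
(B) Multibit loading allocation.
Let $\tilde{N}$ and $\tilde{N}_s$ represent the index sets of the subcarriers that carry more than $b_{\max}$ bits and no more than $b_{\max}$ bits, respectively, in the initial equal power assignment distribution, and let $|\tilde{N}|$ and $|\tilde{N}_s|$ denote their cardinalities.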
Multibit loading allocation, which is upper-bounded by $b_{\max}$ and lower-bounded by zero, is performed in such a way that the resulting bit distribution is a shifted version of the initial bit distribution $\hat{\mathbf{b}}$: after a shift of $a$ bits, subcarrier $n$ carries $\min\{b_{\max}, \max\{0, \hat{b}_n - a\}\}$ bits. Therefore, if $a$ ($a > 1$) bits were to be removed, a subcarrier $n \in \tilde{N}$ loses $a - (\hat{b}_n - b_{\max})$ bits, and only when $\hat{b}_n - b_{\max} < a$, whereas the number of bits carried by a subcarrier $n \in \tilde{N}_s$ is reduced to $\hat{b}_n - a$ (or to zero if $\hat{b}_n < a$). Subsets of this kind and their cardinalities are used in the schemes below.
According to the value of $a$ and the relation among $a$, $\underline{v}$, and $\bar{v}$, several different bit allocation schemes can be determined.
(1) $a = 0$. Go to (1) of step (C).
For the case of $\tilde{L} = 0$, target bit rate allocation is performed by repeated multiple-bits loading until the value of the loading parameter $a = B_{\mathrm{diff}}/\tilde{L}$ is zero, and then parallel bit-loading is executed to achieve the target bit rate. Because the initial bit distribution is not guaranteed to be optimal when the minimum power constraint is not incorporated, the target-rate bit distribution is not guaranteed to be efficient either; the EF algorithm is therefore employed, and the following steps are executed.
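(1) Find the least power-consumptive subcarrier $n^{+} = \arg\min_n \Delta P_n(b_n + 1)$ among the subcarriers with $b_n < \bar{b}_n$, and the most power-consumptive subcarrier $n^{-} = \arg\max_n \Delta P_n(b_n)$ among the subcarriers with $b_n > 0$.
(2) If $\Delta P_{n^{+}}(b_{n^{+}} + 1) < \Delta P_{n^{-}}(b_{n^{-}})$, set $b_{n^{+}} \leftarrow b_{n^{+}} + 1$ and $b_{n^{-}} \leftarrow b_{n^{-}} - 1$, and go back to step (1); else, the algorithm ends.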
In this way, the optimal bit distribution can be obtained after very few bit swaps. In many practical situations where the PSD is flat, the optimal bit distribution is already obtained after parallel bit-loading because of the discrete nature of the task. Hence, in most cases this procedure only checks whether the target-rate bit distribution is optimal, and no bit swaps are needed.
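To tie the four steps together, the following Python sketch runs a simplified version of steps (A) to (D) on toy data and compares the result with exhaustive search. It is not the authors' exact procedure. The loading parameter uses a conservative denominator (every subcarrier that still carries bits), chosen here so that a multibit shift never overshoots the target. Only bit-removal is needed because the sketch starts from the maximum bit rate allocation, and the power budget $P_T$ is ignored. All names and CNR values are hypothetical.

```python
import itertools
import math


def dp(b, cnr, gamma):
    """Power increment Delta P_n(b) = gamma * 2**(b - 1) / cnr."""
    return gamma * 2.0 ** (b - 1) / cnr


def allocate(cnr, gamma, p_hat, b_max, target):
    """Simplified sketch of steps (A)-(D); returns (bits, number_of_EF_swaps)."""
    m = len(cnr)
    # (A) equal-power distribution b_hat and maximum bit rate allocation b_bar.
    b_hat = [math.floor(math.log2(1.0 + p_hat * c / gamma)) for c in cnr]
    b_bar = [min(b_max, b) for b in b_hat]
    if sum(b_bar) < target:
        raise ValueError("target rate exceeds the maximum bit rate")
    # (B) multibit loading: shift b_hat down by a bits, clipped to [0, b_max].
    # Assumption: a = (B - B_T) // (number of loaded subcarriers), so no overshoot.
    shift, bits = 0, list(b_bar)
    while True:
        loaded = sum(1 for b in bits if b > 0)
        a = (sum(bits) - target) // loaded if loaded else 0
        if a == 0:
            break
        shift += a
        bits = [min(b_max, max(0, b - shift)) for b in b_hat]
    # (C) parallel bit-removal: one bit from each of the k most power-consumptive subcarriers.
    k = sum(bits) - target
    cand = [n for n in range(m) if bits[n] > 0]
    for n in sorted(cand, key=lambda n: dp(bits[n], cnr[n], gamma), reverse=True)[:k]:
        bits[n] -= 1
    # (D) efficientizing: move single bits while this strictly lowers total power.
    swaps = 0
    while True:
        add = [n for n in range(m) if bits[n] < b_bar[n]]
        rem = [n for n in range(m) if bits[n] > 0]
        if not add or not rem:
            break
        n_plus = min(add, key=lambda n: dp(bits[n] + 1, cnr[n], gamma))
        n_minus = max(rem, key=lambda n: dp(bits[n], cnr[n], gamma))
        if dp(bits[n_plus] + 1, cnr[n_plus], gamma) >= dp(bits[n_minus], cnr[n_minus], gamma):
            break
        bits[n_plus] += 1
        bits[n_minus] -= 1
        swaps += 1
    return bits, swaps


def brute_force(cnr, gamma, b_bar, target):
    """Exhaustive reference solution of the MA problem for tiny examples (P_T ignored)."""
    best = None
    for bits in itertools.product(*(range(c + 1) for c in b_bar)):
        if sum(bits) == target:
            p = sum(gamma * (2.0 ** b - 1.0) / c for b, c in zip(bits, cnr))
            if best is None or p < best[0]:
                best = (p, list(bits))
    return best[1]


cnr = [7.1, 14.5]                          # hypothetical CNRs; P_hat = 1, Gamma = 1
print(allocate(cnr, 1.0, 1.0, 15, 4))      # the EF step makes one swap here
print(brute_force(cnr, 1.0, [3, 3], 4))
```

In the two-subcarrier example, the shifted distribution [2, 2] is not efficient because the second CNR is more than twice the first; one EF swap yields [1, 3], which matches the exhaustive search.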
SIMULATION RESULTS AND ANALYSIS
Using the new bit-loading algorithm given in the previous section, we present extensive simulation results for various standard ADSL test loops and target rates. The ADSL loops employ a duplex transmission strategy with echo cancellation, and ADSL downstream transmission with subcarriers 7 through 255 loaded is tested. An AWGN floor of −135 dBm/Hz is assumed. For ADSL test loops T1.601#7, T1.601#9, and T1.601#13, an operating environment with 50 high-bit-rate DSL (HDSL) and 50 integrated services digital network (ISDN) crosstalkers is assumed. For the other ADSL test loops, an environment with 1 ADSL crosstalker is assumed. The total power budget is 100 mW, the PSD mask is −40 dBm/Hz, the SNR margin is 4 dB, the coding gain is 4 dB, and the target PSE is $P_e = 10^{-7}$. The maximum size of the QAM constellations is set at $b_{\max} = 15$. Table 1 gives the numerical results of the corresponding parameters in the different allocation phases for ADSL test loop T1.601#9 [15], for the target rates 2864, 2714, 2563, 2111, and …. The entries of Table 1 include the bit difference $B_{\mathrm{diff}}$ after maximum bit rate allocation, the number of subtractions performed in the multiple-bits loading, the number of bits $B_{\mathrm{diff}}$ allocated by parallel bit-filling or bit-removal, the cardinality $L$ of the designated subchannel set in which parallel bit-filling or bit-removal is performed, and the number of bit swaps in the final bit allocation adjustment. As shown in Table 1, the number of bit swaps in each case is zero. Simulations on other ADSL test loops under various target rates also show that the number of bit swaps is at most 3, and in most cases it is zero, meaning that the bit distribution is already optimal after parallel bit-loading. Figure 1 shows a bar chart of seven different bit distributions for loop T1.601#9. Bit distributions number 5 to number 1 are the optimal bit distributions corresponding to the allocation schemes $a = 0$, $a = \underline{v}$, $\underline{v} < a < \bar{v}$, $a = \bar{v}$, and $\bar{v} < a$, respectively.
Bit distributions number 7 and number 6 correspond to the initial equal power assignment bit distribution $\hat{\mathbf{b}}$ and the maximum bit rate distribution, respectively. To evaluate the computational efficiency of the proposed algorithm, we compare its main computation load with that of the algorithm in [14] for ADSL test loops T1.601#7, T1.601#13, CSA#4, CSA#6, CSA#7, CSA#8, and Mid-CSA [15], with target bit rates corresponding to 90%, 70%, 50%, 30%, and 10% of each loop's maximum bit rate. The computation load of the proposed algorithm is mainly determined by the operations performed in multiple-bits loading and parallel bit-loading, while that of the algorithm in [14] is mainly determined by the operations performed in multiple-bits loading and greedy bit-loading. For the same number of bits $B_{\mathrm{diff}}$ to be allocated in a subchannel set with the same number of subchannels $L$, parallel bit-loading performs the $B_{\mathrm{diff}}$ adjustments in one step instead of $B_{\mathrm{diff}}$ greedy bit-loading steps, and is thus computationally more efficient. Assume that the transmission power increment of each subchannel is obtained beforehand. Parallel bit-loading requires …. Table 2 shows the experimental results for the number of subtractions and/or additions performed in the multiple-bits loading, the number of bits $B_{\mathrm{diff}}$ allocated by parallel bit-loading or greedy bit-loading, and the cardinality $L$ of the designated subchannel set in which parallel bit-loading or greedy bit-loading is performed. The main computation load of the two algorithms, which is calculated from these results, consists of two kinds of operations, arithmetic operations and comparisons, represented by the symbols "A" and "C" in Table 2, respectively. The computation load of the final adjustment using the EF algorithm is low because it obtains the optimal solution with the minimum number of bit swaps. In particular, the number of bit swaps for every scenario of Table 2 is zero.
The number of "A" operations for the proposed algorithm is the sum of two parts: the number of subtractions or additions for multiple-bits loading and the $B_{\mathrm{diff}}$ subtractions or additions for parallel bit-loading. The number of "A" operations for the algorithm in [14] is the sum of three parts: the number of subtractions or additions for multiple-bits loading, the $B_{\mathrm{diff}}$ subtractions or additions for greedy bit-loading, and the $B_{\mathrm{diff}} - 1$ multiplications or divisions for updating the transmission power increments. The number of "C" operations for the proposed algorithm is …, while that for the algorithm in [14] is $(L - 1) \cdot B_{\mathrm{diff}}$. To facilitate comparison of the computation load of the two algorithms, the ratios of the number of operations for the algorithm in [14] to the corresponding number for the proposed algorithm are also provided.
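The $(L - 1) \cdot B_{\mathrm{diff}}$ comparisons and $B_{\mathrm{diff}} - 1$ increment updates of greedy bit-loading can be checked with a small instrumented sketch of greedy bit-filling. It assumes precomputed increments, as stated above, and uses the doubling rule $\Delta P_n(b + 1) = 2\,\Delta P_n(b)$ implied by the gap-approximation power model. It omits the per-subcarrier bit cap for brevity, and the function name is hypothetical.

```python
def greedy_bit_filling(bits, cnr, gamma, k):
    """Add k bits one at a time; return (bits, comparisons, increment_updates)."""
    bits = list(bits)
    # Precomputed increments Delta P_n(b_n + 1) = gamma * 2**b_n / cnr_n.
    inc = [gamma * 2.0 ** b / c for b, c in zip(bits, cnr)]
    comparisons = updates = 0
    for step in range(k):
        best = 0
        for n in range(1, len(bits)):          # linear scan: L - 1 comparisons
            comparisons += 1
            if inc[n] < inc[best]:
                best = n
        bits[best] += 1
        if step < k - 1:                       # the increment after the last bit is never used
            inc[best] *= 2.0                   # Delta P_n(b + 1) = 2 * Delta P_n(b)
            updates += 1
    return bits, comparisons, updates


bits, comparisons, updates = greedy_bit_filling([2, 2, 2], [1.0, 2.0, 4.0], 1.0, 2)
print(bits, comparisons, updates)
```

On this toy input greedy filling picks the same two subcarriers that parallel bit-filling would, but needs $(L - 1) \cdot B_{\mathrm{diff}} = 4$ comparisons and one increment update to do so.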
As can be seen from Table 2, the number of "C" operations is much larger than that of "A" operations, meaning that parallel bit-loading and greedy bit-loading dominate the computation load of the proposed algorithm and of the algorithm in [14], respectively; comparing these two basic operations therefore compares the two algorithms. The smaller the values of $B_{\mathrm{diff}}$ and $L$, the lighter the computation load. The main computation load of the proposed algorithm, that is, the number of "C" operations, is much lower than that of the algorithm in [14] in most cases. Hence, the proposed algorithm can be expected to be faster than the algorithm in [14], except when the algorithm in [14] ends up with a low value of $B_{\mathrm{diff}}$.
Using an order-statistic selection algorithm [16], parallel bit-loading can be performed in $O(L)$ time. As $L \le M$, the proposed algorithm is as efficient as the LC algorithms, which have a computational complexity of $O(M)$, and more efficient than the algorithms of Piazzo [8] and Krongold et al. [9], both of which have a computational complexity of $O(M \log M)$.
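Python's standard library has no linear-time selection routine, so the sketch below uses randomized quickselect, which runs in expected $O(L)$ time; a deterministic worst-case $O(L)$ bound would need a median-of-medians pivot instead. It selects the $k$ subcarriers with the smallest increments $\Delta P_n(b_n + 1) = \Gamma\,2^{b_n}/\mathrm{CNR}_n$ (assuming the gap-approximation power model) for parallel bit-filling. The names `k_smallest` and `parallel_bit_filling` are hypothetical, and ties at the selection boundary are broken arbitrarily.

```python
import random


def k_smallest(keys, k, seed=0):
    """Return indices of the k smallest keys using randomized quickselect."""
    rng = random.Random(seed)
    candidates = list(range(len(keys)))
    chosen = []
    while k > 0 and candidates:
        if k >= len(candidates):
            chosen.extend(candidates)
            break
        pivot = keys[rng.choice(candidates)]
        less = [i for i in candidates if keys[i] < pivot]
        equal = [i for i in candidates if keys[i] == pivot]
        if k <= len(less):
            candidates = less                      # answer lies entirely in `less`
        elif k <= len(less) + len(equal):
            chosen.extend(less + equal[: k - len(less)])
            break
        else:
            chosen.extend(less + equal)            # keep all, continue among larger keys
            k -= len(less) + len(equal)
            candidates = [i for i in candidates if keys[i] > pivot]
    return chosen


def parallel_bit_filling(bits, cnr, gamma, b_bar, k):
    """Add one bit to each of the k subcarriers with the smallest Delta P_n(b_n + 1)."""
    cand = [n for n in range(len(bits)) if bits[n] < b_bar[n]]
    if k > len(cand):
        raise ValueError("designated set is smaller than k")
    keys = [gamma * 2.0 ** bits[n] / cnr[n] for n in cand]   # Delta P_n(b_n + 1)
    new_bits = list(bits)
    for j in k_smallest(keys, k):
        new_bits[cand[j]] += 1
    return new_bits


print(parallel_bit_filling([2, 2, 2], [1.0, 2.0, 4.0], 1.0, [15, 15, 15], 2))
```

Each partition pass discards the part of the designated set that cannot contain the answer, which is why the expected work is linear in $L$ rather than proportional to $L \cdot B_{\mathrm{diff}}$.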
CONCLUSION
In this paper, a heuristic optimal discrete bit allocation algorithm for margin maximization in DMT systems is presented. Compared with the existing multiple-bits-loading-based algorithm, which computes an initial efficient bit allocation regardless of the target bit rate, the proposed algorithm is more flexible in that it performs bit swaps only when the target-rate bit allocation is not efficient. Compared with the conventional greedy bit-loading algorithm, the introduced parallel bit-loading procedure is computationally more efficient. Numerical results on the standard ADSL test loops show the reduced computational load of our algorithm in comparison with the existing multiple-bits-loading-based algorithm. The idea of our algorithm can also be applied to bit allocation in other DMT transmission systems.
