Abstract-The aim of this paper is to study the bit-loading and power allocation problem in the presence of interference (Inter-carrier Interference (ICI) and Inter-Symbol Interference (ISI)) in Orthogonal Frequency Division Multiplexing (OFDM) systems. ISI and ICI significantly degrade the performance of OFDM systems and make resource management that is optimized under an interference-free assumption less efficient. To solve this problem, an initial solution based on the greedy approach is proposed in this paper. Then, several reduced complexity approaches, which incur only a slight degradation compared to the initial solution, are developed. Simulation results presented in the context of Power Line Communication (PLC) show that the performance of the proposed algorithms is close to their upper bound. Moreover, these algorithms improve the system performance compared to the constant power water-filling allocation algorithm as well as the maximum power allocation algorithm.
Index Terms-Bit-loading, power allocation, ICI, ISI, power line communication, windowed-OFDM, greedy algorithm.
I. INTRODUCTION
In the past decades, the use of PLC systems for high-rate indoor broadband communications has spread rapidly. Since PLC does not require any new wire installation, it is economically attractive for indoor local area networks (LANs). It can also complement wireless LANs [1], [2].
PLC systems exploit the OFDM technique to combat the effect of multi-path channels with severe frequency selectivity [3], [4]. Conventional OFDM divides the entire bandwidth into many orthogonal subcarriers, and data are transmitted in parallel over these subcarriers. Therefore, it can support high transmission data rates and achieves high spectral efficiency. Unfortunately, many phenomena, such as the frequency offset between transmitter and receiver, an insufficient guard interval or the Doppler frequency shift, induce inter-carrier interference (ICI) as well as inter-symbol interference (ISI), which significantly degrade the performance of OFDM systems [5], [6] and make resource management more complicated. Many techniques have been developed to combat the effects of ISI and ICI, such as time-domain windowing [7], frequency-domain equalization [8] or ISI and ICI self-cancellation [9].

Manuscript received August 21, 2013; revised December 20, 2013. The editor coordinating the review of this paper and approving it for publication was A. Tonello. This work is financially supported by PRACOM. T. Nhan Vo, K. Amis, and T. Chonavel are with the Signal and Communication Department of Télécom Bretagne-Institut Télécom, CNRS Lab-STICC (UMR 6285), Brest 29200, France (e-mail: {thanh.vo, karine.amis, thierry.chonavel}@telecom-bretagne.eu).
P. Siohan is with Orange Labs, Rennes 35510, France (e-mail: pierre.siohan@orange.com).
Digital Object Identifier 10.1109/TCOMM.2014.031614.130660
Both current and next-generation PLC systems employ the frequency band from 2 to 30 MHz (and beyond), e.g., HPAV1, HPAV2, IEEE P1901, ITU-T G.9963 [2], [10]. In this frequency band, the spectrum is already used by many radio applications, such as amateur radio, emergency and military services. To avoid interfering with those systems, a spectral mask is specified for PLC systems [10]. The IEEE P1901 standard uses Windowed-OFDM instead of conventional OFDM to comply with the spectral mask [10]. In [11], Hermitian Symmetry Offset Quadrature Amplitude Modulation (HS-OQAM) is proposed for future use to increase the data rate as well as to adapt to new spectral masks in Europe.
Regarding resource allocation, bit-loading can be designed to achieve different objectives in OFDM systems through bit allocation, power allocation, code rate adaptation, etc. Several algorithms have been proposed, among which we can enumerate: rate-adaptive algorithms, which maximize the capacity under power and bit-error rate (BER) constraints [12], [13]; margin-adaptive algorithms, which minimize the consumed power under data rate and BER constraints [14]; and BER minimization under data rate and power constraints [15]. If ISI and ICI are present, the system can be modeled as a Gaussian interference channel [16]. In this case, with joint coding and decoding, the system capacity is maximized by the water-filling algorithm derived in [16]. In the absence of joint coding and decoding, the interference is treated as noise when solving the resource allocation problem [17], [18], [19], [20], [21]. In [18], the achievable throughput of PLC systems is maximized by combining a bit-loading algorithm with an adaptive cyclic prefix (CP). The bit-loading algorithm used in [18] relies on two simplifying assumptions. First, conventional OFDM is used to calculate ISI and ICI in PLC systems whereas, as mentioned above, PLC systems exploit Windowed-OFDM instead of conventional OFDM. Second, the algorithm relies on on/off power loading and an integer bit number constraint. The latter leads to bit discretization, which causes non-optimal power use. Actually, the discretized bit-loading can be achieved with lower power consumption than with the approach in [18], and the residual power could be exploited to increase the data rate and/or the transmission quality. CP design for maximizing the achievable throughput of Windowed-OFDM systems is considered in [22]. Unfortunately, the power allocation strategy in [22] is also non-optimal due to bit discretization.
In [18], [22], it is shown that choosing a cyclic prefix (CP) length equal to the channel impulse response length makes PLC systems less efficient in terms of achievable throughput. A shorter CP evidently results in ISI and ICI, but the gain offered by a shortened CP may exceed the losses caused by interference. Another approach to CP length adaptation relies on statistical channel state information, as derived in [23], [24]. This approach causes a capacity loss when compared to bit and power allocation based on instantaneous channel state information (CSI). In our paper, we assume that perfect instantaneous CSI is available and we use it to optimize the resource allocation. Moreover, our aim is to optimize the throughput for a fixed guard interval (GI) length, taking the resulting interference into account in the bit/power allocation.

0090-6778/14$31.00 © 2014 IEEE
Recently, several approaches to find the upper bound of the achievable throughput have been studied in [19], but no practical low-complexity solution to the achievable throughput maximization problem has been given. A solution based on the greedy approach has been developed in [25] for the multi-carrier interference channel in multi-user ADSL systems. However, it only takes into account the ICI caused by asynchronous crosstalk. Moreover, due to the difficulty of computing the exact additional power, the algorithm in [25] exploits gradient information to approximate the power adjustment when modifying the number of bits on subcarriers. The algorithm detailed in [25] also approximates the matrix inversion via a power series expansion to reduce the complexity; this method works only if the power series expansion converges. In this paper, we focus on single-user Windowed-OFDM-based systems in the presence of both ICI and ISI. Our proposed algorithms are also based on the greedy principle, which is detailed in Section III. However, instead of the approximations proposed in [25], we use an iterative procedure that computes the matrix inversion accurately and an iterative bit-loading procedure that exploits the incremental power needed to transmit an additional bit.
The main contribution of this paper is to use the greedy principle to solve the achievable throughput maximization problem in PLC systems in the presence of interference and under power and bit-error rate constraints. For this purpose, the ISI and ICI caused by an insufficient cyclic prefix in PLC systems are analyzed. Relying on this ISI and ICI analysis and on the greedy principle, we propose an initial greedy algorithm that solves the achievable throughput maximization problem in the presence of ISI and ICI. Then, several approaches are proposed to reduce the complexity with a negligible degradation compared to the initial greedy solution.
The rest of the paper is organized as follows. In Section II, the OFDM model in the presence of ISI and ICI is described. In Section III, the greedy principle and several approaches to reduce the complexity are presented. In Section IV, the achievable throughput maximization problem is formulated and an initial solution exploiting the greedy principle is given. A reduced complexity algorithm is also proposed. Numerical results are reported in Section V. Finally, conclusions and perspectives are drawn in Section VI.
II. SYSTEM MODEL
Let us consider a Windowed-OFDM system with $L$ subcarriers used out of $M$, which are activated under a given spectral mask constraint. The demodulated sample on the $m_0$-th used subcarrier and $n_0$-th OFDM symbol is given by
where
denote the channel multiplicative factor, the symbol of interest, the ISI and ICI coefficients and the circularly symmetric complex Gaussian noise sample at the $m_0$-th used subcarrier and $n_0$-th OFDM symbol, respectively. We assume that the channel is time-invariant over each block. Without loss of generality and for the sake of simplicity, within a block of many OFDM symbols, Eq. (1) can be rewritten as
Since the number of subcarriers used in practical PLC systems is quite large, we assume that the interference on a given subcarrier is normally distributed (by the central limit theorem) [3], [26]. Different normality tests for the interference have been introduced in [3]; these tests have confirmed that the Gaussian distribution is a valid model for the interference in practical PLC systems. The signal to interference plus noise ratio (SINR) and the theoretical capacity on the $m_0$-th used subcarrier are then given as follows:
where $P(m_0)$ is the power allocated to the $m_0$-th used subcarrier and $\Gamma$ is the SNR gap that models the practical modulation and coding scheme for a targeted symbol-error rate (SER):
$$\Gamma = \frac{1}{3}\left[Q^{-1}\left(\frac{SER}{4}\right)\right]^2,$$
where $Q^{-1}(x)$ is the inverse tail probability of the standard normal distribution [27].
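A short numeric check of the SNR gap: assuming the usual closed form for uncoded square QAM, $\Gamma = \frac{1}{3}[Q^{-1}(SER/4)]^2$, the sketch below evaluates $\Gamma$ with the standard library's normal quantile. The helper names `q_inv` and `snr_gap` are hypothetical.

```python
import math
from statistics import NormalDist


def q_inv(x: float) -> float:
    """Inverse Gaussian tail function: z such that P(Z > z) = x, Z ~ N(0, 1)."""
    return NormalDist().inv_cdf(1.0 - x)


def snr_gap(ser: float) -> float:
    """Assumed SNR gap of uncoded square QAM: Gamma = (1/3) * [Q^{-1}(SER/4)]^2."""
    return q_inv(ser / 4.0) ** 2 / 3.0


gamma = snr_gap(1e-3)
print(f"Gamma = {gamma:.4f} ({10.0 * math.log10(gamma):.2f} dB)")
```

For SER = $10^{-3}$ this gives $\Gamma \approx 4.04$ (about 6 dB), in line with the value $\Gamma = 4.038$ used in the simulations of Section V; a smaller target SER yields a larger gap.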
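Let us denote by $A_{use}$ the set of used subcarriers; then $\#(A_{use}) = L$, where $\#$ denotes the cardinality. On a given subcarrier, the interference power generally depends on the power allocated to the other used subcarriers (see Appendix) and can be written as a linear combination of these powers, whose coefficients are the entries of the interference matrix $W$ computed in the Appendix.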
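To make the coupling concrete, the sketch below assumes a linear interference model $I(m) = \sum_j W[m][j]\,P(j)$, with `g[m]` standing for $|H(m)|^2$ and `noise[m]` for the noise power, and uses the gap approximation $c(m) = \log_2(1 + \mathrm{SINR}(m)/\Gamma)$. The notation (`g`, `W`, `noise`) is hypothetical. The example shows the difficulty discussed in Section IV: each power appears in its own numerator and in the other subcarriers' denominators.

```python
import math


def sinr(P, g, W, noise):
    """SINR on each used subcarrier when interference is linear in the powers.

    Assumed (hypothetical) notation: g[m] = |H(m)|^2, W[m][j] = interference
    coefficient from subcarrier j onto subcarrier m, noise[m] = noise power.
    """
    L = len(P)
    out = []
    for m in range(L):
        interference = sum(W[m][j] * P[j] for j in range(L))
        out.append(g[m] * P[m] / (interference + noise[m]))
    return out


def rates(P, g, W, noise, gamma):
    """Gap-approximation throughput per subcarrier: log2(1 + SINR / Gamma)."""
    return [math.log2(1.0 + s / gamma) for s in sinr(P, g, W, noise)]


g = [1.0, 1.0]
W = [[0.0, 0.1], [0.1, 0.0]]
noise = [0.01, 0.01]
base = sinr([1.0, 1.0], g, W, noise)
louder = sinr([1.0, 2.0], g, W, noise)  # more power on subcarrier 1 ...
print(base, louder)                      # ... raises its SINR but lowers subcarrier 0's
```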
III. GREEDY PRINCIPLE AND REDUCED COMPLEXITY APPROACHES
In this section, we present the greedy principle for the discrete bit-loading problem. Hereafter, only the active subcarriers are taken into account, except in the Appendix.
A. Greedy principle to solve the bit-loading problem
Let B, A, Q and C denote the bit-allocation vector, the allowable set of numbers of bits corresponding to the available modulations on the subcarriers, the objective function and the set of constraints (power constraints, data rate constraints, BER constraints), respectively. As Q and C depend on B, the corresponding optimization problem can be written as
An algorithm is said to be greedy if every decision taken at any stage is the one with the most obvious immediate advantage. That is to say, it makes a locally optimal choice in the hope that successive choices will lead to a globally optimal solution. Generally, a greedy algorithm does not produce a global optimum, but it may nonetheless yield a local optimum that approximates the global optimum well [28]. In practice, the standard greedy algorithm initializes the vector $B = \{b(m)\}_{m=1,\dots,L}$ to the null vector. Another way to initialize the bit-loading vector relies on water-filling followed by bit discretization. This approach has been exploited in [29], where it is shown that, in interference-free systems, this initialization can strongly reduce the complexity with a negligible throughput loss compared to the standard greedy algorithm. Then, the greedy process successively increases the number of bits on the subcarriers up to their upper level in A, according to a predefined cost function F that drives the optimization of Q. In practice, the cost function is defined according to the objective function while taking the constraints into account. For example, for the throughput maximization problem under a total power constraint, the cost function is defined as the required upgrade power [30], [31] or as the gradient information [25]; for throughput maximization under a BER constraint, it is defined through metrics of log-likelihood ratios (LLRs) [32]. Generally, the cost function values depend on the subcarrier index and on the iteration. We denote by $F(m,k)$ the cost function value associated with an increase of the number of bits on the $m$-th subcarrier at the $k$-th iteration. At the $k$-th iteration, the greedy principle evaluates $F(m,k)$ for all values of $m$ and decides which subcarrier will be allocated additional bits. The number of bits on the $m_0$-th subcarrier, i.e. $b(m_0)$, is increased after the $k$-th iteration if
From (10), even if $F(m_0,k)$ is the minimum value of $F(m,k)$, the number of bits allocated to this subcarrier cannot be increased if any constraint would be violated. In this case, to avoid meaningless loops, the cost function $F(m_0,k)$ is set to infinity and this subcarrier is no longer considered for additional bit allocation in the following iterations. The procedure stops when there is no subcarrier left to which additional bits can be allocated.
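The greedy loop of Section III-A can be sketched as follows. To keep it short, this sketch assumes an interference-free channel, so that the cost $F(m,k)$, taken as the incremental power needed to reach the next allowable level (as in [30], [31]), has a closed form. All function names are hypothetical; the allowable set is the one used in Section V.

```python
import math

LEVELS = [0, 1, 2, 3, 4, 6, 8, 10, 12]  # allowable bit numbers A (Section V)


def next_level(b):
    """Next allowable bit number above b, or None if b is already the top level."""
    higher = [a for a in LEVELS if a > b]
    return min(higher) if higher else None


def power_for_bits(b, g, noise, gamma):
    """Interference-free power needed to carry b bits on a subcarrier."""
    return gamma * noise * (2.0 ** b - 1.0) / g


def greedy_bit_loading(g, noise, gamma, p_total, p_max):
    """Bit-adding greedy loop with the incremental power as cost function F(m, k)."""
    L = len(g)
    b = [0] * L                      # standard initialization: null vector
    used = 0.0
    active = set(range(L))           # subcarriers still eligible for more bits

    def cost(m):
        nb = next_level(b[m])
        if nb is None:
            return math.inf
        return (power_for_bits(nb, g[m], noise[m], gamma)
                - power_for_bits(b[m], g[m], noise[m], gamma))

    while active:
        m0 = min(sorted(active), key=cost)   # locally optimal choice (ties: lowest index)
        nb, c = next_level(b[m0]), cost(m0)
        if (nb is None or used + c > p_total
                or power_for_bits(nb, g[m0], noise[m0], gamma) > p_max[m0]):
            active.discard(m0)               # F(m0, k) := +inf from now on
            continue
        b[m0] = nb
        used += c
    return b, used


bits, used = greedy_bit_loading([1.0, 0.5], [1.0, 1.0], 1.0, 10.0, [100.0, 100.0])
print(bits, used)
```

Removing a subcarrier from `active` plays the role of setting its cost to infinity, which guarantees termination: every pass either raises one bit number or shrinks the eligible set.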
B. Reduced complexity approaches
In many cases, although the greedy principle described in Section III-A yields an efficient solution to the problem in Eq. (9), its high complexity makes it infeasible in practice. Since the complexity depends on the computational effort needed to calculate the cost function and on the number of iterations, it can be reduced in particular by using computationally less expensive approximations of the cost function. On the other hand, the number of iterations depends on the initialization of the bit-loading vector and on the number of subcarriers whose bit numbers are increased simultaneously per iteration. Thus, the number of iterations can be reduced by choosing a judicious initial bit-loading vector instead of the null vector and by processing several subcarriers simultaneously per iteration. We will see that these modifications lead to reduced complexity approaches at the expense of a slight degradation compared to the greedy solution. In the next sections, the greedy principle and the reduced complexity approaches are combined into a reduced complexity algorithm that closely approximates the solution of the achievable throughput maximization problem in single-user Windowed-OFDM systems in the presence of ISI and ICI. In the simulation part, this algorithm is applied in the context of PLC systems.
IV. ACHIEVABLE THROUGHPUT OPTIMIZATION IN THE PRESENCE OF INTERFERENCE
The achievable throughput optimization problem is difficult when ISI and ICI are taken into account. In fact, even with the continuous bit allocation relaxation, the rate-adaptive problem in the presence of ISI and ICI cannot be easily solved, because the power $P(m)$ allocated to the $m$-th subcarrier appears both in the numerator and in the denominator of the throughput formula (see Eqs. (3), (4)). In [31], solutions for both continuous and discrete bit-loading in the presence of ICI are determined. Nevertheless, the proposed bit-adding algorithm with a null initial bit-loading vector for discrete bit-loading remains highly complex. In [17], the power allocation corresponding to continuous bit-loading optimization is derived. It is shown that the input-output mutual information is not a concave function of the input power distribution and that the optimal solution must be obtained via an exhaustive search, which is equivalent to an NP-hard problem. In the simulation part, we give an illustration of this non-concavity of the mutual information.
Let A be the allowable set of numbers of bits associated with the allowable modulations specified by the standard, and let $T_d$ be the down-discretization function defined as
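Assuming the natural definition $T_d(x) = \max\{a \in A : a \le x\}$, down-discretization is a binary search in the sorted set A. The sketch below uses the standard `bisect` module and the set A of Section V; `down_discretize` is a hypothetical name.

```python
import bisect

LEVELS = [0, 1, 2, 3, 4, 6, 8, 10, 12]  # allowable bit numbers A used in Section V


def down_discretize(x, levels=LEVELS):
    """T_d(x): largest allowable bit number that does not exceed x (levels sorted)."""
    i = bisect.bisect_right(levels, x)
    if i == 0:
        raise ValueError("x is below the smallest allowable bit number")
    return levels[i - 1]


continuous = [0.3, 5.7, 7.99, 12.9]
print([down_discretize(c) for c in continuous])
```

Because $T_d(x) \le x$, discretizing a continuous bit-loading can only lower each entry, which is the property used in the proof of Theorem 1.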
Then, the achievable throughput optimization problem under power constraints in single-user Single Input Single Output (SISO) OFDM systems can be expressed as (12). The second constraint in (12) enforces a given spectral mask. Problem (12) closely resembles the multi-user power control problem in digital subscriber line (DSL) systems, which can be solved by optimal spectrum balancing (OSB) [33], [34] and iterative spectrum balancing (ISB) [34], [35]; these methods decompose the initial problem into smaller problems, one per subcarrier.
The complexity of these algorithms depends on the number of subcarriers, and their practical complexity gain is normally obtained when the number of subcarriers is large. In our case, the optimization problem is equivalent to the achievable throughput optimization problem for L users on a single subcarrier, so those algorithms are not well suited to it. In addition, computing the global optimum is very complex, and the problem is only solvable when it involves few variables [36], [37]. In the following, relying on the greedy principle described in Section III-A, greedy algorithms as well as a reduced complexity algorithm for the achievable throughput maximization problem in the presence of ISI and ICI are derived.
A. Bit number-power level relation
First, we do not consider any constraint on the power allocated to each subcarrier, and we seek the solution of the following problem:
Given $B = [b(1), \dots, b(L)]^T$, the vector of numbers of bits allocated to the subcarriers, we find the power vector $P = [P(1), \dots, P(L)]^T$ such that:
Then, rewriting Eq. (14) in matrix form, we obtain:
Equation (15) yields the solution of problem (13) alone, i.e., without the constraints imposed on the power allocated to the subcarriers. However, power constraints always exist, i.e. $0 \le P(m) \le P_{max}(m)$, so there are many cases where no power allocation both corresponds to a given bit allocation and satisfies the set of constraints.
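A minimal sketch of the bit-to-power mapping, assuming the linear interference model: requiring $\mathrm{SINR}(m)/\Gamma = 2^{b(m)} - 1$ on every subcarrier gives $P(m) = \lambda(m)\,([WP](m) + \sigma^2(m))$ with $\lambda(m) = \Gamma(2^{b(m)}-1)/|H(m)|^2$, i.e. $(I - \Lambda W)P = \Lambda\sigma^2$ with $\Lambda = \mathrm{diag}(\lambda)$. This is our assumed reading of the matrix form (15), not a transcription of it. The sketch solves the system by Gaussian elimination and rejects bit vectors without a nonnegative solution; all names are hypothetical.

```python
def solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [list(row) + [rhs[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][j] * x[j] for j in range(r + 1, n))) / M[r][r]
    return x


def required_powers(b, g, W, noise, gamma):
    """Powers meeting SINR(m) = gamma * (2**b[m] - 1) on every subcarrier.

    Assumed model: lam[m] = gamma * (2**b[m] - 1) / g[m] and
    (I - diag(lam) W) P = diag(lam) noise. Returns None when this linear
    system has no valid (nonnegative) solution, i.e. b cannot be supported.
    """
    L = len(b)
    lam = [gamma * (2.0 ** b[m] - 1.0) / g[m] for m in range(L)]
    A = [[(1.0 if i == j else 0.0) - lam[i] * W[i][j] for j in range(L)]
         for i in range(L)]
    try:
        P = solve(A, [lam[i] * noise[i] for i in range(L)])
    except ZeroDivisionError:
        return None
    return P if all(p >= 0.0 for p in P) else None


g, noise, gamma = [1.0, 0.5], [0.01, 0.02], 1.0
W = [[0.0, 0.05], [0.1, 0.0]]
P_a = required_powers([2, 1], g, W, noise, gamma)
P_b = required_powers([3, 1], g, W, noise, gamma)   # one more level on subcarrier 0
print(P_a, P_b)  # every entry of P_b is >= the corresponding entry of P_a
```

The second call illustrates the monotonicity property discussed next: adding bits on one subcarrier raises the required power on all subcarriers. With strong coupling, e.g. `required_powers([4, 4], [1.0, 1.0], [[0.0, 1.0], [1.0, 0.0]], [1.0, 1.0], 1.0)`, the system only has a negative solution and the function returns `None`.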
The monotonicity of the right-hand side of (15) has already been demonstrated in [25], where it is shown that if $B_1 \le B_2$ component-wise, then the corresponding power vectors satisfy $P_1 \le P_2$ component-wise. This means that when the number of bits on a subcarrier is increased, the resulting power levels on all subcarriers are higher than or equal to the current ones.
B. Greedy Algorithm
The standard greedy algorithm initializes $B = 0$, that is to say $\Lambda = 0$ and $P = 0$. Then it successively tries to increase the number of bits on the $m$-th subcarrier up to its upper level according to the cost function F, and derives the allocated power by using (15). However, constructing the cost function makes this algorithm complicated to implement. The search for a relevant cost function for an optimal greedy algorithm is a complex problem because, for every increase of the number of bits on any subcarrier, the power levels on all subcarriers must be changed to account for the increased interference. As mentioned above, both power constraints, i.e. the total allocated power and the spectral mask constraint, have to be taken into account to define the cost function. In this paper, for the sake of simplicity, we only use the incremental power needed to transmit an additional bit as the cost function in our proposed solutions. This cost function has been used to efficiently solve the achievable throughput maximization problem in OFDM systems under a total power constraint [29], [30], [31]. Greedy-based algorithms such as bit-adding (or bit-filling) and bit-subtracting (or bit-removal) are commonly used to derive feasible solutions in many bit-loading optimization problems [32], [38], [39].
1) Standard Algorithm: Let us denote by $P_k$ the power allocation and by $\Lambda_k$ the corresponding matrix (Eq. (14)) after iteration $k$. For an additional number of bits on the $m$-th subcarrier, $\Lambda^{(m)}_{k+1}$ and $P^{(m)}_{k+1}$ denote the new matrix and the power allocation vector updated accordingly. Let $S(A)$ denote the sum of all elements of vector A. Then, the incremental power needed to transmit an additional bit on the $m$-th subcarrier at iteration $(k+1)$ is chosen as the cost function and defined by
$$\Delta P_{bit}(m, k+1) = S\big(P^{(m)}_{k+1}\big) - S\big(P_k\big).$$
The number of bits on the $m_0$-th subcarrier is increased at iteration $(k+1)$ if
After the number of bits on the $m_0$-th subcarrier has been updated, $\Lambda$ and $P$ are updated as in (18). As in Section III, the procedure is repeated until there is no subcarrier left to which additional bits can be allocated.
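A compact sketch of the standard greedy algorithm (Standard GR) under the same assumed model $(I - \Lambda W)P = \Lambda\sigma^2$: every candidate increment triggers a full linear solve, the cost is $\Delta P_{bit}(m,k+1) = S(P^{(m)}_{k+1}) - S(P_k)$, and a subcarrier whose increment would violate the total-power or per-subcarrier constraint is dropped for good. Function names are hypothetical, and a linear solve stands in for the explicit inversion (same $O(L^3)$ order).

```python
import math

LEVELS = [0, 1, 2, 3, 4, 6, 8, 10, 12]  # allowable bit numbers A (Section V)


def solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [list(row) + [rhs[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][j] * x[j] for j in range(r + 1, n))) / M[r][r]
    return x


def required_powers(b, g, W, noise, gamma):
    """Assumed model: solve (I - diag(lam) W) P = diag(lam) noise; None if P < 0."""
    L = len(b)
    lam = [gamma * (2.0 ** b[m] - 1.0) / g[m] for m in range(L)]
    A = [[(1.0 if i == j else 0.0) - lam[i] * W[i][j] for j in range(L)]
         for i in range(L)]
    try:
        P = solve(A, [lam[i] * noise[i] for i in range(L)])
    except ZeroDivisionError:
        return None
    return P if all(p >= 0.0 for p in P) else None


def next_level(b):
    """Next allowable bit number above b, or None at the top level."""
    higher = [a for a in LEVELS if a > b]
    return min(higher) if higher else None


def standard_greedy(g, W, noise, gamma, p_total, p_max):
    """Standard GR: one full linear solve per candidate subcarrier and iteration."""
    L = len(g)
    b, P = [0] * L, [0.0] * L
    active = set(range(L))
    while active:
        best = None
        for m in sorted(active):
            nb = next_level(b[m])
            if nb is None:
                continue
            trial = b[:]
            trial[m] = nb
            P_new = required_powers(trial, g, W, noise, gamma)
            cost = math.inf if P_new is None else sum(P_new) - sum(P)  # Delta P_bit
            if best is None or cost < best[0]:
                best = (cost, m, nb, P_new)
        if best is None:             # every remaining subcarrier is at its top level
            break
        _, m0, nb, P_new = best
        if (P_new is None or sum(P_new) > p_total
                or any(p > pm for p, pm in zip(P_new, p_max))):
            active.discard(m0)       # constraint violated: drop m0 for good
            continue
        b[m0], P = nb, P_new
    return b, P


g, noise = [1.0, 0.5], [1.0, 1.0]
no_interf = [[0.0, 0.0], [0.0, 0.0]]
W = [[0.0, 0.05], [0.1, 0.0]]
bits, P = standard_greedy(g, W, noise, 1.0, 10.0, [100.0, 100.0])
print(bits, P)
```

Each iteration performs one solve per eligible subcarrier, which is where the per-iteration cost in $L^{k+1}_{on}$ times a cubic term comes from.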
The complexity of this standard algorithm is high. Let us denote by $A^{k+1}_{on}$ the set of subcarriers to which additional bits can be allocated at iteration $(k+1)$, and by $L^{k+1}_{on} = \#(A^{k+1}_{on})$ its cardinality. The complexity of iteration $(k+1)$ is then $O(L^{k+1}_{on} L^3)$ due to the matrix inversions needed for the minimum search. This algorithm becomes intractable as the number of subcarriers becomes large.
2) Efficient computation of the cost function:
The complexity of the standard greedy algorithm is mainly due to the matrix inversions in the cost function calculation. To reduce it, we propose an efficient method to compute these matrix inversions. From Eq. (15),
At iteration $(k+1)$, if the bit allocation on the $m$-th subcarrier is increased by $\Delta b$
where $e_m$ is a column vector whose $m$-th entry is equal to 1 and whose other entries are equal to 0. Letting $M$
Note that $M_0 = I$ because $B = 0$, or equivalently $\Lambda_0 = 0$, at the first iteration. At any iteration, we can derive the power vector $P^{(m)}_{k+1}$ obtained when the number of bits on the $m$-th subcarrier is increased. We choose the $m_0$-th subcarrier whose number of bits is increased as in Eq. (17). Finally, matrix $M$, the power vector and the matrix corresponding to the bit allocation are updated to
Due to the simple expression of $e_m$, the computation of $\Delta M$ costs only $O(2L^2)$. Thus, the total complexity at iteration $(k+1)$ to compute $P^{(m)}_{k+1}$ for all the values of $m$ is $O(L^{k+1}_{on} L^2)$ instead of $O(L^{k+1}_{on} L^3)$. Finally, the complexity is reduced by a factor of about $L/4$ thanks to the proposed implementation of the matrix inversion.
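Under the reading $M_k = (I - \Lambda_k W)^{-1}$, which is consistent with $M_0 = I$ when $\Lambda_0 = 0$, adding bits on subcarrier $m$ changes only the $m$-th diagonal entry of $\Lambda$ by some $\delta > 0$, so $I - \Lambda W$ changes by the rank-one term $\delta\, e_m (e_m^T W)$. The Sherman-Morrison formula then updates $M$ with one vector-matrix product and one outer product, about $2L^2$ operations, matching the $O(2L^2)$ cost quoted above. The sketch below is our reconstruction of the missing update equations (hypothetical names) and checks it against a direct inverse.

```python
def inverse(A):
    """Reference Gauss-Jordan inverse with partial pivoting (O(L^3))."""
    n = len(A)
    M = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [v / piv for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[c])]
    return [row[n:] for row in M]


def rank_one_update(M, W, m, delta):
    """Given M = (I - Lambda W)^{-1}, return (I - (Lambda + delta e_m e_m^T) W)^{-1}.

    Sherman-Morrison with u = delta * e_m and v^T = W[m, :]: one vector-matrix
    product (about L^2 operations) plus one outer product (about L^2 operations).
    """
    L = len(M)
    col = [M[i][m] for i in range(L)]                                   # M e_m
    row = [sum(W[m][j] * M[j][c] for j in range(L)) for c in range(L)]  # e_m^T W M
    denom = 1.0 - delta * row[m]
    return [[M[i][c] + delta * col[i] * row[c] / denom for c in range(L)]
            for i in range(L)]


def i_minus_lam_w(lam, W):
    """Build I - diag(lam) W."""
    L = len(lam)
    return [[(1.0 if i == j else 0.0) - lam[i] * W[i][j] for j in range(L)]
            for i in range(L)]


W = [[0.0, 0.1, 0.05], [0.2, 0.0, 0.1], [0.05, 0.1, 0.0]]
lam = [1.0, 2.0, 0.5]
M = inverse(i_minus_lam_w(lam, W))
M_fast = rank_one_update(M, W, m=1, delta=1.5)                 # lam[1]: 2.0 -> 3.5
M_ref = inverse(i_minus_lam_w([1.0, 3.5, 0.5], W))
err = max(abs(a - b) for ra, rb in zip(M_fast, M_ref) for a, b in zip(ra, rb))
print(f"max |M_fast - M_ref| = {err:.2e}")
```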
In the following, the greedy algorithm that uses this computation is called the proposed greedy algorithm and referred to as GR, to distinguish it from the standard greedy algorithm (Standard GR), which performs a full matrix inversion to calculate the cost function.
C. Reduced complexity approaches
In this section, we try to reduce the total complexity of the greedy algorithms by relying on the reduced complexity approaches proposed in Section III-B.
1) Cost function approximation: To further reduce the total complexity, an approximation of the incremental power needed to transmit an additional bit on the $m$-th subcarrier is proposed. By using Eq. (15), we obtain:
We assume that the interference increase on each subcarrier due to the increase of the power allocation is negligible compared to the current interference level on this subcarrier. This assumption is valid when the power allocation becomes high; in our simulations, it holds with high probability (>90%). Then, we can approximate $\Delta P$ as
The required power per additional bit on the $m$-th subcarrier is given by (29), where $M_k(:,m)$ is the $m$-th column vector of matrix $M_k$. Note that the matrix inversion involved in the calculation of $\Delta P_{bit}(m,k+1)$ has already been computed to derive $P_k$:
Therefore, only one matrix inversion has to be computed at every iteration, and the complexity reduction gain at iteration $(k+1)$ is about $L^{k+1}_{on}$ compared to the standard greedy algorithm.
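A sketch of the approximate cost under the assumed model $P = \Lambda(WP + \sigma^2)$. Subtracting this relation at iterations $k$ and $k+1$ gives $(I - \Lambda_k W)\Delta P = \Delta\lambda\,[W P_{k+1} + \sigma^2](m)\, e_m$; replacing $P_{k+1}$ by $P_k$ inside the bracket (the assumption stated above) yields $\Delta P_{bit}(m,k+1) \approx \Delta\lambda\,[W P_k + \sigma^2](m)\, S(M_k(:,m))$, which only needs the $m$-th column of the current $M_k$. This is our derivation, offered as a plausible reading of (29); names are hypothetical.

```python
def inverse(A):
    """Reference Gauss-Jordan inverse with partial pivoting (O(L^3))."""
    n = len(A)
    M = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [v / piv for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[c])]
    return [row[n:] for row in M]


def powers(lam, W, noise):
    """Return M = (I - diag(lam) W)^{-1} and P = M diag(lam) noise."""
    L = len(lam)
    M = inverse([[(1.0 if i == j else 0.0) - lam[i] * W[i][j] for j in range(L)]
                 for i in range(L)])
    P = [sum(M[i][j] * lam[j] * noise[j] for j in range(L)) for i in range(L)]
    return M, P


def approx_cost(M, W, P, noise, m, dlam):
    """Approximate Delta P_bit(m, k+1) when lam[m] grows by dlam.

    Neglects [W dP](m), the extra interference created by the power increase
    itself, so only M_k and P_k from the current iteration are needed.
    """
    L = len(P)
    load = sum(W[m][j] * P[j] for j in range(L)) + noise[m]   # [W P_k + sigma^2](m)
    return dlam * load * sum(M[i][m] for i in range(L))        # times S(M_k(:, m))


W = [[0.0, 0.01, 0.01], [0.01, 0.0, 0.01], [0.01, 0.01, 0.0]]
noise = [1.0, 1.0, 1.0]
M, P = powers([3.0, 1.0, 7.0], W, noise)          # current state (P_k, M_k)
_, P_new = powers([7.0, 1.0, 7.0], W, noise)      # exact state after raising lam[0]
exact = sum(P_new) - sum(P)
approx = approx_cost(M, W, P, noise, m=0, dlam=4.0)
print(f"exact = {exact:.4f}, approx = {approx:.4f}")
```

Under this model, with nonnegative $W$ and $M_k$, the approximation can only underestimate the exact increment, because $P_{k+1} \ge P_k$ by monotonicity; the gap shrinks as the interference becomes small relative to the current interference-plus-noise level.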
2) Reduction of the number of iterations:
The null initial bit-allocation vector $B = 0$ (or equivalently $P = 0$ and $\Lambda = 0$) used by the greedy approach leads to a large number of iterations. The convergence rate can be improved by judiciously initializing vector $B$. A suitable initial bit-loading vector is the solution obtained by water-filling followed by bit discretization. In other words, the power allocation obtained by Constant Power Water-Filling (CPWF) [17] is used to calculate the continuous bit-loading. From the CPWF solution, we perform the discrete bit-loading and determine the corresponding power allocation by applying (15). The result is used to initialize the bit-loading and power allocation vectors.
Theorem 1. The power allocation corresponding to the bit-loading initialized by the CPWF algorithm followed by bit discretization always satisfies the power constraints.
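Proof: Let $C_{CPWF}$ and $P_{CPWF}$ be the continuous bit-loading and power allocation vectors obtained by the CPWF algorithm, and let $B_0$ and $P_0$ be the discrete bit-loading vector obtained after discretization and its corresponding power allocation vector. Since $B_0 = T_d(C_{CPWF}) \le C_{CPWF}$ component-wise, the monotonicity of the right-hand side of (15) (see Section IV-A) gives $P_0 \le P_{CPWF}$ component-wise. Because $P_{CPWF}$ satisfies the power constraints, we obtain:
$$0 \le P_0(m) \le P_{CPWF}(m) \le P_{max}(m), \ \forall m, \qquad S(P_0) \le S(P_{CPWF}).$$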
This demonstrates that $P_0$ satisfies the power constraints. The simulations show that the performance obtained with the initial bit-loading vector given by water-filling and bit discretization differs only slightly from that obtained with a null initial vector. The proposed greedy algorithm combined with this initialization is referred to as GR + Init in the simulations.
For further complexity reduction, we also propose to simultaneously increase the number of bits on the K subcarriers corresponding to the K smallest values of the cost function, instead of only one subcarrier per iteration as in the previous approaches. At any iteration, if the numbers of bits on these K subcarriers cannot be increased simultaneously, we try to increase them one by one.
3) Proposed reduced complexity algorithm: Applying the above reduced complexity approaches results in a reduced complexity algorithm (RCA). The RCA uses the initial discrete bit-loading vector obtained by CPWF followed by bit discretization, the cost function approximation and the simultaneous increase of K subcarriers per iteration. Its average complexity per iteration is dominated by the matrix inversion computation.
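The K-subcarrier step of the RCA can be sketched generically: pick the K smallest approximate costs, try to raise all of them at once, and fall back to one-by-one attempts if the joint attempt fails. The helper `rca_batch_step` and the callback `try_increase` are hypothetical; in the full algorithm the callback would recompute the powers and check the power constraints, whereas here a toy fixed-cost budget stands in for that bookkeeping.

```python
def rca_batch_step(costs, K, try_increase):
    """One K-subcarrier step of the reduced complexity algorithm (hypothetical helper).

    costs: dict {subcarrier: approximate cost} for subcarriers still eligible.
    try_increase(subset): raises every subcarrier in `subset` by one allowable level
    and returns True if all power constraints still hold; otherwise changes nothing
    and returns False.
    Returns the list of subcarriers whose bit number was actually increased.
    """
    batch = sorted(costs, key=costs.get)[:K]   # K smallest approximate costs
    if try_increase(batch):                   # joint attempt
        return batch
    increased = []
    for m in batch:                           # fallback: one subcarrier at a time
        if try_increase([m]):
            increased.append(m)
    return increased


# Toy stand-in for the power bookkeeping: each increase has a fixed power cost.
unit_cost = {0: 1.0, 1: 2.0, 2: 3.0, 3: 10.0}
state = {"budget": 5.0}


def try_increase(subset):
    need = sum(unit_cost[m] for m in subset)
    if need > state["budget"]:
        return False
    state["budget"] -= need
    return True


increased = rca_batch_step(unit_cost, K=3, try_increase=try_increase)
print(increased, state["budget"])
```

In the toy run the joint attempt on subcarriers 0, 1 and 2 fails and the fallback raises only 0 and 1, at the price of three extra feasibility checks. Such failed attempts are the effect invoked in Section V to explain why K = 5 and K = 10 lead to similar complexity.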
Let $\bar{L}_{on} = \beta L$ $(0 < \beta < 1)$ be the average number of subcarriers that can be allocated additional bits at each iteration, i.e.
where $N$ is the number of iterations. To illustrate the total complexity, the equivalent number of matrix inversions is obtained by normalizing the total complexity by the complexity of one matrix inversion, which is $O(L^3)$. The complexity per iteration, the number of iterations and the equivalent total number of matrix inversions are shown in Table I, where $\beta_1 < \beta$.
D. The convergence of the algorithms
All the algorithms studied in this paper converge. The proof relies on the monotonicity of the right-hand side of (15) and on the bit number increments in the greedy process. The initial power allocation vector always satisfies the power constraints. At every iteration, we try to increase the number of bits on some subcarriers, i.e., to set up a new bit-loading vector that is greater (component-wise) than the current one, which increases the power allocation vector. However, the feasible region of the power allocation vector is bounded, the number of bits on each subcarrier cannot exceed the largest value in A, and a subcarrier whose increase would violate a constraint is definitively removed. Hence, all the proposed algorithms stop after a finite number of iterations, in a final state that depends on the initial bit-loading vector and on the way the subcarriers are chosen at every iteration.
V. SIMULATION RESULTS
The simulation parameters are given as follows:
• Single-user SISO-PLC system with the IEEE P1901 standard.
• $T_s$ = 0.01 µs, $T_0$ = 40.96 µs (IEEE P1901 standard).
• GI = 5.56 µs, RI = 4.96 µs (IEEE P1901 standard).
• A = {0, 1, 2, 3, 4, 6, 8, 10, 12}.
• $P_{max}(m) = 1$ (normalized to the spectral mask at
• $\Gamma = 4.038$, corresponding to SER = $10^{-3}$.
• The channel impulse response (CIR) $h(t)$ is obtained by the IFFT of the frequency response of the Tonello model (class 2) [40] followed by rectangular filtering in the time domain, keeping 95% of the initial energy [41].
• Noise model: Background noise (Esmailian model) [42] .
• Number of channel realizations for Monte-Carlo simulation: 1000.

Figure 1 gives an example of CIR for Class 2 of the Tonello model. The calculation of the interference power (i.e. the entries of matrix W) in the case of an insufficient guard interval (GI) in PLC systems is given in the Appendix. This calculation differs from the one derived in [43] for conventional OFDM systems. The first results of interference calculation for PLC systems were presented in [11], [22]. In this paper, we present a more detailed calculation that includes both the transmitter and receiver filters specified in the IEEE P1901 standard, to obtain the analytical expression of the interference power and then calculate the entries of matrix W.
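The timing parameters listed above translate into sample counts with simple arithmetic. The sketch below uses only the listed values; reading RI as the window roll-off interval of IEEE P1901 is an assumption of this sketch.

```python
T_s = 0.01    # sampling period (µs)
T_0 = 40.96   # FFT interval of an OFDM symbol (µs)
GI = 5.56     # guard interval (µs)
RI = 4.96     # window roll-off interval (µs), assumed meaning of "RI"

n_fft = round(T_0 / T_s)          # samples in the FFT interval
n_gi = round(GI / T_s)            # guard-interval samples
n_ri = round(RI / T_s)            # roll-off samples
f_s_mhz = 1.0 / T_s               # sampling frequency (MHz, since T_s is in µs)
delta_f_khz = 1e3 / T_0           # subcarrier spacing (kHz)

print(n_fft, n_gi, n_ri, f_s_mhz, delta_f_khz)
```

This gives a 4096-point FFT at 100 MHz, a 556-sample guard interval, a 496-sample roll-off and a subcarrier spacing of about 24.41 kHz.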
A. Non-convexity of the problem
In this subsection, we illustrate the non-convex structure of the problem under study. The continuous version of the bit-loading problem (12) is stated as problem (33). The non-convexity of problem (33) has already been mentioned in [17]. To support this result, it is easy to find directions in the constrained power region along which non-concavity arises. For instance, letting:
where $P_0$ varies linearly from 0 to 0.4 (normalized to $P_{mask}$), yields the non-concave shape of C shown in Fig. 2.
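The same phenomenon can be reproduced on a much smaller example. The sketch below uses a hypothetical two-subcarrier toy with strong mutual interference (numbers chosen for illustration only, not the PLC channel or the direction used for Fig. 2). Moving power from one subcarrier to the other along $P_1 + P_2 = 0.4$ gives a sum rate that is larger at the endpoints than in the middle, so the rate is not concave along that segment.

```python
import math


def sum_rate(p1, p2, a=1.0, noise=0.01, gamma=1.0):
    """Two subcarriers with symmetric cross-interference coefficient a (toy values)."""
    r1 = math.log2(1.0 + p1 / (gamma * (noise + a * p2)))
    r2 = math.log2(1.0 + p2 / (gamma * (noise + a * p1)))
    return r1 + r2


total = 0.4
for t in [0.0, 0.1, 0.2, 0.3, 0.4]:
    print(f"P1 = {t:.1f}, P2 = {total - t:.1f}: C = {sum_rate(t, total - t):.3f} bit/symbol")
```

A concave function would lie above the chord between the endpoints; here the midpoint value falls well below it, so any local search started in the middle of the segment can end far from the best allocation.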
B. Simulations restricted to a reduced subcarrier subset
For the sake of simulation run-time, we first consider only the first hundred subcarriers ($L = 100$). Moreover, we take the minimum allowable value of GI (5.56 µs) defined in the IEEE P1901 standard; in this case, the effect of interference on the system performance is significant. Figures 3 and 4 illustrate the achievable throughput and the total consumed power for the proposed greedy algorithm (GR), the proposed greedy algorithm initialized with CPWF (GR + Init), the reduced complexity algorithm (RCA) with $K = 1$, the CPWF algorithm before/after discretization, and the maximum power allocation (MPA), which allocates the maximum allowable power to each subcarrier, i.e. $P(m) = P_{max}(m), \forall m$. The upper bound of the achievable throughput is also plotted. This upper bound is obtained by omitting the spectral mask constraints, i.e. $P(m) \le P_{max}(m), \forall m$, in problem (12). The solution of this relaxed problem is computed by the proposed greedy algorithm with the incremental power needed to transmit an additional bit as the cost function. The GR and GR + Init algorithms achieve almost the same throughput. The achievable throughput of the proposed RCA with $K = 1$ is slightly higher than that obtained with the GR algorithms and significantly higher than that of the CPWF or MPA algorithm. For example, if the total exploitable normalized power is 50, the RCA with $K = 1$ increases the throughput by 1% compared to the GR algorithms and outperforms the CPWF (resp. MPA) algorithm by 16% (resp. 15%). Moreover, it reduces the total consumed power by 44% (resp. 72%) compared to the CPWF (resp. MPA) algorithm. Figures 5 and 6 compare the GR algorithms with the RCA for different values of K. They show that the achievable throughput of the RCA differs only slightly from that of the GR algorithm.
The difference in achievable throughput between the GR algorithm and the RCA is negligible (<1%); thus the RCA can be used to compute the achievable throughput instead of the GR algorithm. As K increases, the achievable throughput decreases and the total consumed power increases, i.e. power is used less efficiently. With K = 10, the RCA loses 0.3% of achievable throughput while reducing the computation cost by 99% (resp. 90%) (see Figure 7) compared to the GR algorithm with a null initial vector (resp. with the initial vector obtained by CPWF). Note that the complexity of the GR algorithm is itself already strongly reduced compared with the standard greedy algorithm, which relies on matrix inversion.
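This section does not detail how the GR algorithm avoids repeated direct matrix inversion. One standard technique, shown here only as an illustration and not claimed to be the authors' iterative method, is the Sherman–Morrison rank-one update. Under the same assumed SINR model as a plain greedy loop (A = diag(G) − diag(g)·W with g_m = Γ(2^{b_m} − 1); G, W and gamma are illustrative names), adding one bit on subcarrier m changes only row m of A, so A⁻¹ can be updated in O(L²) operations instead of being recomputed in O(L³).

```python
from typing import List

Matrix = List[List[float]]


def invert(A: Matrix) -> Matrix:
    """Direct O(L^3) inverse by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        if abs(M[p][c]) < 1e-15:
            raise ValueError("singular matrix")
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [x / piv for x in M[c]]
        for r in range(n):
            if r != c and M[r][c] != 0.0:
                f = M[r][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [row[n:] for row in M]


def system_matrix(bits: List[int], G: List[float], W: Matrix, gamma: float) -> Matrix:
    """A = diag(G) - diag(g) W with g[m] = gamma * (2**bits[m] - 1) (assumed model)."""
    L = len(G)
    g = [gamma * (2 ** b - 1) for b in bits]
    return [[(G[m] if m == k else 0.0) - g[m] * W[m][k] for k in range(L)]
            for m in range(L)]


def add_bit_update(Ainv: Matrix, bits: List[int], m: int, W: Matrix,
                   gamma: float) -> Matrix:
    """Return the inverse after one extra bit on subcarrier m, in O(L^2).

    Only row m of A changes: A' = A + e_m d^T with d = -dg * W[m, :],
    so the Sherman-Morrison formula applies.
    """
    L = len(Ainv)
    dg = gamma * (2 ** (bits[m] + 1) - 2 ** bits[m])   # increase of g[m]
    d = [-dg * W[m][k] for k in range(L)]
    col = [Ainv[i][m] for i in range(L)]                 # A^{-1} e_m
    row = [sum(d[k] * Ainv[k][j] for k in range(L))     # d^T A^{-1}
           for j in range(L)]
    denom = 1.0 + row[m]                                 # 1 + d^T A^{-1} e_m
    if abs(denom) < 1e-15:
        raise ValueError("update would make A singular")
    return [[Ainv[i][j] - col[i] * row[j] / denom for j in range(L)]
            for i in range(L)]
```

The updated inverse matches a fresh inversion of the modified matrix, while the update itself only needs matrix-vector products.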
Finally, Figures 7 and 8 plot the complexity of the different algorithms. Figure 7 compares the complexity of the GR algorithms and the RCA. The initialization with the CPWF algorithm significantly reduces the complexity of the GR algorithm. The RCA reduces the complexity by more than 90% (resp. 80%) compared to the GR algorithm with a null initial vector (resp. GR + Init). The complexity of the RCA also decreases as K increases, which is explained by the reduction in the number of iterations. Nevertheless, the complexity is almost the same for K = 5 and K = 10. This is due to the number of failed attempts: in the reduced complexity algorithm with K > 1, at every iteration we try to increase the number of bits on K subcarriers simultaneously, and whenever such an attempt fails, we must try to increase those K subcarriers one by one, which increases the number of matrix inversions.

Table II lists the run-time per iteration, the number of iterations and the total run-time for the standard greedy, proposed greedy and reduced complexity algorithms. These values were obtained for $P_{\text{total}} = 90$ on a computer with a dual-core processor (2 × 2.7 GHz), 4 GB of RAM and Matlab 2009. In this table, the GR algorithm reduces the total run-time by a factor of 3.8 compared to the standard greedy algorithm. This gain is almost the same as the run-time ratio, in Matlab, between computing the matrix inversion directly and computing it iteratively. Moreover, the initial bit-loading vector obtained by the CPWF algorithm strongly reduces not only the number of iterations but also the run-time per iteration, because it decreases the average number of subcarriers for which the cost function must be evaluated. As for the RCA, its total run-time is significantly reduced compared to both the standard and proposed greedy algorithms.
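The batch mechanism of the RCA with K > 1, and why failed attempts cost extra checks, can be sketched as follows. This is a hypothetical illustration: the ranking function inc_cost, the feasibility test feasible, and the rule of freezing a subcarrier once its single-bit increment fails are assumptions, since this section does not specify how the K subcarriers are chosen. In the real algorithm each feasibility check would involve a matrix inversion; here it is a toy total-power budget without interference.

```python
from typing import Callable, List, Sequence, Set

Feasible = Callable[[List[int]], bool]


def batch_step(bits: List[int], chosen: Sequence[int], feasible: Feasible) -> List[int]:
    """Try +1 bit on all chosen subcarriers at once; on failure, retry one by one.

    Mutates `bits` in place and returns the subcarriers that gained a bit.
    """
    trial = bits[:]
    for m in chosen:
        trial[m] += 1
    if feasible(trial):            # one check covers all K increments
        bits[:] = trial
        return list(chosen)
    gained = []
    for m in chosen:               # failed attempt: up to K extra checks
        trial = bits[:]
        trial[m] += 1
        if feasible(trial):
            bits[:] = trial
            gained.append(m)
    return gained


def batch_loading(L: int, K: int, feasible: Feasible,
                  inc_cost: Callable[[List[int], int], float], b_max: int) -> List[int]:
    """Hypothetical RCA-like driver: pick the K cheapest non-frozen subcarriers."""
    bits = [0] * L
    frozen: Set[int] = set()       # assumption: a failed single increment freezes m
    while True:
        order = sorted(range(L), key=lambda m: (inc_cost(bits, m), m))
        chosen = [m for m in order if m not in frozen and bits[m] < b_max][:K]
        if not chosen:
            return bits
        gained = batch_step(bits, chosen, feasible)
        frozen.update(m for m in chosen if m not in gained)


# Toy usage: no interference, budget of 12 power units (illustrative values).
calls = {"n": 0}


def toy_feasible(b: List[int]) -> bool:
    calls["n"] += 1                # count feasibility checks
    return sum(2 ** x - 1 for x in b) <= 12


bits = batch_loading(4, 2, toy_feasible, lambda b, m: 2 ** b[m], b_max=8)
```

With K = 2, this toy run ends at bits [2, 2, 2, 2] after 10 feasibility checks, six of which are spent on the two failed batches and their one-by-one retries.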
C. Simulation for a realistic PLC system
In this section, the proposed algorithm is applied to a realistic PLC system with 917 subcarriers (L = 917). The number of channel realizations is set to 200. Figures 9, 10 and 11 compare the performance of the RCA with K = 5 and K = 20 to that of the CPWF algorithm followed by bit discretization for different channel classes (classes 2, 5 and 9) with a GI of 5.56 µs. In these figures, the RCA increases the throughput by 17% (resp. 13%) compared to the CPWF algorithm for the class 2 (resp. class 5) channel, where the interference is significant. However, for the class 9 channel, the CPWF algorithm followed by bit discretization is sufficient for efficient bit-loading and power allocation, because there is almost no interference or the interference power is insignificant compared to the noise power. Moreover, in this case, all subcarriers have already reached their maximum number of bits, determined by the maximum constellation size and the maximum power constraint, so no additional iteration is required.
Moreover, Figure 12 illustrates the performance of the RCA for the mandatory GI values from 5.56 µs to 47.12 µs specified in the IEEE P1901 standard (5.56, 7.56, 9.56, 11.56, 15.56, 19.56 and 47.12 µs), for the class 2 channel and $P_{\text{total}} = 500$ (normalized). In this figure, the RCA algorithms improve the achievable throughput by about 10% compared to CPWF + discretization. We also observe that the GI value that maximizes the throughput coincides with the one that maximizes the capacity. This observation can be used to jointly optimize the GI value and the bit/power allocation: first, the CPWF algorithm is used to calculate the capacity for every GI value specified in the IEEE P1901 standard, and the GI value with the maximum capacity is selected; then, with this GI value, the RCA algorithm determines the bit-loading and power allocation that maximize the achievable throughput.
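The two-step joint GI selection described above can be written as a small driver. Only the list of GI values comes from the text; capacity_fn and allocate_fn are hypothetical placeholders for the CPWF capacity computation and the RCA allocation, and toy_capacity is an invented trade-off curve (with an assumed useful symbol duration of 40.96 µs) used purely to exercise the driver.

```python
import math
from typing import Any, Callable, Dict, Sequence, Tuple

# Mandatory GI values (in µs) of IEEE P1901, as listed in the text.
P1901_GI_US: Tuple[float, ...] = (5.56, 7.56, 9.56, 11.56, 15.56, 19.56, 47.12)


def joint_gi_allocation(capacity_fn: Callable[[float], float],
                        allocate_fn: Callable[[float], Any],
                        gi_values: Sequence[float] = P1901_GI_US):
    """Step 1: pick the GI with maximum capacity; step 2: allocate with that GI."""
    capacities: Dict[float, float] = {gi: capacity_fn(gi) for gi in gi_values}
    best_gi = max(gi_values, key=lambda gi: capacities[gi])
    return best_gi, capacities, allocate_fn(best_gi)


def toy_capacity(gi_us: float) -> float:
    """Toy stand-in for the CPWF capacity (not PLC data): a longer GI removes
    interference but adds overhead."""
    t0_us = 40.96  # assumed useful symbol duration
    return (t0_us / (t0_us + gi_us)) * (1.0 - 0.5 * math.exp(-gi_us / 4.0))


best_gi, caps, allocation = joint_gi_allocation(
    toy_capacity, lambda gi: f"RCA allocation for GI={gi} us")
```

The driver calls the expensive allocation only once, for the selected GI, which is the point of using the capacity as a proxy.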
Figures 13 and 14 plot the cumulative distribution function (CDF) of the allocated number of bits and of the allocated power for 100 channel realizations. Transmitting on a class 5 channel clearly requires less power than on a class 2 channel (mean value: 0.13 vs 0.2), while the throughput obtained on a class 5 channel is higher than on a class 2 channel (mean value: 8.8 vs 6.2 bits).
Finally, to adapt the proposed algorithm to periodically time-varying PLC channels [44], [45], we can reuse the channel adaptation principle described in [44], where the channel is assumed to switch periodically among several states. This block-based invariance allows us to apply the proposed algorithm to each channel state. If the frame error rate becomes high, the channel estimation and the bit/power allocation must be performed again.

[…] greedy algorithms as well as a reduced complexity algorithm. Simulation results have clearly shown that the reduced complexity algorithm is efficient: the loss in achievable throughput is negligible and the complexity is significantly reduced compared to the greedy solutions. It has also been shown that the proposed algorithm performs close to its upper bound and outperforms the MPA and CPWF algorithms.
We have also proposed a joint optimization of the GI value and the bit/power allocation based on the proposed algorithm. Finally, as the reduced complexity algorithm is based on the greedy principle, it can also be exploited for the margin-adaptive problem with some modifications to the initialization of the bit allocation vector. It is therefore a good candidate for solving resource management problems in single-user windowed-OFDM systems in the presence of significant interference.
APPENDIX: INTERFERENCE CALCULATION IN PLC SYSTEMS
In this section, we derive the ISI and ICI formulas for PLC systems, i.e. we compute $W(a, b), \forall a, b \in [1, L]$. In this appendix, subcarrier $m$ denotes the $m$-th available subcarrier; under a given spectral mask constraint, only a part of the available subcarriers is used. According to [11], in the absence of noise, the demodulated sample on the $m_0$-th subcarrier and the $n_0$-th OFDM symbol is
Multi-path channel impulse response: First, $I_g(m - m_0, n, n_0, \tau_l)$ is calculated according to the value of $\tau_l$. There are two cases:
• $\tau_l < \mathrm{GI} - \mathrm{RI}$: the filters at the transmitter, $g(t - nT)$, and at the receiver, $f(t - n_0 T)$, as well as their relative position, are illustrated in Figure 15. In this case, we obtain
where we used $T_0 = F_0^{-1}$, and $\delta_{u,v} = 1$ if $u = v$ and 0 otherwise.
• $\mathrm{GI} - \mathrm{RI} < \tau_l < T$: Figure 16 illustrates the relative position of both filter responses. From Figure 16, we deduce that $I_g(m - m_0, n, n_0, \tau_l)$ is non-zero if and only if $n = n_0$ or $n = n_0 - 1$. If $n = n_0$:
where
If $n = n_0 - 1$:
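The channel impulse response $h(t) = \sum_{l=0}^{P-1} h_l\,\delta(t - \tau_l)$, where $\tau_l < \tau_{l+1}$ and $\tau_{I_N - 1} \le \mathrm{GI} - \mathrm{RI} < \tau_{I_N}$, can be rewritten as
$$h(t) = \sum_{l=0}^{I_N - 1} h_l\,\delta(t - \tau_l) + \sum_{l=I_N}^{P-1} h_l\,\delta(t - \tau_l).$$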
Substituting (39) into (35) and taking into account (36), (37) and (
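The case split and the index $I_N$ can be computed directly from the sorted path delays. In this sketch, the roll-off interval RI = 4.96 µs and the symbol period T = 46.52 µs (an assumed 40.96 µs useful duration plus the 5.56 µs GI) are example values commonly quoted for HomePlug AV, not taken from this appendix, and the four path delays are hypothetical. A delay exactly equal to GI − RI is assigned to the first case, matching $\tau_{I_N - 1} \le \mathrm{GI} - \mathrm{RI}$.

```python
from bisect import bisect_right
from typing import List, Optional, Sequence, Tuple


def split_paths(taus_us: Sequence[float], gi_us: float, ri_us: float,
                t_us: float) -> Tuple[int, List[Optional[int]]]:
    """Return I_N and the appendix case (1, 2, or None) for each path delay.

    Case 1: tau_l <= GI - RI; case 2: GI - RI < tau_l < T.
    Delays at or beyond T fall outside both cases and are marked None.
    """
    taus = list(taus_us)
    if any(a >= b for a, b in zip(taus, taus[1:])):
        raise ValueError("delays must be strictly increasing (tau_l < tau_{l+1})")
    limit = gi_us - ri_us
    i_n = bisect_right(taus, limit)   # number of paths with tau_l <= GI - RI
    cases: List[Optional[int]] = []
    for tau in taus:
        if tau <= limit:
            cases.append(1)
        elif tau < t_us:
            cases.append(2)
        else:
            cases.append(None)
    return i_n, cases


# Hypothetical 4-path channel; RI and T are assumed example values.
i_n, cases = split_paths([0.0, 0.3, 0.8, 2.5], gi_us=5.56, ri_us=4.96, t_us=46.52)
```

Here GI − RI is about 0.6 µs, so the first two paths fall in the first case ($I_N = 2$) and the remaining two in the second.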
