Abstract-In this paper, a generalized dual-basis envelope-dependent sideband (GDES) distortion model structure is proposed to compensate for the distortion induced by transmitter leakage in concurrent multi-band transceivers with non-contiguous carrier aggregation. The model is constructed by first generating a nonlinear basis function that maps the inputs to the target frequency band where the distortion is to be cancelled, and then multiplying it by a second basis function that generates envelope-dependent nonlinearities. By combining these two bases, the model retains a relatively compact form that can be flexibly implemented in digital circuits such as field-programmable gate arrays (FPGAs). Experimental results demonstrate that excellent suppression performance can be achieved with very low implementation complexity by employing the proposed model.
Index Terms-Behavioral model, carrier aggregation, multi-band, power amplifiers, transmitter leakage suppression.
I. INTRODUCTION
NON-CONTIGUOUS carrier aggregation (CA) [1] has been proposed to combine multiple frequency bands for high-speed data transmission in wireless communications. To support CA operation, high-efficiency concurrent multi-band transmitters are often deployed [2]. Due to the nonlinear characteristics of RF power amplifiers (PAs), distortion is normally added to the transmit signal after amplification [3]. In multi-band operation, the distortion is usually located not only near the transmission bands, but also at the intermodulation frequencies. These intermodulation frequency bands, e.g., the third-order intermodulation (IM3) bands, can sometimes overlap with the receiver bands in frequency-division duplex (FDD) mode, as illustrated in Fig. 1. Ideally, the duplexers should provide enough attenuation to prevent the distortion generated by the transmitter from falling into the receiver band. In practice, however, it is not easy to design duplexers that meet this requirement. Some intermodulation products can therefore leak into the receiver band and introduce serious spurious emissions, causing significant quality degradation of the received signal.
Various compensation schemes have been proposed in either the transmitter (Tx) or the receiver (Rx) to resolve this problem. Because of its low cost and high accuracy, digital predistortion (DPD) [4]-[10] in the transmitter has been widely employed to remove the sideband distortion. In [4], C. Yu et al. proposed a full-bandwidth DPD method that treats the multi-band signal, including the sideband signal, as a single signal to remove the unwanted sideband distortion. In [5], [6], P. Roblin and J. Kim et al. proposed a frequency-selective DPD method that cancels the sidebands separately by employing a large signal network analyzer (LSNA) to extract the device under test (DUT) information. In [7], S. A. Bassam et al. proposed a filtering-based sideband distortion modeling technique that injects the anti-phase sideband distortion for distortion suppression. To reduce complexity, M. Abdelaziz et al. in [8], [9] proposed simplified methods that pick only the modeling terms falling into the specified distortion bands. In [10], Z. Fu et al. proposed a sideband compensation scheme based on evaluating and minimizing the power spectral density (PSD) of the PA output signal around a pre-specified frequency. The DPD-based sideband compensation methods work well in general, but they require extra bandwidth to transmit the sideband information in multi-band transmitters, which is often undesirable.
Instead of removing the distortion in the Tx, other compensation schemes [11]-[20] are realized in the Rx. The main idea is to build a distortion model that generates a replica of the distortion falling into the receiver band and to subtract it from the received signal to recover the original signal, as shown in Fig. 1. In [11], [12], A. Frotzscher et al. analyzed the impact of transmitter leakage on system performance in zero-IF receivers. In [13], M. Kahrizi et al. proposed a digital method to suppress the second-order intermodulation (IM2) of Tx leakage in WCDMA direct-conversion receivers. M. Omer et al. in [14] created the replica of the sideband distortion by assuming that the frequency response of the duplex filter is known, while A. Kiayani et al. in [17] proposed a method to estimate the transmitter leakage channel including both the duplexer and the PA. The same authors in [18] extended this method to deal with concurrent dual-band signals. In [19] and [20], H.-T. Dabag et al. proposed an all-digital cancellation technique to mitigate receiver desensitization in uplink CA in cellular handsets.
In [21], we proposed a novel dual-basis envelope-dependent sideband distortion model to characterize the transmitter leakage in the receiver band in concurrent dual-band transceivers. Experimental results showed that only a very small number of model coefficients with narrowband digital signal processing are required to achieve satisfactory cancellation performance. Due to limited space, only the basic concept and the verification of distortion suppression in concurrent dual-band transceivers were given in [21]. In this paper, we provide a detailed analysis of transmitter leakage and give comprehensive derivations for the model development in more complex scenarios, such as 3-Carrier (3-C) CA applications. Based on the analysis, a generalized dual-basis envelope-dependent sideband (GDES) distortion model structure is then proposed to provide a uniform architecture for suppressing the various distortions that appear in such systems. By employing the proposed model structure, different distortion components can be accurately characterized and compensated with the same digital circuit module. The distortion overlapping issue can also be easily resolved. Compared to the existing methods, the proposed model has a compact form and can be easily extended to different scenarios without much increase in complexity. A generalized FPGA architecture with detailed hardware implementation is also given.
The rest of the paper is organized as follows: In Section II, the transmitter leakage analysis focusing on the 3-C CA application is provided. A generalized model structure is then proposed in Section III with FPGA implementations given in Section IV. The experimental results for the different scenarios are provided in Section V, followed by a conclusion in Section VI.
II. TRANSMITTER LEAKAGE ANALYSIS
With increasing demands for high data rates, carrier aggregation techniques will be widely employed in wireless cellular communications and the number of aggregated carriers will inevitably keep increasing in the future. As discussed earlier, the distortion generated in transmitters can leak to receiver bands and cause quality degradation of the received signals. This situation becomes worse when CA is employed. In this section, we take a 3-carrier CA scenario as an example to illustrate how the transmitter leakage is generated.
A. 3-Carrier Carrier Aggregation
Considering the frequency plan for LTE FDD mode [22] and the transmitter architecture employing CA techniques [23], three cases of frequency allocation might be assigned for three-carrier carrier aggregation. As shown in Fig. 2(a), in the first case, three bands are located at MHz (LTE band 5), MHz (LTE band 1), and MHz (LTE band 22). Since these three bands span a very large frequency range, multiple RF chains and power amplifiers may be employed. In this case, only the distortion near the main carriers is of concern; the intermodulation products crossing the multiple bands may not cause severe problems. However, in the second case, shown in Fig. 2(b), if three bands are located at MHz (LTE band 3), MHz (LTE band 1), and MHz (LTE band 7), the intermodulation products will spread into nearby receiver bands, which can cause problems. For example, the upper third-order intermodulation product generated from bands 1 and 3 is located around 2510 MHz, which overlaps with the uplink band of the LTE band 7 allocation (2500 MHz-2570 MHz). In the third case, shown in Fig. 2(c), three bands are located at MHz (LTE band 3), MHz (LTE band 2), and MHz (LTE band 1). In this case, all three bands are located within a frequency range of 400 MHz, and thus it is possible to employ only one wideband PA to transmit this multi-band signal. Similar to the second case, the intermodulation products also affect receiver bands, and the situation becomes much more complex since not only the intermodulation products generated from two frequency bands, but also those generated from three frequency bands, can affect receivers. For instance, one of the IM3 products generated from these three bands is located at 1950 MHz, which overlaps with the uplink band of the LTE band 1 allocation (1920 MHz-1980 MHz).
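The frequency bookkeeping described above can be checked with a few lines of code. The following is a minimal sketch in Python that enumerates the IM3 center frequencies of three carriers and tests which of them fall into a given band. The carrier values (1855, 1985 and 2115 MHz), the band edges and the function names `im3_products` and `products_in_band` are illustrative assumptions; they are not the allocations of Fig. 2.

```python
from itertools import permutations


def im3_products(carriers_mhz):
    """Map a label such as '2f3-f2' to the IM3 center frequency in MHz."""
    products = {}
    idx = range(len(carriers_mhz))
    # Two-carrier products: 2*fi - fj with i != j.
    for i, j in permutations(idx, 2):
        products[f"2f{i + 1}-f{j + 1}"] = 2 * carriers_mhz[i] - carriers_mhz[j]
    # Three-carrier products: fi + fj - fk with i < j and k different from both.
    for i, j, k in permutations(idx, 3):
        if i < j:
            products[f"f{i + 1}+f{j + 1}-f{k + 1}"] = (
                carriers_mhz[i] + carriers_mhz[j] - carriers_mhz[k]
            )
    return products


def products_in_band(products, low_mhz, high_mhz):
    """Keep only the products whose center frequency lies in [low, high]."""
    return {name: f for name, f in products.items() if low_mhz <= f <= high_mhz}


# Assumed, evenly spaced example carriers (not the values of Fig. 2).
carriers = [1855.0, 1985.0, 2115.0]
im3 = im3_products(carriers)
# Hypothetical victim band, used only to demonstrate the overlap check.
hits = products_in_band(im3, 2240.0, 2250.0)
for name in sorted(hits):
    print(name, hits[name])
```

For three carriers the sketch yields nine IM3 products. With the evenly spaced carriers assumed here, one two-carrier product and one three-carrier product land on the same center frequency, which is the overlapping situation that a leakage model has to handle.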
B. Derivation for Tx Leakage
Assume that the aggregated input signal can be represented as (1) where are the baseband representations of the signals located at the carrier frequencies . If the input signal passes through a nonlinear system, the output will contain many distortion products that can spread over multiple frequency bands. Here, to simplify the derivation, a memoryless polynomial model is taken as an example, that is (2) where and are the input and output, respectively, and is the nonlinear order. Substituting (1) into (2), all the distortion products can be obtained as (3) From (3), we can see that the center frequency of each distortion term can be calculated from the main carrier frequencies, that is (4) For example, the center frequencies of the distortion at the IM3 bands can be obtained from (5) By using (5), the distortion at the IM3 band at the frequency can be expressed as (6) Removing the carrier frequency, the baseband information can be represented as (7) For better illustration, the distortion terms are listed below
From (8), we can see that the distortion is generated from combinations of the signals located at different bands. If we treat the signals at each band as independent inputs, constructing the distortion model is straightforward: simply generate each term by combining the baseband signals at different bands, as shown in Fig. 3. The disadvantage of this approach is that the model complexity increases quickly with the number of bands, and the situation becomes worse when higher-order nonlinearities and memory terms are included in the model.
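The way a memoryless nonlinearity creates IM3 terms from products of the band signals can be verified numerically. The sketch below is an illustration under stated assumptions rather than the derivation of this section: it uses a pure cubic nonlinearity y = x^3, three carriers with constant complex amplitudes, and carrier positions given as DFT bins (all values are hypothetical). It then compares the measured tone at f1 + f2 - f3 and at 2f1 - f2 with the products of the band amplitudes predicted by expanding the cube.

```python
import cmath
import math

N = 256                      # samples in one analysis window
BINS = (20, 27, 31)          # assumed carrier positions, in DFT bins
AMPS = (1.0 + 0.5j, 0.8 - 0.3j, 0.6 + 0.7j)  # assumed constant envelopes


def passband_sample(n):
    """Real multi-carrier signal: sum of Re{a_k * exp(j*2*pi*bin_k*n/N)}."""
    return sum((a * cmath.exp(2j * math.pi * k * n / N)).real
               for a, k in zip(AMPS, BINS))


def tone_coefficient(samples, k):
    """Complex amplitude of exp(j*2*pi*k*n/N) contained in the sequence."""
    total = sum(s * cmath.exp(-2j * math.pi * k * n / len(samples))
                for n, s in enumerate(samples))
    return total / len(samples)


# Memoryless third-order nonlinearity y = x**3 (assumed for illustration).
y = [passband_sample(n) ** 3 for n in range(N)]

a1, a2, a3 = AMPS
b1, b2, b3 = BINS
# Three-carrier IM3 at f1 + f2 - f3: coefficient (6/8) * a1 * a2 * conj(a3).
measured_3c = tone_coefficient(y, b1 + b2 - b3)
predicted_3c = 0.75 * a1 * a2 * a3.conjugate()
# Two-carrier IM3 at 2*f1 - f2: coefficient (3/8) * a1**2 * conj(a2).
measured_2c = tone_coefficient(y, 2 * b1 - b2)
predicted_2c = 0.375 * a1 * a1 * a2.conjugate()

print(abs(measured_3c - predicted_3c), abs(measured_2c - predicted_2c))
```

The factors 6/8 and 3/8 come from the multinomial expansion of the cube of a sum of cosines. With modulated signals the constant amplitudes become the time-varying baseband signals of each band, which is why each IM3 term is a product of band signals and their conjugates.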
III. PROPOSED MODEL
To overcome the disadvantage of the existing models, a novel model structure is proposed in this section.
A. Model Basis Decomposition
For better illustration, two special cases for IM3 in (5) are chosen. One is the case in Fig. 2(b), where the IM3 is generated from LTE bands 1 and 3, and the other is the case in Fig. 2(c), where the IM3 is generated from all three bands, that is (9) Then, (7) can be transformed to (10), shown at the bottom of the page.
Looking at (10), although the term combinations change with the order of the nonlinearity, there are some "common" unchanged terms in each equation. To illustrate this, we can rewrite (10) as (11), shown at the bottom of the page, where we can see that and appear in all the modeling terms of the respective cases. By separating the common parts from the changing parts in each term, the model can be divided into two sections as listed in Table I. Looking closely, we find that the two parts of the modeling terms have distinct functions: the common part is related to the band of the distortion to be cancelled, while the changing part depends only on the envelope of the inputs. We explain this in detail in the next two subsections.
Nevertheless, based on this finding, in this work, we propose to decompose the model into two basis functions: the first basis function is to locate the frequency components in the target bands, denoted as Basis 1; and the second basis is to create an accurate mapping from the input to the output by using envelope dependent nonlinear terms, denoted as Basis 2. The model structure can be described as (12) where and represent Basis 1 and 2, respectively.
B. Model Basis 1
As mentioned earlier, if the input signal passes through a nonlinear PA, the output will contain many distortion products that can spread over multiple frequency bands. To cancel transmitter leakage, we are only concerned with the distortion that falls into the receiver frequency band, for example, the distortion at band in the 3-C CA case in Fig. 2(b). Because this frequency band is different from the bands where the original input signals are located, to generate this distortion we must "inter-modulate" inputs between two bands and thus generate the new frequency band. The basic description for the dual-band scenario can be found in [9]. For instance, as illustrated in Fig. 4(a), to generate the distortion at band, we can multiply two inputs from one band with the conjugate of the input from the other band (13) where is the conjugate operation. This term corresponds to the carrier frequency change, i.e., . As shown in Fig. 3, (13) only generates the 3rd-order distortion. For higher-order distortion, we need to add more terms, e.g., for the 5th order. To ensure that the model output stays in the target band, these extra terms should not "move" the band. In other words, they should only affect the frequency components within the target band, without crossing bands. For instance, is only used to weight . In the frequency domain, corresponds to the frequency , which indicates that multiplying by this term does not change the carrier frequency. Therefore, from the frequency selection point of view, (13) is the key element that "selects" the target band in the model construction. Following the same logic, the band selection element for the IM3 band in the 3-C CA case in Fig. 2(c) is (14) as illustrated in Fig. 4(b).
In summary, Basis 1 of the above cases can be described as (15) Based on the same idea, the distortion located at other frequency bands can also be constructed by simply changing the combination of the signal terms.
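The band-selecting role of Basis 1 can be illustrated with a short sketch. The code below is a hedged example, not the exact expressions (13)-(15): it assumes that the two-carrier Basis 1 is the square of one band signal times the conjugate of another, and that the three-carrier Basis 1 is the product of two band signals times the conjugate of the third, as the text describes in words. Each band signal is replaced by a complex exponential standing in for its carrier offset, and the function names and bin numbers are hypothetical.

```python
import cmath
import math

N = 64  # samples per window


def tone(k):
    """Complex exponential at DFT bin k, standing in for a carrier offset."""
    return [cmath.exp(2j * math.pi * k * n / N) for n in range(N)]


def basis1_two_carrier(xa, xb):
    """xa^2 * conj(xb): moves the product to 2*fa - fb."""
    return [a * a * b.conjugate() for a, b in zip(xa, xb)]


def basis1_three_carrier(xa, xb, xc):
    """xa * xb * conj(xc): moves the product to fa + fb - fc."""
    return [a * b * c.conjugate() for a, b, c in zip(xa, xb, xc)]


def max_error(u, v):
    """Largest sample-wise difference between two sequences."""
    return max(abs(p - q) for p, q in zip(u, v))


x1, x2, x3 = tone(3), tone(10), tone(18)
err_two = max_error(basis1_two_carrier(x2, x1), tone(2 * 10 - 3))
err_three = max_error(basis1_three_carrier(x2, x3, x1), tone(10 + 18 - 3))
# Weighting by |x1|^2 does not move the band: the product stays at bin 17.
weighted = [u * abs(a) ** 2 for u, a in zip(basis1_two_carrier(x2, x1), x1)]
err_weighted = max_error(weighted, tone(17))
print(err_two, err_three, err_weighted)
```

All three errors are at the level of floating-point rounding, which shows that the conjugate product alone fixes the output band and that an envelope weighting leaves that band unchanged.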
C. Model Basis 2
To model high-order nonlinearities, (15) can be multiplied by different high-order terms as shown in Table I, which can be described as Basis 2, that is (16) This polynomial extension is straightforward, but it leads to a considerable increase in model complexity when strong nonlinear distortion is involved, as discussed earlier.
As mentioned in [24], [25] for the low-pass equivalent model, once the relationship between the input and output meets the requirement of odd parity and the mapping is located at the specified frequency band, it is not necessary to build the high-order nonlinearities using conventional polynomials. Instead of using each individual envelope, in this work we propose to construct the second basis function using the average envelope of the signal, that is (17) This structure has even parity, so the odd-parity rule of low-pass equivalent model construction is satisfied when it is multiplied by the Basis 1 function, which has odd parity.
At first glance, the square root operations may seem very complex to implement in hardware compared to conventional polynomials. However, with the assistance of the coordinate rotation digital computer (CORDIC) technique [26] in FPGA, the complexity can be significantly reduced and becomes lower than that of the polynomials. We discuss this in detail in Section IV.
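The parity argument for Basis 2 can be made concrete with a small sketch. The exact form of (17) is not reproduced here; the code assumes, purely for illustration, that the average envelope is the square root of the sum of the squared magnitudes of the band signals, and the function names are hypothetical. It checks that negating all inputs leaves the envelope unchanged (even parity) while a Basis 1 term changes sign (odd parity).

```python
import math


def average_envelope(*bands):
    """sqrt(|x1|^2 + |x2|^2 + ...) per sample (assumed form of the envelope)."""
    return [math.sqrt(sum(abs(x) ** 2 for x in sample)) for sample in zip(*bands)]


def basis2(bands, order):
    """Envelope raised to a non-negative integer power."""
    return [e ** order for e in average_envelope(*bands)]


x1 = [3 + 4j, 1j]
x2 = [12 + 0j, 0j]
env = average_envelope(x1, x2)

# Parity check: negating every input leaves Basis 2 unchanged (even parity)
# while the Basis 1 term x1^2 * conj(x2) changes sign (odd parity).
neg1 = [-v for v in x1]
neg2 = [-v for v in x2]
env_neg = average_envelope(neg1, neg2)
b1 = [a * a * b.conjugate() for a, b in zip(x1, x2)]
b1_neg = [a * a * b.conjugate() for a, b in zip(neg1, neg2)]

print(env, env_neg, b1, b1_neg)
```

Because the envelope is real and non-negative, raising it to any power only reweights the Basis 1 output within the target band, so the product of the two bases keeps odd parity for every order.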
D. Model Basis Re-Combination
To simplify the model expression, we move the power operation out of the basis in (17) and redefine the terms inside the power operation as Basis 2. Thus the new model can be expressed as (18) where is the basis deciding which band is to be compensated, is the basis for generating the high-order nonlinearities, and represents the nonlinear order. Specific examples for the dual-band and tri-band cases are given in Table II. The derivation above is only valid for memoryless nonlinear systems. To characterize a wider range of nonlinear systems, memory effects need to be taken into account. To do so, the model can be constructed as (19) where and are Basis 1 and Basis 2, respectively, again represents the nonlinear order, and and represent the memory lengths of the two bases, respectively.
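A model of this kind is linear in its coefficients, so it can be written as a matrix of regressors multiplied by a coefficient vector. The following sketch shows one possible arrangement under assumptions that are not taken from (19): each regressor is a delayed Basis 1 sample multiplied by a delayed Basis 2 sample raised to a power, samples before the start of the record are treated as zero, and the names `gdes_regressors` and `model_output` are hypothetical.

```python
def delayed(seq, delay, n):
    """seq[n - delay], with zeros assumed before the first sample."""
    return seq[n - delay] if n - delay >= 0 else 0j


def gdes_regressors(u, env, orders, m1_len, m2_len):
    """One row per sample; one column per (order, Basis 1 delay, Basis 2 delay)."""
    rows = []
    for n in range(len(u)):
        row = []
        for p in orders:
            for m1 in range(m1_len):
                for m2 in range(m2_len):
                    row.append(delayed(u, m1, n) * delayed(env, m2, n) ** p)
        rows.append(row)
    return rows


def model_output(rows, coeffs):
    """Weighted sum of the regressors for every sample."""
    return [sum(c * r for c, r in zip(coeffs, row)) for row in rows]


# Toy Basis 1 and Basis 2 sequences (assumed values).
u = [1 + 0j, 2j, 3 + 0j]
env = [1.0, 2.0, 3.0]
rows = gdes_regressors(u, env, orders=(0, 1), m1_len=2, m2_len=1)
y = model_output(rows, [1.0, 0.5, 0.25, 0.0])
print(len(rows[0]), y)
```

The number of coefficients is the number of orders times the two memory lengths, so keeping a single envelope term per order is what keeps the coefficient count small.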
E. The Generalized Model
The model proposed above can be easily extended to general cases without structural changes. For instance, in the 3-C CA case, if the three carrier frequencies are evenly allocated, multiple intermodulation products may fall into the same frequency band, as shown in Fig. 5, where the carrier frequencies MHz (LTE band 3), MHz (LTE band 2), MHz (LTE band 1) are evenly spaced, so that the distortion located at 2245 MHz can be generated from two different IM3 products. One is generated from two carriers, 1985 MHz and 2115 MHz, and the other from all three carriers, that is (20) Because both distortion bands are located at the same frequency, the total distortion component consists of two parts, which requires two different modeling terms. As discussed earlier, the frequency components can be easily selected with Basis 1 functions in the proposed model. In this case, we simply construct two Basis 1 functions to model the two IM3 products, i.e., To generalize this procedure, we reformat the model as (23) where is the basis for the th distortion band or term to be compensated, and is the basis for modeling high-order nonlinearities. represents the nonlinear order and represent the memory lengths of the two bases, respectively. We call this model the generalized dual-basis envelope-dependent sideband (GDES) distortion model. The model structure is illustrated in Fig. 6, where a frequency analysis block is added to select distortion components and bands before constructing the model.
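The overlapping case can be sketched as a sum of branches that share one envelope but use different Basis 1 functions. The code below is a memoryless illustration under assumptions, not the expression in (23): the two Basis 1 functions are taken as a two-carrier and a three-carrier conjugate product, the envelope is assumed to be the root of the summed squared magnitudes, and `gdes_memoryless` and all numeric values are hypothetical.

```python
import math


def envelope(x1, x2, x3):
    """Assumed shared envelope: sqrt(|x1|^2 + |x2|^2 + |x3|^2) per sample."""
    return [math.sqrt(abs(a) ** 2 + abs(b) ** 2 + abs(c) ** 2)
            for a, b, c in zip(x1, x2, x3)]


def gdes_memoryless(basis1_list, env, coeffs):
    """coeffs[k][p] weights basis1_list[k][n] * env[n] ** p."""
    out = []
    for n in range(len(env)):
        acc = 0j
        for u, c_k in zip(basis1_list, coeffs):
            for p, c in enumerate(c_k):
                acc += c * u[n] * env[n] ** p
        out.append(acc)
    return out


# One-sample toy inputs for the three bands (assumed values).
x1, x2, x3 = [1 + 0j], [2 + 0j], [2j]
# Branch A: two-carrier product landing at 2*f3 - f2.
u_a = [c * c * b.conjugate() for b, c in zip(x2, x3)]
# Branch B: three-carrier product landing at f2 + f3 - f1.
u_b = [b * c * a.conjugate() for a, b, c in zip(x1, x2, x3)]
env = envelope(x1, x2, x3)
coeffs = [[1.0, 0.5], [2.0, 0.0]]   # two orders per branch
y = gdes_memoryless([u_a, u_b], env, coeffs)
print(y)
```

Each additional Basis 1 function adds one full set of coefficients, so two overlapping products double the coefficient count while the envelope computation is shared.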
Compared to the existing solutions, this new model structure provides several advantages. First, the signal processing bandwidth is only related to the baseband signals at each band, which leads to a narrow bandwidth requirement. Second, because only one average envelope is involved, the number of model coefficients is significantly reduced, and thus a low-complexity implementation can be realized. Third, the target band can be changed arbitrarily by replacing the terms in Basis 1 without significantly changing the model structure, which provides great flexibility for future extension.
IV. FPGA IMPLEMENTATIONS
To evaluate the practical application of the proposed cancellation structure, the proposed model is implemented in FPGA and compared with the existing models in terms of resource consumption. A generalized FPGA implementation architecture for Tx leakage suppression in the 3-C CA application is also proposed in this section.
A. FPGA Resource Consumption Comparison
Two types of structures are employed to make a fair comparison. The first model is an existing model based on the terms-picking approach in the dual-band case as (24) The second model is a typical example of the proposed model for the same dual-band scenario as (25) The objective is to compare the resource consumption when similar performance is achieved. Based on (24) and (25), two FPGA implementation architectures can be built as shown in Fig. 7.
In Fig. 7(a), the common part can be implemented by employing two complex multipliers. To implement the changing part, four square operations and two adders are required to calculate and . Multiplexing can be employed to reuse the hardware and thus reduce resource consumption. Different orders of nonlinear terms are then fed into the multiplication and combination module to construct all the possible combinations for the two inputs, e.g., . The different outputs are then multiplied by the common part and fed into the memory structure (equivalent to an FIR structure). Finally, all these terms are added together to construct the full distortion model. Since there are many possible combinations, a large number of multipliers are usually involved in this implementation.
In Fig. 7(b), the proposed structure mainly consists of three parts: Basis 1 generation, Basis 2 generation, and the combination of these two bases including different orders and memory. First, in Basis 1 generation, is equivalent to the common part in Fig. 7(a). Second, Basis 2 is generated in a different way from the conventional method. At first glance, one may think that more complex computation is involved in the envelope calculation, since there is a square root operation. However, by using CORDIC [26], this step becomes very simple, with only shift and addition operations involved, which significantly reduces the implementation complexity. The details of the square root implementation are given in the Appendix. Two CORDIC modules are employed to generate and , which take the I signal as one input and the Q signal as the other input to calculate the . To reduce the resource consumption, multiplexing can also be employed, which is also discussed in the Appendix. After this operation, we can employ another CORDIC module to implement . Then, the implemented function can be used to generate high-order terms, which are combined with Basis 1, delayed, and multiplied by different coefficients. Finally, all these terms are added together to construct the full distortion model. In practice, the memory structure can be further simplified based on the practical requirement. For example, if there are few memory terms, all the different memory structures can be added first and then delayed together, which reduces the number of multipliers required.
B. The Generalized Architecture for 3-C CA Application
One major advantage of the proposed structure is that the envelope term is in a generalized format that can be easily extended to various multi-band cases. For instance, to extend from dual-band to tri-band, only one more CORDIC needs to be employed in the implementation. The details of this implementation are given in the Appendix. Furthermore, because the model has a generalized structure, all the distortions located at different IM3 bands can be generated by using the same hardware block. It also allows multiplexing to be employed to save FPGA resources. For instance, as illustrated in Fig. 8, the input may consist of three original inputs . To generate , we can simply select from branch 1 and select from branch 2 and 3. To generate and should be selected from different branches, respectively. The resource consumption of this module is discussed in the next section with a practical example.
Based on the discussion above, the general FPGA implementation architecture for tri-band intermodulation product cancellation is shown in Fig. 9. In the selection block, the input "mode" is used to select the cancellation band. For example, in the 3-C CA application, there are 9 options for IM3 band selection. With this operation, the distortion located at different bands can be cancelled by controlling the single variable "mode".
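The nine options can be reproduced by counting the ways to pick two non-conjugated branches and one conjugated branch from three inputs. The sketch below is a software analogue of such a selection block under assumptions: the mode numbering, the tuple layout `(a, b, c)` and the function names are hypothetical and are not taken from Fig. 8 or Fig. 9.

```python
# Each mode is (a, b, c): Basis 1 = x[a] * x[b] * conj(x[c]),
# which targets the band at f_a + f_b - f_c. The ordering is an assumption.
MODES = [
    (a, b, c)
    for c in range(3)
    for a in range(3)
    for b in range(a, 3)
    if a != c and b != c
]


def select_basis1(mode, x):
    """Multiplex the three band inputs into the Basis 1 product for a mode."""
    a, b, c = MODES[mode]
    return [p * q * r.conjugate() for p, q, r in zip(x[a], x[b], x[c])]


def target_frequency(mode, carriers):
    """Center frequency addressed by a mode, from the carrier frequencies."""
    a, b, c = MODES[mode]
    return carriers[a] + carriers[b] - carriers[c]


# One-sample toy inputs and assumed carrier frequencies in MHz.
x = ([1 + 1j], [2 + 0j], [1j])
carriers = [1855.0, 1985.0, 2115.0]
for mode in range(len(MODES)):
    print(mode, MODES[mode], target_frequency(mode, carriers))
print(select_basis1(3, x))
```

Six of the modes select the same branch twice (two-carrier products) and three select all three branches, so a single three-input multiplier fed through multiplexers covers every IM3 band.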
V. EXPERIMENTAL RESULTS
To validate the proposed method, a test bench was set up as shown in Fig. 10. In the transmit chain, the baseband signals with different carrier aggregation allocations are generated on a PC in MATLAB, then up-converted to RF and fed into a high-power LDMOS PA operated at 2.14 GHz with an average output power of 37.5 dBm. Due to the limitations of the platform, we conducted the test without a real duplexer, using a digital filter instead. In other words, the transmitter distortion at the receiver band is not attenuated by a duplexer before down-conversion. In our test, the full transmitter signal is fed into the receiver, then down-converted, sampled and finally demodulated back to baseband. The sideband distortion is obtained by applying a digital filter to the received signal. The distortion suppression model was implemented in FPGA and can run in real time, but the model extraction was conducted in MATLAB using the standard least squares (LS) algorithm. Furthermore, in these tests, the receiver chain was considered to be linear with a fixed gain. Due to the bandwidth limitation of the platform, we only used 5 MHz signals at each band to conduct the "proof-of-concept" tests.
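Because the model is linear in its coefficients, LS extraction reduces to solving a small set of normal equations. The following sketch illustrates this on synthetic data only: the signals are random, the "measured" distortion is generated from known coefficients, and the regressors assume a memoryless dual-band model with a two-carrier Basis 1 and a root-sum-square envelope. The solver and all names are hypothetical stand-ins for the MATLAB routine used in the experiments.

```python
import math
import random


def solve_linear(a, b):
    """Gaussian elimination with partial pivoting for a complex system."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    x = [0j] * n
    for r in reversed(range(n)):
        acc = aug[r][n] - sum(aug[r][k] * x[k] for k in range(r + 1, n))
        x[r] = acc / aug[r][r]
    return x


def least_squares(rows, target):
    """Solve the normal equations (A^H A) c = A^H d."""
    cols = len(rows[0])
    gram = [[sum(r[i].conjugate() * r[j] for r in rows) for j in range(cols)]
            for i in range(cols)]
    rhs = [sum(r[i].conjugate() * t for r, t in zip(rows, target))
           for i in range(cols)]
    return solve_linear(gram, rhs)


random.seed(0)
x1 = [complex(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(200)]
x2 = [complex(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(200)]
rows = []
for a, b in zip(x1, x2):
    u = a * a * b.conjugate()                     # assumed Basis 1
    e = math.sqrt(abs(a) ** 2 + abs(b) ** 2)      # assumed Basis 2 envelope
    rows.append([u * e ** p for p in range(3)])
true_c = [0.5 - 0.2j, -0.1 + 0.3j, 0.05 + 0.0j]  # synthetic "PA" coefficients
target = [sum(c * r for c, r in zip(true_c, row)) for row in rows]
est_c = least_squares(rows, target)
print(est_c)
```

On noise-free synthetic data the estimated coefficients match the generating ones to within rounding error. With measured data, the target vector would be the filtered sideband distortion captured at the receiver.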
A. 3-C CA Case 1: Fig. 2(b)
In this test, the baseband signal combines two 5 MHz signals located at MHz and MHz, with a peak-to-average power ratio (PAPR) of 7.8 dB. The sideband distortion is located at MHz. The model configuration in (25) is set as . Fig. 11 shows the measured power spectral density with and without the transmitter leakage suppression. From Fig. 11, it can be clearly seen that 25 dB suppression can be achieved by employing the proposed model, which confirms the model accuracy. The signal processing bandwidth required is only 46 MHz, which corresponds to nonlinearities up to the 9th order with the two 5 MHz baseband signals, regardless of the frequency spacing. It is also worth mentioning that only 8 coefficients are required in the proposed model, which leads to very low complexity in practical implementations. The model was implemented on an FPGA board. The measurement results from the FPGA implementation are compared with those from simulation in Fig. 12, where we can see that the hardware performance is almost as good as that simulated in MATLAB. For the conventional model, 12 coefficients are required to obtain similar performance, that is, the model configuration in (24) is set as . The performance is also illustrated in Fig. 11.
The FPGA resource utilization for this case is listed in Table III to compare the resource consumption. The implementation of Basis 1 is the same in both models and occupies 2068 slice LUTs and 2036 slice registers. The differences are in the implementations of steps 2, 3 and 4, whose resource consumptions are listed in detail in Table III. Basis 2 in the proposed algorithm is generated by using CORDIC, which saves 57% of the LUTs and 38% of the registers compared with step 2 in the conventional model. As mentioned earlier, the conventional algorithm requires 4 more coefficients than the proposed one for similar performance, resulting in a large amount of hardware in steps 3 and 4 to implement the coefficient multiplications. Since the memory structures of the two models are identical, the resource usage of both approaches is the same in step 5. In summary, compared to the conventional model, the numbers of slice LUTs and slice registers used in the proposed model decrease by 3426 and 3309, respectively.
Fig. 11. Measured performance comparison for distortion suppression at IM3 band in 3-C CA case 1: Fig. 2(b).
Fig. 12. Measured performance for proposed method in distortion suppression at IM3 band in 3-C CA case 1: Fig. 2(b).
As discussed in Section IV, multiplexing can be employed to further reduce the FPGA resource consumption. The simplified cases are illustrated in Table IV. In step 2, the CORDIC blocks that generate and can be multiplexed, in the same way that the adders and multipliers used to obtain and are multiplexed in the conventional model. In the final step 5, the summation over different memory depths, which consists of current terms and delayed terms, can share one structure. Moreover, the resource consumption of step 4 in Table IV is dramatically reduced in the low-cost implementation compared with that in Table III. This is because a specific mode is employed for the multiplications between coefficients and input terms in step 4. When the multiplier is configured in constant-coefficient mode, the resource consumption depends on the fixed coefficient values and is normally lower than in the common (parallel) multiplier mode. Therefore, both implementations of step 4 in Table IV employ the fixed-coefficient strategy to further save hardware resources. The difference in resource utilization between the proposed and the conventional methods in the low-cost multiplexing implementation is smaller than that of the unsimplified structure in Table III, but the proposed model still shows advantages over the conventional model. Furthermore, comparisons with other models published in the literature are listed in Table V in terms of suppression performance and hardware resource usage. From the results, we can see that our model achieves strong suppression performance with relatively low hardware resources.
TABLE III. FPGA RESOURCE UTILIZATION COMPARISONS FOR THE CASE IN FIG. 2(B)
B. 3-C CA Case 2: Fig. 2(c)
In this test, the baseband signal combines three 5 MHz signals located at MHz, MHz, MHz to form a 3-C CA signal. Although the scenario is changed compared to Part A, the model configuration is still set as with 8 coefficients. Fig. 13 shows the measured power spectral density with and without the transmitter leakage suppression. In Fig. 13, two typical examples of IM3 distortion bands in 3-C CA are to be compensated: (1) the IM3 distortion generated from all three carriers, e.g., the target frequency is located at , as shown in Fig. 13(a); (2) the IM3 distortion generated from any two carriers, e.g., the target frequency is located at , as shown in Fig. 13(b). From Fig. 13, it can be clearly seen that 2 dB suppression can be achieved for both cases by employing the proposed model. The measurement results from the FPGA implementations are also as good as those from MATLAB. The FPGA resource utilization for both cases is listed in Table VI. Compared to the resource utilization in Table IV, the consumption in both cases listed in Table VI increases only slightly. It can also be seen that there is a slight difference in FPGA resource utilization between the two cases. The reason is that the coefficient values differ in these two cases, which leads to slightly different hardware occupation in the FPGA implementation. Based on the results, it can be seen that the proposed method saves more FPGA resources when more carriers are involved.
It is also worth mentioning that the resource consumption of the selection module in Fig. 8 has also been investigated. The FPGA resource utilization comparison is listed in Table VII. Compared to the case without the Mux implementation in Table VI, the one with the Mux implementation only adds 96 slice LUTs, which is insignificant. However, this module largely enhances flexibility by forming a uniform structure that can cancel any sideband distortion located in the IM3 bands.
C. 3-C CA Case 3: Fig. 5
In this test, the baseband signal combines three 5 MHz signals located evenly at MHz, MHz, MHz. One of the sideband distortions is located at 2200 MHz, which can be generated from and , that is , and also from all three carriers, that is, . The two distortion components overlap with each other. The model configuration in (22)-(23) is set as , in which the two memory parameters are simplified to one. Compared to the cases in Parts A and B, the number of coefficients is doubled to 16, since there are two different Basis 1 functions in the model. Fig. 14 shows the measured power spectral density with and without the transmitter leakage suppression. From Fig. 14, it can be clearly seen that 20 dB suppression can again be achieved by employing the proposed model.
VI. CONCLUSION
In this paper, a generalized dual-basis envelope-dependent sideband distortion model, further developed from the basic concept in [21], was proposed to model and suppress transmitter leakage for non-contiguous CA applications. The proposed model structure provides great flexibility for dealing with different intermodulation products in a uniform structure, which has been validated by FPGA implementation. Experimental results demonstrated excellent model performance with very low model complexity, making the model a promising candidate for future carrier aggregation systems.
APPENDIX FPGA IMPLEMENTATION OF SQUARE ROOT OPERATION
CORDIC is a technique that calculates trigonometric functions such as sine and cosine, as well as the magnitude and phase of a complex number, to a desired precision by iteratively rotating the phase of the complex number through multiplication with a succession of constant values. In this Appendix, an FPGA implementation of the square root operation for complex numbers employing CORDIC is provided.
To find the magnitude of a complex number, we can simply rotate it to a phase of zero; the magnitude is then just the real part, since the imaginary part is zero. To do this in digital circuits using CORDIC, we first need to make sure that its phase is within ±90 degrees. This can be achieved by first rotating the complex number by 90 degrees if its phase exceeds 90 degrees in magnitude. At the first step, we determine whether the complex number has a positive or negative phase by looking at the sign of its imaginary part. If the phase is positive, rotate it by -90 degrees; otherwise, rotate it by +90 degrees. To rotate by +90 degrees, swap the real and imaginary parts and change the sign of the new real part; to rotate by -90 degrees, swap them and change the sign of the new imaginary part. The phase of the complex number is now within ±90 degrees, and we then further rotate the phase iteratively using CORDIC.
Since the phase of a complex number is , the phase of " " is and, likewise, the phase of " " is . To add phases, we can multiply by " ", while to subtract phases, we can use " ". In the following iterations, we rotate the phase of the complex number using numbers of the form " ", where decreases by powers of two after each iteration, starting with and thereafter , etc., until the phase goes to zero. The operations can be expressed as (A.1), where and are the real and imaginary parts of the complex number, respectively, and represents the th rotation. can have the value of -1 or 1 and determines the direction of the rotation, depending on the sign of .
represents the gain of each rotation, that is (A.2)
To simplify the operation, the gains of all rotations can be compensated together by applying a single scaling factor at the end of the iterations, that is (A.3)
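For reference, the vectoring-mode CORDIC recursion described in this Appendix is commonly written as below. The symbols $x_k$, $y_k$, $d_k$, $G_k$, $G$ and $N$ are assumed notation for this sketch and may differ from the notation of (A.1)-(A.3); $x_0 + j y_0$ denotes the complex number after the initial 90-degree rotation, which leaves the magnitude unchanged.

```latex
\begin{align*}
x_{k+1} &= x_k - d_k\, 2^{-k}\, y_k, \\
y_{k+1} &= y_k + d_k\, 2^{-k}\, x_k, \qquad
d_k = \begin{cases} +1, & y_k < 0 \\ -1, & y_k \ge 0 \end{cases} \\
G_k &= \sqrt{1 + 2^{-2k}}, \qquad
G = \prod_{k=0}^{N-1} G_k \approx 1.6468 \ \text{for large } N, \\
\left| x_0 + j y_0 \right| &\approx \frac{x_N}{G}, \qquad y_N \approx 0 .
\end{align*}
```

Each step multiplies the complex number by $1 + j d_k 2^{-k}$, which changes the phase by $d_k \arctan(2^{-k})$ and scales the magnitude by $G_k$, so that a single division by $G$ after the last iteration removes the accumulated gain.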
Since the multipliers are powers of two, CORDIC can be implemented in binary arithmetic logic using only shifts and adds, without any actual multipliers [26]. For instance, at each iteration, the real part is obtained via , which involves only shifting to the right and adding it to .
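The shift-and-add structure can be illustrated with a small bit-accurate software model. The following is a minimal sketch under stated assumptions, not the FPGA design itself: the function name `cordic_magnitude`, the default of 16 iterations and the integer input format are all hypothetical choices made for illustration.

```python
import math


def cordic_magnitude(i, q, iterations=16):
    """Return (unscaled_magnitude, gain) for the integer complex number i + j*q.

    The datapath uses only negation, swaps, arithmetic right shifts and adds.
    """
    # Initial rotation by -90 or +90 degrees brings the phase within +/-90 degrees.
    if q > 0:
        i, q = q, -i   # rotate by -90 degrees: (i + jq) * (-j) = q - j*i
    else:
        i, q = -q, i   # rotate by +90 degrees: (i + jq) * (+j) = -q + j*i
    gain = 1.0
    for k in range(iterations):
        d = 1 if q < 0 else -1   # direction that drives the phase towards zero
        # Multiply by (1 + j*d*2^-k) using shifts and adds only.
        i, q = i - d * (q >> k), q + d * (i >> k)
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * k))   # gain of this rotation
    return i, gain


# The accumulated gain is a constant, so it can be removed once at the end.
raw, gain = cordic_magnitude(3 << 20, 4 << 20)
print(raw / gain / (1 << 20))   # close to 5, the magnitude of 3 + j4
```

The gain is computed in floating point here only for convenience; since it depends solely on the number of iterations, in hardware it would be a precomputed constant applied after the last iteration.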
A. Implementation of

Since (A.4)
one CORDIC module can be directly employed and reused, that is (A.5) Then, employing the CORDIC to calculate again, we can obtain (A.6) To reduce complexity, the scaling factor can be moved out of the CORDIC and compensated later, that is (A.7)
where we can see that only two CORDICs are needed to conduct the square root operation. Because the CORDIC module uses only adders and shifters, the FPGA resources consumed by the proposed approach are much less than those of the conventional polynomial implementation.
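The idea of reusing the CORDIC and deferring the scaling can be sketched in software as follows. This sketch assumes that the quantity being computed is the joint envelope sqrt(|x1|^2 + |x2|^2) of two band signals x1 and x2; that reading, the function names and the floating-point CORDIC model are assumptions made for illustration and are not taken from (A.4)-(A.7).

```python
import math


def cordic_vectoring(x, y, iterations=16):
    """Floating-point model of a vectoring CORDIC; returns G * sqrt(x^2 + y^2)."""
    # Initial rotation by -90 or +90 degrees brings the phase within +/-90 degrees.
    x, y = (y, -x) if y > 0 else (-y, x)
    for k in range(iterations):
        d = 1.0 if y < 0 else -1.0   # rotate towards zero phase
        x, y = x - d * y * 2.0 ** -k, y + d * x * 2.0 ** -k
    return x


def cordic_gain(iterations=16):
    """Accumulated gain G of all rotations; a constant for a fixed iteration count."""
    return math.prod(math.sqrt(1.0 + 4.0 ** -k) for k in range(iterations))


def joint_envelope(x1, x2, iterations=16):
    m1 = cordic_vectoring(x1.real, x1.imag, iterations)   # G * |x1|
    m2 = cordic_vectoring(x2.real, x2.imag, iterations)   # G * |x2|, same module reused
    m12 = cordic_vectoring(m1, m2, iterations)            # G^2 * sqrt(|x1|^2 + |x2|^2)
    return m12 / cordic_gain(iterations) ** 2             # single deferred scaling


print(joint_envelope(3 + 4j, 5 - 12j), math.sqrt(5 ** 2 + 13 ** 2))
```

Because both magnitudes entering the second stage carry the same gain G, no rescaling is needed between the stages, and the combined factor G^2 can be removed once at the output.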
Let us compare the resource consumption of with that of . At first glance, one may think that the implementation of should be more complex than that of , since there is one extra square root operation. However, after careful investigation, the actual resource consumption turns out to be quite different, as shown in Table VIII. The implementation of requires four complex multipliers and three adders. Because of the multipliers, the resource consumption is costly: 1172 slice LUTs and 1211 slice registers in the FPGA. Even if multiplexing is employed, e.g., and may share the same resources, the total resource consumption is still very high. On the contrary, the implementation of requires only three CORDIC modules, which occupy 495 slice LUTs and 726 slice registers, a saving of about 58% in slice LUTs and 40% in slice registers. Also, if multiplexing is employed, and can share the same CORDIC module and thus only two CORDIC modules are required, which further reduces the resource consumption. In summary, because multipliers consume more resources than CORDIC, the total FPGA resources consumed for the implementation of are actually larger than those for .
B. Implementation of
Based on the implementation (A.7) of , one more input is added into the CORDIC module. Firstly, (A.8) then (A.9) However, in this operation, the scaling factor for is , while the one for is . Therefore, both operations require multipliers. In order to reduce the number of multipliers, a new method is proposed below, that is (A.10) Finally, we can obtain (A.11)
Although one more CORDIC is employed, it can also be multiplexed, which significantly reduces the total implementation cost.
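The scaling mismatch that arises when a third input is added can be checked numerically. The sketch below assumes that the target quantity is the joint envelope sqrt(|x1|^2 + |x2|^2 + |x3|^2) of three hypothetical band signals, and it shows one possible way of equalizing the gains with an extra CORDIC pass so that a single scaling remains at the end. It is an illustration under these assumptions and is not claimed to reproduce (A.10)-(A.11); all function names are hypothetical.

```python
import math


def cordic_vectoring(x, y, iterations=16):
    """Floating-point model of a vectoring CORDIC; returns G * sqrt(x^2 + y^2)."""
    # Initial rotation by -90 or +90 degrees brings the phase within +/-90 degrees.
    x, y = (y, -x) if y > 0 else (-y, x)
    for k in range(iterations):
        d = 1.0 if y < 0 else -1.0   # rotate towards zero phase
        x, y = x - d * y * 2.0 ** -k, y + d * x * 2.0 ** -k
    return x


def cordic_gain(iterations=16):
    """Accumulated gain G of all rotations; a constant for a fixed iteration count."""
    return math.prod(math.sqrt(1.0 + 4.0 ** -k) for k in range(iterations))


def three_band_envelope(x1, x2, x3, iterations=16):
    m1 = cordic_vectoring(x1.real, x1.imag, iterations)   # G * |x1|
    m2 = cordic_vectoring(x2.real, x2.imag, iterations)   # G * |x2|
    m3 = cordic_vectoring(x3.real, x3.imag, iterations)   # G * |x3|
    m12 = cordic_vectoring(m1, m2, iterations)             # G^2 * sqrt(|x1|^2 + |x2|^2)
    # m12 carries G^2 but m3 only G, so they cannot be combined directly.
    # An extra pass with a zero second input raises the gain of m3 to G^2.
    m3_eq = cordic_vectoring(m3, 0.0, iterations)          # G^2 * |x3|
    m123 = cordic_vectoring(m12, m3_eq, iterations)        # G^3 * joint envelope
    return m123 / cordic_gain(iterations) ** 3             # single deferred scaling


print(three_band_envelope(2 + 1j, -2 + 2j, 3j), math.sqrt(5 + 8 + 9))
```

In this sketch the extra pass costs one more CORDIC operation but no multiplier between the stages, and the only multiplication left is the constant scaling at the output.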
