Computationally efficient real-time digital predistortion architectures for envelope tracking power amplifiers
Pere L. Gilabert and Gabriel Montoro
This paper presents and discusses two real-time digital predistortion (DPD) architectures for envelope tracking (ET) power amplifiers (PAs), aimed at a computationally efficient implementation in a field programmable gate array (FPGA) device. In ET systems, a shaping function makes it possible to modulate the supply voltage according to different criteria. One possibility is to use slower versions of the original RF signal's envelope in order to relax the slew-rate (SR) and bandwidth (BW) requirements of the envelope amplifier (EA) or drain modulator. The nonlinear distortion that arises when performing ET with a supply voltage that follows either the original or the slow envelope is presented, together with the DPD function capable of compensating for these unwanted effects. Finally, two approaches for efficiently implementing the DPD functions, one polynomial-based and one look-up table-based, are discussed.
I. INTRODUCTION
Alternatives to the classical Cartesian transmitter, which uses linear power amplifiers (PAs) with a constant supply, are being investigated to overcome its poor power efficiency with high peak-to-average power ratio (PAPR) signals. The Doherty architecture, for example, has been adopted for base stations, where several manufacturers (e.g. Freescale, NXP) offer PAs with average efficiencies of 50% and more [1]. However, other promising structures such as envelope elimination and restoration (EE&R) [2, 3], envelope tracking (ET), or polar transmitters with delta-sigma modulation [4] are still being considered as candidates to surpass the efficiency of the Doherty PA. From the implementation point of view, ET is a very attractive technique because it can be applied in conventional transmitters based on linear RF amplification topologies by simply replacing the classical static supply with a dynamic one.
One of the main constraints on the maximum efficiency achievable by ET transmitters is the envelope modulator, or envelope amplifier (EA), since the overall efficiency of an ET architecture is the product of the PA and EA power efficiencies. The envelope bandwidth (BW) is several times (theoretically infinite) the BW of the baseband complex modulated signal, which is critical when considering current wideband signals with high PAPR. Some companies, such as Nujira (www.nujira.com), MaXentric (www.maxentric.com) or Quantance (www.quantance.com), already offer ET solutions with average efficiencies above 60% for WCDMA and LTE signals.
One of the main challenges for the EA is to supply the power required by the transistor at the speed of the signal's envelope. In dual-band applications, for example, this becomes even more challenging, since the combined envelope can present BWs of more than five times the carrier separation. Therefore, in order to relax the EA requirements, some solutions have been proposed to reduce the BW and slew-rate (SR) of the original signal's envelope [5-8]. Unfortunately, using a slower version of the envelope to supply the PA drain not only degrades the overall efficiency but also gives rise to nonlinear distortion. Despite the efficiency and linearity degradation, supplying the PA with a slower envelope can still be of interest in applications where BW must be traded off against efficiency because of EA limitations. To compensate for the nonlinear distortion that arises when using the SR-limited version of the original envelope, a slow envelope-dependent digital predistorter (SED-DPD) is necessary [5, 9, 10].
The rest of this paper is organized as follows. The BW versus efficiency trade-off in EAs is discussed in Section II. The design of the DPD required to compensate for the nonlinear distortion that arises when supplying the PA with a slower version of the signal's envelope is presented in Section III. Some field programmable gate array (FPGA)-oriented implementation architectures for real-time DPD are discussed in Section IV. Finally, conclusions are given in Section V.
II. BW VERSUS EFFICIENCY TRADE-OFF IN ENVELOPE AMPLIFIERS
In an ET system (see Fig. 1), the supply voltage is dynamically adjusted to track the RF envelope at high instantaneous power. The supply voltage can be shaped according to different criteria. By means of a so-called shaping function, the shape of the supply voltage (which must still follow the instantaneous RF envelope in some way) can be adapted to pursue one of the following objectives: optimum efficiency, isogain [11-13], or SR- and BW-reduced shaping [14].
Focusing on this latter objective, two approaches based on reducing the SR and BW of the RF signal's envelope have shown that these strategies can adapt the envelope characteristics to the EA requirements or limitations, at the expense of degraded efficiency. On the one hand, the method proposed in [5, 6] limits the BW of the envelope iteratively, which may be an issue in real-time applications. On the other hand, the method proposed in [8] is a real-time algorithm whose resulting signal is limited in SR but not in BW, which makes its amplification challenging if only a switched-mode EA is considered, or requires a wide band if only a linear EA is considered. Therefore, in [14], the SR reduction algorithm proposed in [8] was modified to also restrict the BW of the resulting slow envelope. Moreover, owing to its simplicity, this algorithm is suitable for implementation in a digital signal processor. Fig. 2 shows the original RF signal's envelope, an SR-reduced version of the original envelope (SR reduced envelope, SRRE) and a BW-reduced version of the original envelope (BW reduced envelope, BWRE) in both the time and frequency domains. The parameter N (defined in [8]) is related to the maximum allowed increment in the signal's slope. For example, N = 100 corresponds to an SR reduction of 96% and a BW reduction of 64% with respect to the original signal's envelope. The results shown in Fig. 2 were obtained from an implementation of this algorithm on a Virtex-4 FPGA whose clock speed was set to 60 MHz.
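As an illustration of what SR limiting does to the supply signal, the following Python sketch applies a generic two-pass slew-rate limiter to a short envelope. It is not the algorithm of [8] or its BW-restricted variant in [14]: the function name, the `max_step` bound (maximum change per sample) and the sample values are assumptions chosen for illustration. The sketch also assumes that the slow envelope should never fall below the original one, and its backward pass needs look-ahead, so a real-time version would require a buffer.

```python
def slew_limited_envelope(envelope, max_step):
    """Return a slew-rate-limited envelope that never drops below the input.

    Generic two-pass illustration (NOT the algorithm of [8] or [14]).
    max_step is a hypothetical bound on the change between adjacent samples.
    """
    if max_step <= 0:
        raise ValueError("max_step must be positive")
    slow = list(envelope)
    # Backward pass: start rising early enough to reach each upcoming peak.
    for n in range(len(slow) - 2, -1, -1):
        slow[n] = max(slow[n], slow[n + 1] - max_step)
    # Forward pass: after a peak, fall no faster than max_step per sample.
    for n in range(1, len(slow)):
        slow[n] = max(slow[n], slow[n - 1] - max_step)
    return slow


# Hypothetical envelope samples (normalized amplitude).
envelope = [0.1, 0.2, 1.0, 0.3, 0.1, 0.1, 0.8, 0.2]
max_step = 0.25
slow = slew_limited_envelope(envelope, max_step)
```

For this input, every sample of `slow` is at least as large as the corresponding sample of `envelope`, and consecutive samples of `slow` differ by at most `max_step`.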
As reported in [15], the efficiency decays approximately linearly with the BW reduction, whereas it shows a logarithmic dependence on the SR reduction. As a consequence, for applications with high-BW signals (e.g. dual-band transmissions) it is possible to find a trade-off that meets both the SR and BW requirements of the EA while still keeping a reasonably good drain efficiency.
Unfortunately, using the SR- and BW-limited envelope (or simply slow envelope, E_s) to supply the power transistor's drain generates a particular nonlinear distortion. Fig. 3 shows the AM-AM characteristics for different ranges of E_s values. As observed in Fig. 3, the ET PA shows a varying nonlinear gain because the slow envelope used to supply the PA and the RF input signal are not univocally related. For a given input, a range of different outputs is therefore possible, depending on the specific value of the dynamic power supply. In other words, the ET PA presents a slow envelope-dependent (SED) nonlinear behavior.
III. DESIGN OF A REAL-TIME DPD FOR ET
The type of low-pass equivalent black-box behavioral model required to characterize the nonlinear distortion that arises when applying ET depends on the strategy (or shaping function) followed to supply the PA. On the one hand, if the PA drain voltage follows the same shape as the RF signal's envelope (despite being bounded at low voltage levels), typical behavioral models such as the memory polynomial (MP) [7] can be used for DPD purposes. On the other hand, if the slow envelope is used to supply the PA, then the DPD has to include the information of the slow envelope in order to compensate for this type of nonlinear distortion.
When the original envelope is used, a DPD based on the simple MP model can be considered. Following the notation in Fig. 1, the input-output relationship of the MP DPD is defined as
where the nonlinear functions f_i(·) can be described by polynomials of order P
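As a hedged sketch, the MP DPD and its branch polynomials can be written in the following form, which is consistent with the definitions in this section and with the operation counts given in Section IV. The symbol x[n] for the predistorter output, the number of taps N + 1 and the placement of the magnitude term are assumptions, not necessarily the authors' exact notation.

```latex
\begin{align}
x[n] &= \sum_{i=0}^{N} f_i\big(u[n-\tau_i]\big), \\
f_i\big(u[n-\tau_i]\big) &= u[n-\tau_i] \sum_{p=0}^{P} \gamma_{pi}\,\big|u[n-\tau_i]\big|^{p},
\qquad \gamma_{pi} \in \mathbb{C}.
\end{align}
```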
As previously explained, when the slow envelope is used to supply the PA, the resulting nonlinear distortion cannot be compensated by simply using dynamic behavioral models such as the MP [10]. Therefore, in [9] a dynamic SED behavioral model is proposed to compensate for this type of nonlinear distortion. The input-output relationship of the SED-DPD is defined as
where E_s[n] is the SR-limited version of the original envelope, u[n] is the input signal, and τ_j and τ_i (with τ_0 = 0) are the most significant tap delays of the slow envelope and the input signal, respectively, which contribute to the characterization of memory effects. Figure 4 shows linearized and unlinearized AM-AM characteristics of an ET PA when supplying the PA with the original envelope (MP DPD used) and with a slower version of the original envelope (SED-DPD used). The linearity performance of the SED-DPD in terms of out-of-band distortion compensation can be observed in Fig. 5. These particular results were measured on an instrumentation-based test bed, schematically depicted in Fig. 1 and described in [10]. The device under test (DUT) is a Cree Inc. evaluation board CGH40006P-TB (GaN transistor) at 2 GHz operating at a mean output power of 28 dBm. For the sake of simplicity, a linear IC LT1210 was used as the envelope driver. The PAPR of the baseband signals ranges from around 8 to 11 dB, depending on the type of signal used (single-carrier M-QAM or OFDM). In the case of the SED-DPD, we used the following configuration:
IV. FPGA IMPLEMENTATION ARCHITECTURES
The FPGA implementation of an MP DPD follows the structure presented in Fig. 6. Each branch represents one nonlinear function expressed as a polynomial expansion. For an accurate and efficient FPGA implementation of the MP DPD, it is important to minimize both the number of arithmetic operations (counting additions and multiplications) and the accumulated error inside the FPGA. Both issues can be addressed using Horner's rule, thereby limiting the number of consecutive complex multiplications to a maximum of two. Moreover, as presented in [16], in order to avoid a large variation in the magnitude of the polynomial coefficients (which would require a large number of bits to preserve the precision of the computation), it is possible to take the ratios of adjacent coefficients. As a consequence, by reformulating (2) according to Horner's rule, the nonlinear functions f_i(·) can be described as
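A nested form consistent with this description is sketched below. It assumes that each branch polynomial has the structure u · Σ_p γ_pi |u|^p and leaves out the coefficient-ratio normalization of [16], so it is an illustration and not necessarily the authors' exact expression.

```latex
\begin{equation}
f_i\big(u[n-\tau_i]\big) = u[n-\tau_i]\,
\Big(\gamma_{0i} + \big|u[n-\tau_i]\big|\Big(\gamma_{1i} + \big|u[n-\tau_i]\big|\Big(\gamma_{2i} + \cdots
+ \big|u[n-\tau_i]\big|\,\gamma_{Pi}\Big)\Big)\Big)
\end{equation}
```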
Taking into account the polynomial expression in (2), where γ_pi ∈ ℂ, it takes p + 1 real multiplications for each monomial γ_pi |u[n − τ_i]|^p and 2P additions (P complex additions), resulting in P(P + 7)/2 arithmetic operations for a polynomial of degree P. With the formulation in (4), by contrast, computation starts with the innermost parentheses, using the coefficient of the highest-degree monomial, and works outward, each time multiplying the previous result by |u[n − τ_i]| and adding the coefficient of the monomial of the next lower degree. This takes 4P arithmetic operations for a polynomial of degree P, so Horner's algorithm is much more computationally efficient for high polynomial orders. Figure 7 shows the structure of the nonlinear branches of the MP DPD in Fig. 6. Alternatively, instead of using polynomials to describe the nonlinear functions f_i(·), it would have been possible to use basic predistortion cells (BPCs) [17]. A BPC is composed of a RAM block acting as a look-up table (LUT), an address calculator and complex multipliers.
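The comparison between the two evaluation orders can be sketched in Python as follows. The sketch assumes that each branch has the form u · Σ_p γ_p |u|^p (an assumption consistent with the operation counts above), uses hypothetical coefficient values, and omits the coefficient-ratio normalization of [16]. The helper `op_counts` simply evaluates the two expressions quoted in the text; it does not instrument the code.

```python
def branch_direct(coeffs, u):
    """Evaluate u * sum_p coeffs[p] * |u|**p monomial by monomial."""
    a = abs(u)
    return u * sum(g * a ** p for p, g in enumerate(coeffs))


def branch_horner(coeffs, u):
    """Evaluate the same branch with Horner's rule (nested form)."""
    a = abs(u)
    acc = coeffs[-1]                 # start from the highest-degree coefficient
    for g in reversed(coeffs[:-1]):
        acc = acc * a + g            # one real-by-complex product, one complex addition
    return u * acc


def op_counts(P):
    """Arithmetic-operation counts quoted in the text for a degree-P polynomial."""
    direct = P * (P + 7) // 2        # P(P + 7)/2, always an integer
    horner = 4 * P
    return direct, horner


# Hypothetical complex coefficients (P = 2) and input sample.
coeffs = [1.0 + 0.0j, 0.10 - 0.05j, -0.02 + 0.01j]
u = 0.6 + 0.3j
y_direct = branch_direct(coeffs, u)
y_horner = branch_horner(coeffs, u)
```

Both functions return the same value up to rounding. According to the two formulas, the counts coincide for P = 1 and diverge as P grows.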
In order to implement the dynamic SED-DPD in an FPGA device, the polynomial model in (3) is expressed as a combination of several BPCs [9] :
Fig. 7. Structure of one of the branches of the MP DPD (see Fig. 6) using Horner's rule.
which yields the following expression of the SED-DPD:
with G_LUT_iqj being complex LUT gains. Figure 6 shows the general block diagram of the SED-DPD architecture, where the nonlinear functions f_iqj(·) can be expressed as a combination of BPCs. The number of BPCs forming this SED-DPD is #BPCs = (Q + 1)(N + 1)(M + 1). This structure requires fewer arithmetic operations than using polynomials; however, it consumes more memory resources. Figure 8 shows the basic structure of a BPC, where a dual-port RAM, with two independent sets of ports for simultaneous reading and writing, is used to allow the complex LUT gains to be updated continuously without interrupting normal data transmission. Thanks to this LUT-based architecture, it is therefore possible to perform continuous adaptation of the DPD function by means of the least-mean-squares (LMS) algorithm [17].
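A minimal Python sketch of this LUT-based structure is given below. It assumes that the SED-DPD output takes the form x[n] = Σ_j Σ_q Σ_i (E_s[n − τ_j])^q · u[n − τ_i] · G_LUT_iqj(|u[n − τ_i]|), with each BPC addressing its table by a uniformly quantized |u|. The class and function names, the table size, the address mapping and the gain values are all hypothetical, and a Python list stands in for the dual-port RAM.

```python
class BasicPredistortionCell:
    """Hypothetical BPC: address calculator + LUT (list) + complex multiplier."""

    def __init__(self, gain_fn, size=64, max_amplitude=1.0):
        self.size = size
        self.max_amplitude = max_amplitude
        step = max_amplitude / size
        # Sample the gain at the centre of each amplitude bin.
        self.table = [complex(gain_fn((k + 0.5) * step)) for k in range(size)]

    def address(self, u):
        """Map |u| to a table index (uniform quantization, clipped to the last bin)."""
        k = int(abs(u) / self.max_amplitude * self.size)
        return min(k, self.size - 1)

    def __call__(self, u):
        return u * self.table[self.address(u)]


def sed_dpd_sample(cells, u_taps, es_taps):
    """One output sample of the assumed LUT-based SED-DPD.

    cells[i][q][j] is the BPC for input tap i, envelope power q, envelope tap j;
    u_taps[i] = u[n - tau_i] and es_taps[j] = Es[n - tau_j].
    """
    x = 0j
    for i, u in enumerate(u_taps):
        for q, per_q in enumerate(cells[i]):
            for j, cell in enumerate(per_q):
                x += (es_taps[j] ** q) * cell(u)
    return x


# Hypothetical configuration: 2 input taps, envelope powers 0..1, 1 envelope tap.
N, Q, M = 1, 1, 0


def unity_main_branch(i, q, j):
    """Gain 1 on branch (0, 0, 0) and 0 elsewhere, so the DPD is transparent."""
    value = 1.0 if (i, q, j) == (0, 0, 0) else 0.0
    return lambda amplitude: value


cells = [[[BasicPredistortionCell(unity_main_branch(i, q, j)) for j in range(M + 1)]
          for q in range(Q + 1)] for i in range(N + 1)]
num_cells = sum(len(per_q) for per_i in cells for per_q in per_i)

u_taps = [0.5 + 0.2j, 0.4 + 0.1j]    # u[n], u[n - tau_1]
es_taps = [0.7]                      # Es[n]
x = sed_dpd_sample(cells, u_taps, es_taps)
```

With the transparent gains used here, `x` equals the current input sample `u_taps[0]`, and `num_cells` matches (Q + 1)(N + 1)(M + 1). Overwriting entries of `table` while the cell is in use plays the role of the second RAM port in this sketch; the LMS update of [17] is not reproduced.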
V. CONCLUSION
In this paper, we have presented and discussed two computationally efficient design strategies for implementing real-time DPD in an FPGA device for ET PAs. As discussed throughout the paper, when slow versions of the original envelope are used to perform ET, the resulting nonlinear distortion has to be compensated using DPD architectures that depend not only on the input data and its memory, but also on the drain voltage signal (slow envelope) and its memory. Two efficient architectures that allow real-time FPGA implementation of the DPD function have been presented: one based on polynomials and the other on LUTs. The trade-off between these two configurations is the number of arithmetic operations versus the memory resources required. In any case, the linearization performance of both architectures has been validated in several papers [9, 16]. Finally, another key issue for a computationally efficient FPGA implementation is the design of the identification/adaptation process. One possibility is the use of LMS-based solutions as in [17], where the coefficients (or complex LUT gains) are continuously updated. Alternatively, if more complex least-squares-type algorithms are considered, the coefficient update procedure can be relocated to embedded software running on a MicroBlaze soft processor core as in [18].
Fig. 8. Basic architecture of a BPC forming the SED-DPD (see Fig. 6).
ACKNOWLEDGEMENT
This work was supported by the Spanish Government (MINECO) under project TEC2011-29126-C03-02.
