Abstract-This paper presents a hardware implementation of a digital predistorter (DPD) for linearizing RF power amplifiers (PAs) for wideband applications. The proposed predistortion linearizer is based on a nonlinear auto-regressive moving average (NARMA) structure, which can be derived from the NARMA PA behavioral model and then mapped into a set of scalable lookup tables (LUTs). The linearizer takes advantage of its recursive nature to relax the LUT count needed to compensate memory effects in PAs. Experimental support is provided by the implementation of the proposed NARMA DPD in a field-programmable gate-array device to linearize a 170-W peak power PA, validating the recursive DPD NARMA structure for W-CDMA signals and flexible transmission bandwidth scenarios. To the best of the authors' knowledge, it is the first time that a recursive structure is experimentally validated for DPD purposes. In addition to the results on PA efficiency and linearity, this paper addresses many practical implementation issues related to the use of FPGA in DPD applications, giving an original insight on actual prototyping scenarios. Finally, this study discusses the possibility of further enhancing the overall efficiency by degrading the PA operation mode, provided that DPD may be unavoidable due to the impact of memory effects.
I. INTRODUCTION

CURRENT studies regarding the needs of wireless communications equipment agree in highlighting the importance of reducing power consumption to cut running costs as an added value. Besides, linearity requirements are specified in communication standards and, thus, reducing unacceptable distortions is mandatory [1], [2]. Nevertheless, new standards enhancing high data rates by means of spectrally efficient complex modulation schemes require power amplifiers (PAs) handling signals that present high peak-to-average power ratios (PAPRs). Those spectrally efficient modulation formats are unfortunately very sensitive to the intermodulation distortion (IMD) that results from nonlinearities in the RF transmitter chain, mainly due to the PA nonlinear behavior. This implies that for having linear amplification, significant backoff (BO) levels of operation are required, thus penalizing power efficiency in the PA. For example, in the cellular telephony context, PAs have to support some of the code division multiple access (CDMA) family [CDMA2000, evolution data optimized (EVDO), W-CDMA, long-term evolution (LTE)] of wireless standards exhibiting typical PAPR figures around 10 dB. In a broadband access context, communications standards such as IEEE 802.11a, DVB-T, or the IEEE 802.16 consider the use of orthogonal frequency division multiplexing (OFDM) signals presenting even higher PAPRs (up to 14 dB) and bandwidths up to 20 MHz or wider. Furthermore, in base stations, the PA has to handle a composite RF signal resulting from the sum of several independent modulated carriers. The wider bandwidths and increased PAPR figures aggravate the linearity versus efficiency problem. A recognized solution to avoid the power inefficient BO operation is the use of PA linearizers.
Among linearizers, digital predistortion (DPD) is on its way to becoming one of the most important linearization techniques due to the availability of faster digital signal processing (DSP) hardware, replacing feedforward as the mainstream technique in commercially available base-station products. Manufacturers of chipsets (PMC-Sierra, Xilinx Inc., Altera), PA rack systems (Andrew, Powerwave), and base stations (Lucent-Alcatel, Ericsson) propose different types of DPD solutions.
Besides the efficiency problem, coping with high-speed envelope signals makes designers reconsider the degradation caused by PA memory effects since their impact becomes more relevant as the signal bandwidth increases. Due to PA dynamics, the amplified signal depends not only on the input signal at the same time instant, but also on the history of the input and output signals. Therefore, PA memory effects also have to be taken into account when designing linearizers.
DPD has been the subject of multiple publications in that memory compensation area [3]-[10], demonstrating the effectiveness of a variety of approaches to counteract both memory effects and nonlinear behavior of the RF PA. However, little attention has been drawn to practical implementations of such systems [11]-[17]. This study aims to contribute to that field by focusing on topics not covered in previously published demonstrators based on laboratory setups with vector signal generators/analyzers at their core and delayed offline data processing. There, some questions regarding DPD application prototyping have remained unaddressed, such as the following:
• implementation: suitable real-time architectures, practical implementations, and DPD complexity dependence versus the memory effects time span;
• efficiency: power consumption of the DPD itself and its impact on the transmitter efficiency;
• DPD adaptation: memory effects dependence on the specific signal, and DPD ability to maintain linearity performance through signal changes in multicarrier and variable bandwidth systems.
This paper addresses those issues through experimentation with a field-programmable gate-array (FPGA)-based DPD. For the first time, to the best of the authors' knowledge, a DPD based on a nonlinear auto-regressive moving average (NARMA) architecture [18] is experimentally validated. The two distinctive characteristics of this NARMA-based DPD are its straightforward deduction from the NARMA PA model, and its nonlinear recursive structure aimed at relaxing the number of coefficients required to reproduce PA dynamics. Moreover, this study investigates the possibility of enhancing the overall efficiency by degrading the PA operation mode, assuming that the DPD is unavoidable due to the unwanted impact of memory effects. Experimental results on PA and DPD power consumption and linearity enhancement will be presented.
0018-9480/$25.00 © 2008 IEEE
The remainder of this paper is organized as follows. Section II introduces DPD linearization issues related to PA memory effects and their most notable consequences. In Section III, we propose a multiple lookup table (LUT) architecture based on the NARMA DPD described in [18]. This multi-LUT architecture can be mapped onto an FPGA device. Practical issues regarding this LUT-based architecture, such as LUT value filling, access, and addressing, are discussed in that section. Section IV describes the experimental setup and procedures deployed to validate the proposed NARMA DPD. An insight into the PA model estimation by means of the least squares (LS) algorithm to adapt the DPD function is provided as well. In Section V, experimental results of the proposed NARMA DPD are provided. This section also discusses underlying practical topics such as the DPD power consumption, adaptation stability, and reliability. Section VI extends the contents of the preceding section by examining the impact of degrading the PA operating point on system performance, with emphasis on the overall efficiency. The implications of using a DPD to further counteract this more efficient, but less linear, degraded PA behavior are therefore investigated. Finally, in Section VII, conclusions are given.
II. PROBLEM STATEMENT
The DPD sensitivity to memory effects becomes a problem when trying to cancel distortion in wideband signals because it reduces linearization performance [19] . The most common sources of memory effects recognized in the literature are due to electrical and thermal dispersion effects. In addition, other authors also take into account trapping effects and impact ionization as potential sources [20] .
Traditionally, memory effects have been observed in the frequency domain as an asymmetry between the IMD products using a low PAPR two-tone test. However, in a realistic scenario that considers signals presenting a high PAPR, the impact of IMD products and hard nonlinearities decreases. Since the peak probability is low, the PA operates in its linear region most of the time. As a consequence, its dynamics mainly manifest themselves as unwanted in-band distortion. In such a case, memory effects can be better observed in the time domain (e.g., in-phase and quadrature (I/Q) wave signals, constellation trajectories, decision points at demodulation) than in the frequency domain. As it turns out, in-band distortion cannot be equalized by memoryless DPD unless some kind of filtering is considered together with the nonlinear compensation.
In order to cancel or minimize memory effects, the envelope filtering technique [19] is considered in this paper over other techniques such as impedance optimization [21] and envelope injection [22] . The principle of the envelope filtering technique consists of reproducing (in the predistorter) the inverse of the memory effects that are generated inside the PA, by means of filtering and phase shifting the envelope signal at baseband. Besides, the predistorter has to compensate the PA nonlinear behavior as well. This technique can be readily transposed into a DPD system in order to deal with current challenges regarding PA nonlinearities and memory effects compensation within the transmitter chain.
To do so, it is first necessary to identify a behavioral model capable of reproducing PA nonlinear dynamics. Later, from this behavioral model, it is possible to derive a suitable predistorter that takes into account the envelope filtering technique. This model has to satisfy three main constraints to be considered for DPD purposes. First, it has to be accurate in terms of memory effects reproduction. Second, it has to render a DPD implementation without an excessive computational cost. Finally, it has to be easy to extract and be invertible to facilitate the deduction of the predistortion function.
As a general introduction to the linearization architecture presented in this paper, Fig. 1 shows a general block diagram of a digital baseband predistorter system. DPD is performed in an FPGA. The offline adaptation process consists of periodically updating the suitable predistortion function from which the LUT contents are deduced. The adaptation is carried out by a PC. Alternatively, it is possible to use a DSP board. As long as the PA characteristic drifts are slow, the LUT update frequency is relaxed, and so are the hardware and computing constraints related to the adaptation procedures.
III. NARMA-BASED PREDICTIVE PREDISTORTER
A. Description of the DPD Function
By considering memory effects as secondary effects (despite their importance regarding linear distortion) with respect to a memoryless nonlinear behavior, it is possible to consider that the individual signal pulses propagate nonlinearly in time, but tend to sum up linearly [23]. For that reason, we have considered a NARMA model [24] to reproduce and later counteract short-term PA memory effects.
The advantage of using a NARMA model is the introduction of a nonlinear feedback path (infinite impulse response (IIR) terms) that may permit relaxing the number of delayed samples considered to model the PA, in comparison to a model using only finite impulse response (FIR) terms. However, one of the main weaknesses of the NARMA model is its stability since the use of nonlinear feedback paths can result in overall system instability. Therefore, in order to guarantee the stability of a NARMA model, a stability test based on small gain theory is presented in [24]. To determine the stability of nonlinear systems, it is necessary to ensure that the recursive nonlinear functions are bounded in some norm. Further details on the small gain theory for nonlinear systems can be found in [25].
The predictive DPD based on a NARMA structure is described in [18], where DPD is carried out at baseband by adaptively forcing the PA to behave as a linear device. The predistortion function can be stated in terms of basic predistortion cells (BPCs). A BPC requires simple hardware blocks: a complex multiplier, a dual-port RAM memory block acting as the LUT, an address calculator, and two control ports: write enable (WE) and chip select (CS), as shown in Fig. 2. The BPCs are therefore the fundamental building blocks of the DPD, as shown in Fig. 3. The predistortion function stated in terms of combinations of BPCs can be expressed as (1) where is defined as (2) where (for both and ) are complex gains stored in their corresponding LUT, is the output of the DPD, and is the desired output defined as the signal to be transmitted multiplied by a linear amplification
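As a rough illustration of the BPC described above, the following Python sketch models one cell as a table of complex gains, an address calculator, and a complex multiplier. The table size, the power-based addressing, the full-scale value, and all names (`lut_address`, `bpc_output`, `full_scale_power`) are assumptions made for illustration only; they are not taken from the FPGA design, and the WE/CS control ports are not modeled.

```python
import numpy as np

def lut_address(sample, n_entries, full_scale_power=1.0):
    """Map the instantaneous power of a complex sample to a LUT index."""
    power = sample.real ** 2 + sample.imag ** 2
    index = int(power / full_scale_power * n_entries)
    return min(max(index, 0), n_entries - 1)  # clamp to the valid range

def bpc_output(sample, lut, full_scale_power=1.0):
    """One basic predistortion cell: complex gain lookup, then complex multiply."""
    return sample * lut[lut_address(sample, len(lut), full_scale_power)]

# Hypothetical 8-entry LUT: unit gain everywhere except the top entry.
lut = np.ones(8, dtype=complex)
lut[-1] = 0.5 + 0.5j
low = bpc_output(0.1 + 0.1j, lut)   # low power, addresses entry 0
high = bpc_output(1.0 + 0.0j, lut)  # full-scale power, clamped to entry 7
```

A memoryless DPD would correspond to a single such cell; cells fed with delayed input or delayed output samples would play the role of the nonrecursive and recursive branches, respectively.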
In addition, and are the most significant sparse delays of the DPD input and output, respectively, that contribute to the description of the PA memory effects. The identification of these optimal delays and the definition of the minimum necessary memory length to model PA memory effects are discussed in [26]. More recently, heuristic search algorithms such as simulated annealing or genetic algorithms have also been considered for these purposes. The use of the simulated annealing heuristic search algorithm has shown significant advantages (memory length reduction and better reliability) in comparison to the use of simple consecutive delays to model PA dynamics [27].
B. LUT Spacing
How to organize the LUT spacing has been an interesting topic of discussion for several years [28]-[30] since a uniform or nonuniform spacing of the LUT is closely related to the linearization performance achieved by DPD linearizers. The so-called companding function is responsible for deriving the spacing of the input levels in the LUT. It processes the input data so that the LUT is addressed with different resolution in different ranges. The most common companding functions reported in the literature are amplitude, power, μ-law, Cavers optimum companding function, and a more simplified sub-optimum companding function presented in [30].
The best linearity performance recognized in the literature is achieved with Cavers optimum companding function [29]. However, its computational complexity and its dependence on the signal's probability density function make it less suitable in our generic approach. Amplitude spacing, also referred to as uniform spacing, provides good enough results in comparison to the optimal companding function with reduced complexity.
Nevertheless, the square root operation is still necessary to compute the address when using the amplitude companding function. This operation can take several clock cycles to execute in an FPGA, adding undesired latencies. This may not be of major concern in nonrecursive DPD structures because sub-block latencies can be compensated in the parallel-related data paths by explicit delays, and they translate directly as a system input-to-output delay. However, in the proposed recursive DPD implementation, address computation latencies act as a bottleneck, limiting the minimum delay value of the recursive part of our NARMA-based DPD. That is, the minimum value of in LUT IIR 1 (see Fig. 3 ) is conditioned by latencies in previous stages of the DPD. Referring to Fig. 3 , the latency in sampling periods between and , adds to the latency to compute the address of in the recursive data path, thus imposing a minimum value . For that reason, the addressing in the proposed implementation is simply performed based on the power of the input complex signal. To properly fill the LUTs in that power addressing case, a new set of coefficients have to be obtained from . Supposing that each LUT has entries, the th entry for the corresponding LUT is obtained as (4) with .
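The tradeoff discussed above can be sketched in a few lines of Python. The idea shown is that the square root is moved from the real-time address computation to the offline LUT-filling step: the run-time address uses only the power, while the table is filled by evaluating the amplitude-domain gain at the amplitude that corresponds to each power bin. This is a sketch under stated assumptions and not a reconstruction of (4): the bin-center evaluation, the function names, and the example gain are hypothetical.

```python
import numpy as np

def fill_power_addressed_lut(gain_of_amplitude, n_entries, max_amplitude=1.0):
    """Tabulate an amplitude-domain complex gain on a uniform power grid.

    Entry i covers powers in [i, i + 1) * max_power / n_entries; the gain is
    evaluated at the bin centre (an assumption of this sketch).
    """
    max_power = max_amplitude ** 2
    bin_centres = (np.arange(n_entries) + 0.5) * max_power / n_entries
    return gain_of_amplitude(np.sqrt(bin_centres))  # square root taken offline

def power_address(sample, n_entries, max_amplitude=1.0):
    """Run-time address: multiplications and additions only, no square root."""
    power = sample.real ** 2 + sample.imag ** 2
    return min(int(power * n_entries / max_amplitude ** 2), n_entries - 1)

# Hypothetical compressive gain, used only to exercise the functions.
gain = lambda a: (1.0 - 0.2 * a ** 2).astype(complex)
lut = fill_power_addressed_lut(gain, n_entries=4)
addr = power_address(0.6 + 0.0j, n_entries=4)  # power 0.36 falls in bin 1
```

With power addressing the table resolution is finer at high amplitudes and coarser at low amplitudes than with amplitude addressing, which is the price paid for removing the address latency from the recursive path.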
IV. EXPERIMENTAL SETUP
A. Baseband Setup
The considered transmission bandwidths make the use of a single DSP device for the implementation of the DPD/adaptation procedures difficult. In practice, it is more suitable to consider a mixed DSP/FPGA architecture. In [16] , to allow a high data throughput, the FPGA is in charge of the real-time DPD processing at the actual sample rate, whereas the DSP performs more complex (algorithmic) and less time-constrained functions such as the adaptation process for DPD parameter update.
To enhance flexibility during the prototyping procedures, the DSP device has been replaced by a host PC in which MATLAB is in charge of the adaptation, as schematically depicted in Fig. 1. The FPGA is a Xilinx Virtex-IV XC4VSX35 with the developed DPD core in charge of predistortion running at 105 MHz. An overview of the Virtex-IV family specifications can be found in [31].
The linearization process is open-loop controlled and works separately from the adaptation process. A feedback loop from the PA output towards the FPGA (through the demodulator and A/D converters) is also included to capture the necessary data to enable the adaptation process. In the proposed implementation, the FPGA provides the external host with buffers of predistorted and amplified output data of 2048 I/Q samples each. The host PC identifies the NARMA model and, by means of the predictive DPD function, the complex gains are computed and fed into the FPGA in the LUT form required by the BPCs. The digital-to-analog (D/A) and analog-to-digital (A/D) converters handle 14-bit data, also at 105 MS/s, covering a bandwidth of 52.5 MHz at baseband. The maximum allowed signal bandwidth for third-order intermodulation distortion (IMD3) coverage is thus 35 MHz, whereas for full fifth-order intermodulation distortion (IMD5) coverage it is 11.6 MHz.
B. RF Setup
Several tests have been performed with different types of modulated signals presenting different bandwidths. The objective consisted of verifying, on the one hand, the dependence of the DPD function on the specific signal and, on the other hand, its reliability against possible changes of the RF input signal. Typically, signals used in this experiment have been in the range of 5-20 MHz of bandwidth and 5-10 dB of PAPR, aiming to emulate the statistical properties of different representative scenarios (e.g., one- and two-carrier W-CDMA, single-carrier DVB-T, and WiMAX). In all cases, random filtered baseband data is generated in the host PC and transferred into the FPGA where the real-time DPD function takes place before D/A conversion, up-conversion, and amplification.
The RF chain under study in this work uses as final stage a 170-W peak-power PA based on the Freescale MRF7S21170H MOSFET transistor. A medium-power PA based on the MRF21010 transistor (10-W peak power), acting as a linear driver, precedes the main output amplifier.
Before the insertion of the PAs in the transmitter chain, a prior set of measurements and a calibration procedure to eliminate dc offsets originated by the I/Q demodulator is performed. By ensuring that no significant degradation is added by components in the feedback path, the imperfections in the forward path, up to the PA output, can be tackled by the DPD. The entire experimental setup, including the baseband processing part and the RF chain, is depicted in Fig. 4 .
C. PA Model Extraction/DPD Adaptation Procedure
The extraction, in the host PC, of the NARMA PA model is necessary to perform the update of the LUTs defining the dynamic predistortion function [18] . Nonlinear functions and in (2) are expressed here by polynomials. Their identification is performed using the LS algorithm. The LS takes advantage of the use of complex data buffers of 2048 samples. Other algorithms, such as the least mean square (LMS), recursive least square (RLS), or fast Kalman [32] , [33] are more oriented at minimizing the identification error sample-per-sample or considering a forgetting factor. Considering , the data vector at the DPD output (PA input), and , the corresponding time-aligned data vector of the PA output (and normalized by the linear PA gain to allow signals comparison), both vectors of samples length, we define
The input-output relation of an NARMA PA behavioral model can then be expressed in a matrix notation as (7) where and . The LS solution for (7) is (8) where superindex denotes complex conjugate transpose.
Once we have estimated the complex coefficients defining and nonlinear functions of the NARMA model (see [24] for further details) analogously, it is possible to extract the vector of complex coefficients defining .
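The LS identification outlined in this subsection can be illustrated with the following Python sketch. It builds a regression matrix from delayed input and delayed (gain-normalized) output samples and solves for the polynomial coefficients in the LS sense. The basis functions, the delay sets, and all names (`narma_regressors`, `ls_fit`) are assumptions made for this example; the matrices defined in (5)-(8) are not recoverable from the text, so this is not claimed to match them term by term.

```python
import numpy as np

def narma_regressors(x, y, in_delays, out_delays, order):
    """Stack columns s(k - d) * |s(k - d)|**p for p = 0..order.

    x: PA input record, y: gain-normalised PA output record (same length).
    Input delays act on x, output (feedback) delays act on y.
    """
    start = max(list(in_delays) + list(out_delays))
    k = np.arange(start, len(x))
    cols = []
    for signal, delays in ((x, in_delays), (y, out_delays)):
        for d in delays:
            s = signal[k - d]
            cols.extend(s * np.abs(s) ** p for p in range(order + 1))
    return np.column_stack(cols), y[start:]

def ls_fit(A, target):
    """LS coefficients; equals (A^H A)^-1 A^H target when A has full column rank."""
    coeffs, *_ = np.linalg.lstsq(A, target, rcond=None)
    return coeffs

# Synthetic check: a memoryless cubic "PA" should be recovered from one buffer.
rng = np.random.default_rng(0)
x = (rng.standard_normal(2048) + 1j * rng.standard_normal(2048)) / 4
y = x * (1.0 - 0.1 * np.abs(x) ** 2)
A, target = narma_regressors(x, y, in_delays=[0], out_delays=[], order=2)
w = ls_fit(A, target)  # close to [1, 0, -0.1] for this synthetic model
```

In a recursive (NARMA) fit, `out_delays` would contain delays of at least one sample, so that the model output depends only on past outputs.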
D. PAPR Problem: Adaptation Policy
In Section IV-C, we have formalized the LS procedure to derive, from the PA input-output data samples, the polynomial functions that model the PA. These polynomial functions are later directly mapped into the BPC-LUTs to achieve the suitable DPD operation [18]. However, as a consequence of the high PAPR of current signals, the peak probability is low and it is difficult to gain knowledge of the PA characteristic at high amplitudes. For instance, if the data used to extract the polynomial coefficients does not cover the full PA dynamic range, but only a certain low-input region, the LS estimation is underdetermined. That means that there is no reliable way to ensure that the PA behavior described by the polynomials is accurate beyond that low-input range. Clearly, this may result in unreliable DPD operation as soon as the input signal reaches amplitudes beyond the well-estimated PA regions. This implies that the BPC-LUT values obtained in such a case are not trustworthy. Therefore, the PA model estimation during the adaptation/update procedures has to be somehow re-engineered.
To avoid uncertainties, we performed a selective adaptation procedure in which only data buffers presenting input PA values above a certain power threshold were taken into account to perform the adaptation. Otherwise, data buffers were rejected and a new set of data buffers was recorded. In such a way, the PA model functions are estimated when the stimuli are complete enough in the sense that they cover a wide part of the PA dynamic range, thereby reducing the uncertainty and resulting in a reliable DPD operation. It is possible to dynamically adjust the threshold to trade off accuracy against adaptation rate. A low threshold lowers the chances of data buffer rejection, but at the risk of underdetermination. Conversely, an excessive value for the threshold will result in a high buffer rejection rate, postponing the LUT update.
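The selective adaptation policy can be sketched as a simple acceptance test on each captured buffer. In the Python example below, a buffer is accepted when its peak instantaneous input power reaches a threshold; the use of the peak power as the criterion, the threshold value, and the names `buffer_is_usable` and `select_buffers` are assumptions for illustration, since the exact criterion is not detailed in the text.

```python
import numpy as np

def buffer_is_usable(pa_input, power_threshold):
    """Accept a record only if its peak instantaneous power reaches the threshold."""
    return float(np.max(np.abs(pa_input) ** 2)) >= power_threshold

def select_buffers(buffers, power_threshold):
    """Keep the (input, output) records whose input excites the upper PA range."""
    return [(x, y) for x, y in buffers if buffer_is_usable(x, power_threshold)]

# Two toy records: only the second one reaches the assumed threshold of 0.8.
low = np.full(2048, 0.3 + 0.0j)
high = low.copy()
high[100] = 0.95 + 0.0j
kept = select_buffers([(low, low), (high, high)], power_threshold=0.8)
```

Raising `power_threshold` makes accepted buffers more informative about the compression region but increases the rejection rate, which mirrors the accuracy versus adaptation rate tradeoff described above.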
A more detailed explanation on the adaptation policy will be provided in Section V-C.
E. Assessment Metrics and Definitions
In our experiments, we continuously compare transmitter performances with and without DPD. When DPD is performed, we distinguish between memoryless DPD, when just one BPC is active, and memory compensation DPD, when several BPCs are active. In the latter case, we further specify whether nonrecursive BPC (BPC-FIR) or recursive BPC (BPC-IIR) are used. Specifically, when nonrecursive BPCs are used, they are denoted as " " BPC-FIR, with the " " being the number of nonrecursive LUTs used (ranging from 1 to , see Fig. 3). On the other hand, when recursive BPCs are used, they are denoted as " " BPC-IIR, with "N" being the number of recursive BPCs used (ranging from 1 to , see Fig. 3). In the following, additional metrics and the criteria used are described.
The main metric to check the transmitted signal fidelity in the time domain is the error vector magnitude (EVM), defined as follows in (9) . The unmodulated (unfiltered) raw error between the baseband waveforms is computed taking into account all the available data within the 2048 samples I/Q data buffers (9) with and being the I/Q components of the reference baseband signal to transmit, and and being the I/Q components of the baseband PA output after downconversion. When DPD is not active, we rather use the most suitable linear transformation of ,
which pre-compensates gain mismatches and phase offsets associated to closed-loop misalignments and, thus, minimizes the numerator in (9) . When DPD is active, is expected to converge to , and no further prearrangement is necessary.
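A common way to compute such an EVM figure from two I/Q buffers is sketched below in Python. The normalization to the RMS level of the reference, the application of the linear alignment to the measured record, and the function names are assumptions of this example; the exact expressions in (9) and in the linear transformation that follows it are not recoverable from the text.

```python
import numpy as np

def evm_percent(reference, measured):
    """RMS error vector magnitude, normalised to the reference RMS level."""
    error_power = np.sum(np.abs(measured - reference) ** 2)
    return 100.0 * np.sqrt(error_power / np.sum(np.abs(reference) ** 2))

def best_linear_fit(reference, measured):
    """Complex scalar a minimising ||a * measured - reference||^2."""
    return np.vdot(measured, reference) / np.vdot(measured, measured)

# Toy record: the "PA output" is the reference with a gain and phase offset.
rng = np.random.default_rng(1)
ref = rng.standard_normal(2048) + 1j * rng.standard_normal(2048)
meas = 0.8 * np.exp(1j * 0.3) * ref
raw_evm = evm_percent(ref, meas)
aligned_evm = evm_percent(ref, best_linear_fit(ref, meas) * meas)
```

In this toy case the error is purely a gain and phase mismatch, so the aligned EVM collapses to numerical noise; with a real PA, the residual after alignment would reflect nonlinear distortion and memory effects.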
In the frequency domain, signal fidelity is observed as spectral regrowth on both sides of the RF carrier signal. When it applies, the single carrier 3GPP W-CDMA forward link ACPR conformance test [34] has been used; whereas in the remaining scenarios under test, direct spectrum inspection provided a measure of spectral regrowth as a framework of comparison.
To fairly assess the benefits of DPD, the PA output power must be the same among the considered scenarios under comparison. In the following measurements, a power meter ensures that comparisons are established between equal mean power signals. Furthermore, the power measurement, together with the dc power consumption, which is directly obtained from the measurement of the supply current, easily provides a reliable means to compute the PA drain efficiency. To provide an insight into the contribution of DPD to the overall efficiency, the DPD power consumption has been considered as well. However, for these efficiency computations, PA bias voltages and currents are not taken into account.
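The efficiency figures described above reduce to simple power ratios, sketched here in Python. The function names and all numeric values are placeholders chosen for the example and are not measurements from this work; adding the DPD consumption to the dc budget is one plausible way of accounting for it.

```python
def drain_efficiency(p_out_w, v_supply_v, i_supply_a):
    """PA drain efficiency from mean RF output power and dc supply measurements."""
    return p_out_w / (v_supply_v * i_supply_a)

def overall_efficiency(p_out_w, v_supply_v, i_supply_a, p_dpd_w):
    """Same ratio with the DPD power consumption added to the dc budget."""
    return p_out_w / (v_supply_v * i_supply_a + p_dpd_w)

# Placeholder numbers, not measurements from this work.
eta_pa = drain_efficiency(12.0, 28.0, 2.0)
eta_total = overall_efficiency(12.0, 28.0, 2.0, 4.0)
```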
V. EXPERIMENTAL RESULTS
Here we intend to assess the performances of the described predictive digital predistorter and the implemented FPGA architecture through experimental verification on the basis of the experimental setup and procedures stated above.
A. General Testing
A first set of measurements was performed without focusing on a particular transmission standard, with the intention of evaluating the main unwanted PA effects and different DPD configurations. Fig. 5 shows the transmitted spectra of a 20-MHz bandwidth signal with 10 dB of PAPR and a mean output power of 12 W for the following cases: without any DPD, with memoryless DPD (one BPC), and with memory compensation (memoryless BPC + two BPC-FIRs). The benefits of using DPD are shown in terms of out-of-band distortion reduction. In the time domain, the AM/AM characteristic provides additional information on the DPD operation, as shown in Fig. 6. It reveals a linearized AM/AM characteristic when DPD is applied and, moreover, dispersion is reduced when memory effects are compensated using three BPCs. This dispersion compensation in the AM/AM characteristic is directly translated into the EVM metric, as shown in Fig. 7, where a significant amount of EVM reduction is achieved. Specifically, in Fig. 7, the amplified signal constellation presents an EVM of 12%, which is slightly reduced when applying memoryless DPD compensation, and more than halved when applying DPD taking into account memory effects compensation, reaching an EVM of 4%.
Note that the unlinearized AM/AM characteristic in Fig. 6 exhibits higher gain than the DPD linearized characteristic, although the peak amplitude levels with and without DPD meet at the PA saturation point.
Linear amplification with DPD can only be achieved up to saturation since no further correction is possible beyond that compression point. Therefore, the maximum available linear gain for the DPD PA chain has been experimentally tuned to be the ratio between the maximum PA output power and the 
This reasoning is graphically shown in Fig. 8, where, although the overall gain is reduced with regard to the nominal PA gain , , DPD allows linear amplification up to the PA saturation point, while the mean output power is maintained since the histogram of the PA input signal is reshaped. Following this criterion, to perform fair comparisons between signals, ensuring that the mean output power is the same with and without DPD, one has to apply the following input backoff (IBO) to the unlinearized signal dB dB dB (12) This criterion has been respected in all results shown in this paper (except for illustration purposes in Fig. 6), thus avoiding any kind of makeup coming from a less unlinearized backed-off operation to exaggerate the actual DPD linearization performance.
Until now, we have shown how memoryless DPD fails to deliver appropriate levels of signal fidelity at the transmitter antenna because it is unable to properly compensate PA linear distortion. This has been mainly evidenced in terms of EVM, but also in terms of out-of-band distortion.
Indeed, linearization performance was improved by including means to compensate memory effects [i.e., additional BPCs in our NARMA-based DPD (see Fig. 3 )]. However, we have deliberately avoided focusing on the recursive BPC arrangements since this topic is developed in the following.
Assuming that memoryless DPD is insufficient, we now compare the linearization performance achieved when considering recursive and nonrecursive NARMA DPD arrangements. The following three configurations were tested: 1) two BPC-FIRs; 2) three BPC-FIRs; 3) two BPC-FIRs + one BPC-IIR. All considered configurations yield similar EVM figures (4%), but slight differences in the adjacent channel power ratio (ACPR) improvement. Fig. 9 shows the linearized power spectra of a 10-MHz filtered noisy signal (with a high PAPR aimed at statistically emulating a two-carrier W-CDMA scenario) when considering the aforementioned NARMA DPD configurations. As can be observed in Fig. 9, the best ACPR is obtained by taking advantage of the recursive operation of the NARMA DPD (two BPC-FIRs + one BPC-IIR).
During our experiments, we found that, to ensure a reliable DPD performance, it is important to identify the BPC-LUT contents using a wideband signal capable of exciting the maximum number of memory states of the PA [32]. The use of a spectrally rich signal to train the DPD enables linearity performance to be maintained when a signal with a narrower bandwidth than the first is applied later. In such a case, no additional training of the DPD is required and, thus, we obtain the desired independence from the specific signal applied [35]. This is an important feature to be taken into account in variable bandwidth transmission schemes such as WiMAX and other multicarrier configurations, where the signal statistics in terms of PAPR and bandwidth may not be known a priori. This is experimentally highlighted in Fig. 10, showing the linearized power spectra of different RF signals with different signal bandwidths (20, 12, and 8 MHz) for both memoryless DPD and DPD with recursive memory compensation.
The DPD has been trained using the wider bandwidth signal (20 MHz), and this permits robust DPD operation with narrower signal bandwidths, as shown in Fig. 10. Moreover, better ACPR reduction is again observed when using memory compensation in DPD (two FIRs + one IIR, four BPCs in total) than when using a simple memoryless DPD, even without retraining between signal changes. Experimental results also show that if adaptation is performed with the reduced bandwidth signal, DPD performance is degraded when a wider signal is applied and, thus, further adaptation will be required.
B. Single Carrier W-CDMA Signal Test
To summarize the experimental results, we have considered here the linearization of a single-carrier W-CDMA signal. For that purpose, we have first estimated the LUT contents of the DPD with a 10-MHz noisy wideband signal, as in Fig. 9, for each of the following BPC arrangements: memoryless DPD, two BPC-FIRs, three BPC-FIRs, and two BPC-FIRs + one BPC-IIR. Once the DPD has been trained for each considered configuration, and the corresponding LUTs have been stored in the PC memory, the adaptation procedures have been stopped.
To check the linearization performance achieved when a signal different from that used for the DPD identification is fed to the PA, Table I reports the measured results obtained when applying a 5-MHz 8-dB PAPR W-CDMA signal. Results are shown in terms of ACPR and EVM for all the BPC combinations considered above. For each arrangement, the suitable BPCs are activated and properly filled with the LUT values derived during the adaptation procedure. In Table I, for the sake of equivalent power comparison, BO operation has also been considered, with an IBO defined as in (12). Fig. 11 shows the measured output power spectra for the DPD configurations previously mentioned. From the EVM point of view, DPD with memory compensation is clearly necessary to significantly reduce in-band distortion. Moreover, better ACPR reduction is achieved when considering more than two BPCs in the DPD and, among these solutions, the one combining two BPC-FIRs + one BPC-IIR exhibits the best ACPR reduction.
C. Adaptation Procedure
In the LS estimation, the extracted solution at each estimation step depends only on the current data, as no information about the past state is explicitly introduced during the process. This can lead to momentary PA estimations that are overly dependent on the data from which the estimation has been performed, especially since the short 2048-sample data records may not be statistically representative.
To avoid this, a degree of recursion is included by producing the polynomial coefficient estimate as a weighted sum between the past estimation state and the estimation resulting from the current data. This issue may not be of concern when laboratory setups are used for delayed offline DPD [4] - [7] , [33] , where large acquisition capabilities may allow a one-step reliable estimation without the need of recursion.
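The weighted-sum recursion described above can be sketched as follows. This is a minimal illustration rather than the implementation used in this work: the function name `blend_estimate`, the name `lam` for the forgetting factor, and the convention that `lam` weights the past state (with `1 - lam` weighting the new LS solution) are all assumptions.

```python
def blend_estimate(prev_state, ls_solution, lam):
    """Return lam * prev_state + (1 - lam) * ls_solution, element-wise.

    prev_state, ls_solution: equal-length sequences of (complex) coefficients.
    lam: assumed forgetting factor in [0, 1]; lam = 0 keeps only the new
    LS solution (no recursion), lam close to 1 changes the state slowly.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    if len(prev_state) != len(ls_solution):
        raise ValueError("coefficient vectors must have equal length")
    return [lam * p + (1.0 - lam) * c for p, c in zip(prev_state, ls_solution)]


# Hypothetical per-step LS solutions obtained from short, noisy data records.
ls_solutions = [[1.0 + 0.2j, 0.10], [0.8 - 0.1j, 0.14], [1.1 + 0.0j, 0.09]]
state = ls_solutions[0]  # initialise the state with the first LS solution
for ls in ls_solutions[1:]:
    state = blend_estimate(state, ls, lam=0.5)
```

With this convention, a single unrepresentative data record can move the coefficient state by at most a fraction `1 - lam` of the distance to its LS solution, which is the smoothing effect sought when records are short.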
The whole recursive estimation/adaptation procedure is illustrated in the flowchart shown in Fig. 12, which tags the current estimation state, the LS solution [see (8)] attained at each adaptation step, and the recursion forgetting factor. Concurrently with the estimation, a continuous flow of data is predistorted and transmitted with the current settings, of which only a small fraction is used for estimation purposes. With this procedure, good adaptive behavior is observed while DPD reliability is reinforced. Moreover, the system converges very fast, as shown in Fig. 13, where the EVM evolution is tracked for each adaptation step and reaches a stationary state within 2-4 steps. For all DPD configurations that take memory effects into account, the EVM, calculated from the unmodulated raw signal, is around 4%-5%, while the memoryless DPD is not able to achieve EVM values lower than 11%. The robustness of the DPD can be affected by possible instabilities related to its recursive part. As explained in [24], a small gain test has to be performed in order to check the overall DPD stability. This test was performed during the preliminary PA characterization stages, when identifying the optimal delays defining PA memory effects. It is necessary to ensure that the nonlinear functions associated with recursive BPCs are bounded below a certain threshold that guarantees stability.
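The boundedness requirement on the recursive BPCs can be illustrated with a simple sufficient check. This is only a sketch under stated assumptions, not the test of [24]: it assumes each recursive BPC contributes a feedback term of the form gain(|y|) times a delayed output sample, with the gain read from a LUT, and it uses the conservative criterion that the sum of the largest gain magnitudes over all recursive BPCs stays below a threshold no greater than 1. The names `feedback_gain_bound` and `passes_small_gain_check` are hypothetical.

```python
def feedback_gain_bound(recursive_luts):
    """Sum, over recursive BPCs, of the largest LUT gain magnitude."""
    return sum(max(abs(g) for g in lut) for lut in recursive_luts)


def passes_small_gain_check(recursive_luts, threshold=1.0):
    """True if the worst-case feedback gain is strictly below threshold.

    With threshold <= 1 this is a sufficient (not necessary) condition for
    a bounded output given a bounded input, under the assumed structure.
    """
    return feedback_gain_bound(recursive_luts) < threshold


# Hypothetical complex LUT contents for two recursive BPCs.
luts = [[0.10 + 0.05j, 0.20 - 0.10j], [0.05j, 0.30]]
stable = passes_small_gain_check(luts)
```

A check of this kind could be rerun whenever new LUT contents are computed, before they are loaded into the recursive cells.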
D. DPD Power Consumption
Here, we evaluate the DPD energy cost measured on the presented FPGA implementation. Although the power consumption of digital circuits is strongly dependent on each particular implementation, target device (application-specific integrated circuit (ASIC) or FPGA), and technological CMOS parameters, the particular results shown here are aimed at assessing the relative DPD contribution to the overall transmitter energy balance.
In FPGA devices, power consumption has static and dynamic contributions, both dependent on the supply level, as stated by the classical CMOS power consumption approximation rule (13). Static power consumption is due to leakage currents in the FPGA transistors and depends mainly on the device size. Dynamic power consumption, due to gates being switched between low and high logic states, depends on the number of gates within the design, which, in our case, depends on the number of BPCs. For each gate, consumption depends on its activity profile, clock frequency, and load capacitance. In our measurements, a transition profile has been considered for the involved DPD signal vectors. Incidentally, because the nonlinear functions are mapped into the BPC LUTs, DPD consumption does not depend on the polynomial degree of the PA estimator, but on the number of BPCs. The following results on DPD power consumption have been obtained with Xilinx Inc.'s Xpower utility. As a first step, the measurements are performed on the placed and routed design of the DPD core only, and do not include the remaining non-DPD-related logic in the FPGA device (mainly devoted to communications and data exchange with the PC).
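For orientation, one common textbook form of the classical CMOS power approximation is sketched here. The symbol names are assumptions chosen for this sketch and may differ from the notation of (13); constant factors (such as the 1/2 used in some texts) are absorbed into the activity term.

```latex
P \approx P_{\text{static}} + P_{\text{dynamic}}
  = V \, I_{\text{leak}} + \sum_{i=1}^{N} \alpha_i \, f \, C_i \, V^{2}
```

In this sketch, V is the supply voltage of the domain in which a gate sits, I_leak the total leakage current, N the number of gates, alpha_i the switching activity of gate i, f the clock frequency, and C_i the load capacitance of gate i. The quadratic dependence on V is what makes the supply domain of a switching signal so influential.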
The DPD core power consumption depends on the DPD clock and the number of BPCs. At 105-MHz DPD clock frequency, an increase of 36 mW per BPC is reported in [11], whereas at 50 MHz, the ratio is 21 mW per BPC. Note that increasing the BPC count results in a relatively low power increase when the one-BPC case is taken as a reference. This is due to the different supply domains within the FPGA device [31]. Most of the computing-intensive DPD logic is placed in low-supply internal banks (1.2 V), where, furthermore, the load capacitance is low, thus contributing little to the dynamic consumption in (13). On the contrary, most of the power consumption is dominated by a few signals switching in and out of the DPD core, mainly the I and Q predistorted data vectors feeding the D/A converters, because of their higher supply (3.3 V) and load capacitances.
To provide a qualitative framework of the overall DPD energy cost, Table II reports the main contributions to power consumption in the proposed DPD design. Clearly, the adaptive functionalities are the main sources of power consumption: A/D converters, non-DPD-related FPGA logic, and the adaptation algorithm executing in a PC or DSP. Nevertheless, their contribution can reasonably be neglected during regular DPD operation, when no adaptation has to be performed most of the time and, hence, only the DPD-related FPGA logic is active. Another contribution not shown in Table II may be considered, since the predistorted signal bandwidth exceeds that of the original signal: the higher sampling rates required in the D/A converters increase their power consumption. Nevertheless, that cost is worthwhile because a system without DPD would exhibit a much worse overall efficiency than a system with DPD if linearity has to be guaranteed (see Table III). To recapitulate, the DPD energy cost can be perceived as almost negligible in high-power applications where the PA power capabilities exceed tens of watts, as is the case in the presented studies. In view of this, and given the fact that DPD may be unavoidable to counteract memory effects, one can consider degrading the PA operation point in order to increase its efficiency. The consequent lack of linearity will be compensated by the DPD.
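The supply-domain argument can be made concrete with a small numerical sketch. It assumes the per-node dynamic power takes the usual form activity x frequency x capacitance x voltage squared; the activity and capacitance values below are hypothetical placeholders, and only the 105-MHz clock and the 1.2-V and 3.3-V supplies come from the text.

```python
def dynamic_power_w(activity, clock_hz, load_f, supply_v):
    """Dynamic power of one switching node: activity * f * C * V**2 (watts)."""
    return activity * clock_hz * load_f * supply_v ** 2


CLOCK_HZ = 105e6      # DPD clock quoted in the text
ACTIVITY = 0.5        # hypothetical toggle probability per clock cycle
C_INTERNAL = 10e-15   # hypothetical internal node capacitance (10 fF)
C_IO = 10e-12         # hypothetical output pin plus trace load (10 pF)

p_internal = dynamic_power_w(ACTIVITY, CLOCK_HZ, C_INTERNAL, 1.2)
p_io = dynamic_power_w(ACTIVITY, CLOCK_HZ, C_IO, 3.3)

# The supply voltage alone accounts for a factor (3.3 / 1.2)**2 per node;
# the assumed 1000x larger load capacitance multiplies that further.
voltage_ratio = (3.3 / 1.2) ** 2
```

Under these assumed values a single output signal costs as much as several thousand internal nodes, which is consistent with the observation that a few I/O vectors dominate the consumption while additional BPCs add comparatively little.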
VI. DPD AS ENABLER TO IMPROVE PA EFFICIENCY
DPD linearization techniques are widely recognized as enablers of PA efficiency. By extending the usable dynamic range of a PA in a linear manner (up to its compression point), DPD implicitly contributes to efficiency by avoiding the use of an alternative PA device that is oversized, more backed off, and less efficient to produce the desired linear output power. This reasoning is illustrated in Table III, which presents the measured linearity and efficiency figures when amplifying a single W-CDMA carrier with and without DPD for the same experimental setup described in the preceding sections. It can be observed that the PA delivering a certain amount of RF power (42 dBm) without linearization consumes less than the DPD-linearized PA delivering the same RF output power.
Although this result may seem contradictory, since the nonlinearized PA appears to be more efficient than the DPD-linearized PA, the ACPR figures show that this misleading efficiency improvement is obtained at the price of poorer linearity and, thus, no comparison can be established.
Therefore, if we take compliance with certain standardized levels of ACPR (e.g., 44 dB) as the reference for comparison, it is clear that the PA without linearization has to operate with significant BO, dramatically reducing its efficiency. Moreover, its output power capabilities are reduced by a factor of approximately 3 (5 dB).
Besides, there is another common way in which DPD is explicitly used as an efficiency enabler: varying the overall linear gain (see Fig. 8) and assuming that a certain level of signal clipping can be tolerated. That is, for a signal whose peak power is rarely reached, it is possible to increase the overall linear gain and, thus, the output power and efficiency. This results in linear amplification until compression and, on the rare signal peak occurrences in which the PA is saturated, the energy contribution to the average power spectral density will be negligible as long as the clipping probability is kept small. In the following, we address the possibility of exploiting DPD as an efficiency enabler. Given the fact that DPD is advisable, at least to counteract memory effects in the time domain, it seems reasonable to adjust the PA quiescent point in order to increase its efficiency, e.g., to turn a class-AB PA toward class-B-like operation, and then let the DPD compensate for the linearity degradation caused by changing the quiescent point.
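The decibel figures quoted above convert to linear quantities as in the following sketch. The helper names are hypothetical, and applying the 5-dB reduction to the 42-dBm figure is only an illustration of the arithmetic, not a value taken from Table III.

```python
def db_to_ratio(db):
    """Convert a power ratio in dB to a linear factor."""
    return 10.0 ** (db / 10.0)


def dbm_to_watts(dbm):
    """Convert an absolute power in dBm to watts."""
    return 10.0 ** (dbm / 10.0) / 1000.0


backoff_factor = db_to_ratio(5.0)          # about 3.16, i.e. "approximately 3"
p_out_w = dbm_to_watts(42.0)               # about 15.8 W
p_backed_off_w = p_out_w / backoff_factor  # about 5.0 W, i.e. 37 dBm
```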
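The trade-off between linear gain and clipping probability mentioned above can be explored numerically. The sketch below is illustrative only: it assumes the linearized PA behaves as an ideal hard limiter at a hypothetical saturation amplitude, and it uses complex Gaussian samples as a rough stand-in for a high-PAPR envelope rather than an actual W-CDMA waveform.

```python
import random


def clipping_probability(samples, linear_gain, saturation_level):
    """Fraction of samples whose amplified magnitude exceeds saturation."""
    clipped = sum(1 for s in samples if abs(s) * linear_gain > saturation_level)
    return clipped / len(samples)


random.seed(0)  # reproducible hypothetical test signal
SIGMA = 0.5 ** 0.5  # per-component deviation giving unit average power
signal = [complex(random.gauss(0, SIGMA), random.gauss(0, SIGMA))
          for _ in range(20000)]

SATURATION = 3.0  # hypothetical saturation amplitude of the linearized PA
for gain in (1.0, 1.2, 1.5):
    # Higher linear gain raises average output power but clips more often.
    print(gain, clipping_probability(signal, gain, SATURATION))
```

Sweeping the gain in this way gives a quick estimate of how far the linear gain can be raised before the clipped fraction stops being negligible for a given signal.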
As depicted in Fig. 14 (top), the AM-AM characteristic of the PA presents an added nonlinear distortion related to crossover distortion, superimposed on the dispersion caused by memory effects, which cannot be corrected with a memoryless DPD strategy. However, the NARMA-DPD with six BPCs is capable of linearizing the crossover characteristic and of reducing the scattering present in the AM-AM characteristic as well [see Fig. 14 (bottom)].
As expected, in the class-B operation mode, the PA consumes less power. Therefore, for a given output power level (i.e., 40.5 dBm) and by means of the DPD, it is possible to achieve the same linearity level provided by the PA in class-A mode of operation while efficiency is improved, as depicted in Fig. 15.
Clearly, this quiescent point manipulation is limited by the progressive drop in maximum output power as the quiescent point moves toward class-B operation. Nevertheless, the study presented here shows how DPD can successfully counteract the excess of nonlinearity, suggesting that DPD can be coupled with variable biasing strategies to boost the PA efficiency, e.g., during periods where the maximum nominal output power is not required.
From the DPD point of view, this could be performed simply by downloading into the BPC-LUTs the appropriate gain values corresponding to each particular bias point and, when appropriate, switching BPCs on or off to satisfy the desired memory-effect compensation span.
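The bias-dependent reconfiguration outlined above can be sketched in software terms. Everything in this example is hypothetical (the `BPC` class, the `apply_bias_profile` function, the profile labels, and the LUT contents); it only illustrates the idea of keeping one set of LUT contents per bias point and enabling as many BPCs as that set requires.

```python
class BPC:
    """Hypothetical basic predistortion cell: a gain LUT plus an enable flag."""

    def __init__(self, lut_size):
        self.lut = [0j] * lut_size
        self.enabled = False

    def load(self, gains):
        """Download new LUT contents and switch the cell on."""
        if len(gains) != len(self.lut):
            raise ValueError("LUT size mismatch")
        self.lut = list(gains)
        self.enabled = True

    def disable(self):
        self.enabled = False


def apply_bias_profile(bpcs, profile):
    """Load one stored LUT per needed BPC and switch the remaining BPCs off.

    profile: list of LUT contents previously identified for one bias point;
    its length sets how many BPCs stay active.
    """
    if len(profile) > len(bpcs):
        raise ValueError("profile needs more BPCs than are instantiated")
    for i, bpc in enumerate(bpcs):
        if i < len(profile):
            bpc.load(profile[i])
        else:
            bpc.disable()


# Hypothetical stored profiles, keyed by bias-point label (LUT size 4 here).
profiles = {
    "class_AB": [[1.0 + 0j] * 4, [0.1j] * 4],
    "class_B_like": [[1.2 + 0j] * 4, [0.2j] * 4, [0.05 + 0j] * 4],
}
bpcs = [BPC(4) for _ in range(3)]
apply_bias_profile(bpcs, profiles["class_AB"])
active = [b.enabled for b in bpcs]  # the third BPC is switched off
```

Keeping the profiles as precomputed tables means that a bias change requires only LUT downloads and enable-flag updates, with no re-identification at switching time, provided each profile was trained beforehand at its own bias point.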
VII. CONCLUSION
This paper has presented an experimental validation of the NARMA-based DPD using a reconfigurable FPGA board. The experimental results have shown the linearization capabilities of the proposed NARMA-based DPD over a wide range of signal bandwidths and independently of the modulated signal used, highlighting the potential of the proposed recursive DPD architecture over the more usual nonrecursive DPD approaches.
Practical design issues and real-time DPD hardware implementation topics have also been addressed. Among them, this paper has proposed the concept of scalable FPGA DPD implementation by replication of BPCs, as well as an iterative adaptation process for signals with high PAPR and limited data recording capabilities. Indeed, it has been shown how training the DPD with a spectrally rich wideband signal provides stability and reliability regardless of the specific signal to be predistorted during regular operation.
This study has also examined the power consumption of the DPD implementation, concluding that the DPD contribution to the overall efficiency may be negligible compared with the PA consumption and that of the devices deployed for adaptation purposes.
Finally, considering that the inclusion of the DPD is necessary to provide transmitted signal fidelity against memory effects, we have explored the possibility of biasing the PA at a power-efficient quiescent point, showing how the added nonlinearity resulting from that power-efficient bias point can be compensated by the DPD, therefore improving the overall efficiency at no extra cost.
