Abstract-
Deep-submicrometer CMOS technology is optimized for performance as a switch for digital logic circuitry [1]. Moore's law scaling of CMOS optimizes transistors to operate as high-speed low-loss switches rather than high-gain transconductors [2]. Hence, it is advantageous to use transistors as switches in switching PAs and use high-speed digital logic circuitry to implement linearization systems and circuitry.
Linearization techniques for switching amplifiers have been intensively investigated to enable the operation of switching amplifiers with nonconstant envelope (non-CE) modulation. Techniques such as analog/digital polar PAs [3]-[9], digital Doherty [10], outphasing [11], and pulsewidth modulation [12], [13] have been proposed. All of the techniques above require a conversion from the Cartesian to the polar coordinate system, via either analog or digital signal processing. Polar architectures result in bandwidth expansion of the amplitude and phase modulated signals due to the nonlinear conversion from the Cartesian to the polar domain. Additionally, because the amplitude and phase modulation signals propagate at different frequencies, they experience different group delays and are hence subject to delay mismatches [14]. Moreover, they require wideband amplitude and phase modulators, the latter of which are difficult to implement in closed loops [e.g., phase-locked loops (PLLs)], due to the excessive bandwidth, which can be ten times the RF bandwidth [15]. Open-loop phase modulators can be implemented with wide bandwidth, but they introduce phase quantization, digital-to-phase nonlinearity, and spectral images and require careful calibration [8].
Quadrature architectures avoid the bandwidth expansion and delay mismatch and eliminate both the supply modulator and the phase modulator [16]-[18]. The expanded bandwidth of the quadrature generation does not need to be modulated in a closed loop and does not propagate at a different frequency than the envelope. Hence, nonlinearity associated with signal bandwidth and delay mismatch is mitigated, while still accommodating a good interface to the digital backend. However, the power loss due to the out-of-phase summation of the I and Q signals significantly limits their peak/average power and efficiency. One solution is to reduce the separation between the summed adjacent phases, which reduces the loss due to the vector summation. This can be accomplished by adding basis phase vectors in the complex plane [19], [20].
In this paper, we present a multiphase switched capacitor PA (MP-SCPA) for non-CE amplification [21]. A block diagram of an MP-SCPA is shown in Fig. 1(a). In the proposed architecture, a single clock is used to generate M different phases. This can be accomplished using an MP ring oscillator, polyphase filters, or delay-locked loops (DLLs). Each phase can be weighted and summed in the charge domain in an SCPA to output the desired amplitude and phase [Fig. 1(b)]. MP architectures increase the output power compared with quadrature architectures by reducing the angular separation of the basis phases. Because the phase generation and logic are low power, this is a cost-effective means to improve the output power and efficiency of digital transmitters. As in a quadrature architecture, the expanded bandwidth of the phase generation does not need to be modulated in a closed loop and hence does not suffer the problems associated with polar architectures.
This paper is organized as follows. In Section II, theoretical operation of the MP-SCPA is discussed. The design details of the presented MP-SCPA are provided in Section III, followed by the measurement results in Section IV. Finally, conclusions are presented in Section V.
II. THEORY OF OPERATION

A. Operation of Q-SCPA
The quadrature SCPA (Q-SCPA) is an example of an MP architecture, where the number of phase vectors M is equal to four [ Fig. 1(a) ] [17] . The capacitor array is subdivided into four subarrays that are clocked by ±I and ±Q, respectively. Selection logic (e.g., bin) controls whether the cell is either switched between V DD and V GND or held at V GND .
This operation allows precise control of the output amplitude and phase, using a vector sum in the charge domain on the top plate of the capacitors in the SCPA array. The input upconverting clock pulses for the I and Q capacitor subarrays are 90° out of phase. The output signal s(t) is the direct summation of the I(t) and Q(t) waveforms
$$s(t) = I(t)\,p(t) + Q(t)\,p(t + T/4)$$
where T is the carrier period, and p(t) and p(t + T/4) represent the input 50% duty cycle square waveforms. The I(t) and Q(t) signals are given by
$$I(t) = A(t)\cos\varphi(t), \qquad Q(t) = A(t)\sin\varphi(t)$$
where A(t) and φ(t) are the amplitude and phase of the modulated signal, respectively. When the magnitudes of I and Q are equal, the maximum amplitude of the summation of the two vectors is given by
where V θ is the amplitude of the I /Q vector and P is the phase angle separation between the two vectors. For a quadrature system, P = π/2. Compared with a polar system, the maximum amplitude of the system is V θ . Hence, a power ratio comparing a quadrature system with a polar system P rel can be calculated as follows:
Thus, the output power of a quadrature system is reduced by 3 dB when compared with the vector summation of two signals that are in phase (e.g., polar). Hence, there is a phase-dependent power drop that is caused by the 90° phase difference of the I/Q clocking signals.
For an OFDM modulated signal, the phase is uniformly distributed between 0 and 2π, and hence the average power drop can be found due to all combinations of IQ summation. The output phase is given by φ and the output amplitude is given by A 0 . In an SCPA, the output amplitude is proportional to the number of the capacitors that are switched. The ratio between the amplitudes of digital polar SCPA and Q-SCPA, V rel , for any output phase φ is thus expressed as follows:
It is noted that this result derives from the fact that both phases in the quadrature system can be used to switch the same capacitor bank. The average ratio from 0 to 2π is calculated as follows: Similarly, the average output power ratio is calculated
Hence, for an OFDM modulated signal, the output power of a Q-SCPA is 2 dB lower on average than that of a polar SCPA. This loss of output power can be alleviated by increasing the number of basis phases M to be greater than 4. The MP-SCPA is discussed next.
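The 3-dB worst-case and 2-dB average figures above can be cross-checked numerically. The sketch below assumes a relative amplitude of V rel(φ) = 1/(|cos φ| + |sin φ|), which follows if every cell can be clocked by either phase so that the I and Q cell counts share one array. That expression and the function names are assumptions made for illustration; they are not taken from the original equations.

```python
import math

def q_scpa_relative_amplitude(phi):
    """Assumed peak Q-SCPA amplitude relative to polar at output phase phi."""
    return 1.0 / (abs(math.cos(phi)) + abs(math.sin(phi)))

def average_over_phase(f, steps=100_000):
    """Midpoint-rule average of f(phi) for phi uniform on [0, 2*pi)."""
    d = 2.0 * math.pi / steps
    return sum(f((k + 0.5) * d) for k in range(steps)) / steps

def to_db(power_ratio):
    """Convert a power ratio to decibels."""
    return 10.0 * math.log10(power_ratio)

# Average amplitude and power ratios for a uniformly distributed phase.
avg_v = average_over_phase(q_scpa_relative_amplitude)
avg_p = average_over_phase(lambda phi: q_scpa_relative_amplitude(phi) ** 2)
# Worst case: output phase midway between the I and Q basis vectors.
worst_p = q_scpa_relative_amplitude(math.pi / 4) ** 2

print(f"average amplitude ratio: {avg_v:.4f}")
print(f"average power ratio:     {avg_p:.4f} ({to_db(avg_p):.2f} dB)")
print(f"worst-case power ratio:  {worst_p:.4f} ({to_db(worst_p):.2f} dB)")
```

Under this assumption the average power ratio evaluates to 2/π, which is consistent with the roughly 2-dB average loss quoted for the Q-SCPA.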
B. Operation of the MP-SCPA
An example of MP summation is shown in Fig. 1(b) using eight clock phases. Additional clock phases reduce the separation between adjacent phase vectors, resulting in increased constructive summation between any adjacent phase vectors and hence reducing the power loss.
The circuitry to generate precision MP signals and the digital logic to implement the MP signal processing were difficult to implement when polar amplification was introduced by Kahn [3]. Such circuitry is readily available in modern fine-line CMOS processes. Deep-submicrometer CMOS technology is fundamentally a digital technology, as it is optimized to yield fast low-loss switches. The MP architecture leverages these strengths, using low-cost and low-power digital circuitry to reduce the power loss and hence increase efficiency in high-power switching PAs.
As shown in Fig. 2 , an arbitrary vector with amplitude A and phase θ can be converted into the MP domain using the following:
It is noted that there are multiple ways to perform this conversion because the basis vectors are not orthogonal, and hence more than one solution exists. The scalars n 1 and n 2 represent the amplitudes of the two phase vectors adjacent to the desired output phase θ, and m is an integer representing the index of the selected phase vector that is determined using the following:
Rearranging (12) yields the following:
To quantify the efficiency improvement of the proposed MP architecture, the relative power of an MP system compared with that of a polar system P rel,MP can be calculated for any output phase φ
The maximum power loss occurs when φ = π/M, and is given as follows:
Hence, the maximum power loss for an eight-phase MP architecture is only −0.69 dB, whereas a 16-phase system yields a power loss of −0.17 dB. The output voltage ratio comparing an M-phase architecture and a polar architecture for any output phase, V rel,MP , can be calculated as follows:
The average for a uniformly distributed phase signal over the range from 0 to 2π is given by
where C = (2 − 2 cos(2π/M))^{1/2}. Similarly, the average output power ratio is calculated as follows:
The average power drop for an OFDM modulated signal is reduced to −0.46 dB for an 8-phase system and −0.11 dB for a 16-phase system.
The SCPA has good linearity and constant impedance with respect to the load, and hence it is a good candidate to implement the MP architecture. In an SCPA, an array of capacitor cells has a shared top plate, whereas the bottom plates are separate and driven by a phase-modulated pulse wave, each either switching between V DD and V GND or held at V GND to control the output amplitude. To accommodate the MP operation, an RF clock signal is subdivided into M equally spaced output phases (φ 1 -φ M ). Such signals can be generated using ring oscillators, polyphase filters, PLLs, or DLLs [22], [23]. A digital phase selector selects the chosen phase to drive each capacitor. An example of an MP-SCPA is shown in Fig. 1(a). In the example, each unit SCPA is clocked by one of M phases and a decoder selects the amplitude weight of each phase to achieve the desired output vector. It is noted that the actual implementation is slightly different, as will be described in Section III.
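The worst-case and average power-loss figures quoted for M = 8 and M = 16 can be reproduced with a short calculation. The sketch below assumes that the relative power at the sector midpoint is cos²(π/M) and that the average over a uniformly distributed phase reduces to sin(2π/M)/(2π/M), which is what results when the two cell counts share one array. Both closed forms and the function names are assumptions for illustration, chosen because they match the quoted numbers.

```python
import math

def worst_case_power_ratio(M):
    """Assumed relative power midway between two basis phases (phi = pi/M)."""
    return math.cos(math.pi / M) ** 2

def average_power_ratio(M):
    """Assumed average relative power over one sector of width 2*pi/M."""
    alpha = 2.0 * math.pi / M
    return math.sin(alpha) / alpha

def to_db(power_ratio):
    """Convert a power ratio to decibels."""
    return 10.0 * math.log10(power_ratio)

for M in (4, 8, 16):
    worst = to_db(worst_case_power_ratio(M))
    avg = to_db(average_power_ratio(M))
    print(f"M = {M:2d}: worst case {worst:6.2f} dB, average {avg:6.2f} dB")
```

With M = 4 the same expressions give the quadrature figures (about −3 dB worst case and −2 dB on average), so the quadrature case appears as a special case of the M-phase result.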
1) MP-SCPA Output Power:
In order to facilitate transitioning from Cartesian to MP architectures and vice versa, a set of transformations is derived that allow easy mapping from one coordinate system to the other. As shown in Fig. 3 , the I and Q components of an MP vector can be expressed as follows:
and
where n 1 and n 2 are the amplitudes of the two selected phase vectors that are adjacent to the desired output phase and m represents the index of the selected phase vector that is determined by (13) . From (19) and (20) , the Cartesian to MP conversion can be obtained as follows: and
The output voltage of the MP PA is proportional to the vector summation of the n 1 and n 2 components across an array of
This can be seen as a Thévenin equivalent voltage driving a fixed capacitor, and hence a Thévenin equivalent circuit replaces the array of capacitors and is connected in series with a resonant inductor and an optimum resistance for power delivery, as shown in Fig. 4 . Assuming that a square waveform is input with an amplitude of V DD , but only the fundamental tone is output due to the bandpass operation of the series resonant circuit, the output voltage delivered to R opt is found by
where the 2/π factor is the Fourier coefficient of the fundamental tone of the input square pulse wave. The output power can be extracted from the rms value of the output waveform as follows:
Here, it can be seen that as M tends toward infinity (e.g., a polar modulator), the MP-SCPA behaves like the original SCPA, as expected. The efficiency of the SCPA can be calculated by finding the input power required to switch the capacitor array.
2) MP-SCPA Efficiency:
The input power of the SCPA is due to the energy required to charge and discharge the capacitors in the array [17], [24]. Assuming that the inductor (Fig. 5) acts as an RF current source, it can be treated as an open circuit; hence, the equivalent capacitance C in is the series combination of the capacitors being switched with those not being switched. Note that this must be done for the capacitors being clocked on phase m (e.g., n 1 ) and those being clocked on phase m + 1 (e.g., n 2 )
The input power P SC is thus the power required to switch C in
where f 0 is the switching frequency used to drive the switches in the capacitor array and is equivalent to the RF output frequency. The loaded quality factor of the series resonant network comprising the capacitor array, the series inductor, and the load resistor is given by
From this, the drain efficiency η of the SCPA can be defined as
Substitution of (25)- (28) into (29) yields the following:
For on-chip matching networks, reasonable values of Q NW must be chosen due to losses in the passive components (e.g., spiral inductors). If it is assumed that n 1 = n 2 and Q NW = 3, η is plotted versus the number of switching capacitors and phase vectors, M, as shown in Fig. 6. It is observed that there is a significant increase in efficiency when increasing the number of phases from M = 4 to M = 8, but not a significant jump when increasing beyond 8. This is primarily due to the added output power. Another factor in choosing M is that the charge should settle on each capacitor that is switched on phase m before the beginning of phase m + 1; with an increasing number of phases, there is less time for settling, and hence interaction between the phases would cause nonlinearity. In our design, M = 16 phases was chosen based on the settling limits and the lack of significant improvement in output power and drain efficiency.
3) Additional Loss Mechanisms: Other losses that affect the overall efficiency (e.g., matching network, voltage division due to switch resistance, parasitic driving power, and clock distribution) are similar to other SCPA implementations [9], [17], [24]. They can be accounted for in the system efficiency (SE) calculation, with the addition of terms to capture the voltage division, matching network losses, and additional dc power consumption as follows:
where α is the voltage division due to switch resistance, β is the attenuation in the matching network, and P DR , P clock , and P misc are the power consumed by the drivers, clock distribution, and all other sources (e.g., pad buffer power, parasitic charging, and logic power), respectively. The attenuation terms can be accounted for using the schematic shown in Fig. 7. The term α can be found from the voltage division across the switch resistance
$$\alpha = \frac{R_{opt}}{R_{opt} + r_{sw}}$$
where r sw is the resistance of the output switch. In the design, R opt = 2.25 Ω and r sw = 1 Ω, leading to α = 0.69. Similarly, β is found by calculating the attenuation in the matching network [25]
where Q L1 and Q L2 represent the quality factors of inductors L 1 and L 2 , respectively, and Q NW represents the quality factor of the network. The quality factors are found through simulation, Q L1 ≈ Q L2 ≈ 15 and Q NW ≈ 2.5, yielding β = 0.71. The simulation also finds that the total power for all dc terms in the reference design to be discussed in Section III is 120 mW, including all pad drivers and buffers, the clock receiver, and distribution network; in an SoC implementation, the input power would be reduced significantly. The modified SE for an MP-SCPA outputting ∼26 dBm with the aforementioned loss factors is plotted versus input code in Fig. 8 .
III. CIRCUIT DETAILS
A. Top Level of the 7-b MP-SCPA
The block diagram schematic of the prototype 16-phase MP-SCPA is shown in Fig. 9 . Note that a single-ended version is shown although the fabricated circuit is differential. A 7-b unary switched capacitor array is implemented in this design, which is adequate to meet the error vector magnitude (EVM) and close-in out-of-band (OOB) noise specifications for wireless communication standards such as LTE and IEEE 802.11 (e.g., Wi-Fi) [14] , [15] .
In the proposed architecture, an off-chip phase generator creates 16 evenly distributed phase vectors (φ 0 -φ 15 ) that are input to a clock selection MUX. The MUX is implemented with pass transistor logic. Four bits from a digital pattern generator control digital logic that is used to select the two phases (φ A and φ B ) adjacent to the phase of the desired output signal. The phases φ A and φ B are distributed via a clock distribution network to every cell of a 7-b unary capacitor array. This allows any cell to be switched by either φ A or φ B , accommodating cell reuse and allowing the output vector to be steered fully toward either basis phase [16], which permits a larger peak and average amplitude. A 14-b MP logic decoder is used to select whether each capacitor is to be switched by φ A or φ B or to be held at V GND . This allows precise control of the output amplitude and phase. The circuit details of the individual blocks are now discussed.
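The mapping from a desired Cartesian output to a phase index and two weights can be sketched as follows. This is a minimal illustration, assuming a zero-based phase index m with phase m at angle 2πm/M and real-valued weights n 1 and n 2 on phases m and m + 1; a hardware implementation would additionally quantize the weights to the available codes. The function names are hypothetical.

```python
import math

def cartesian_to_mp(i, q, M=16):
    """Return (m, n1, n2): sector index and weights on phases m and m + 1."""
    sector = 2.0 * math.pi / M
    theta = math.atan2(q, i) % (2.0 * math.pi)
    m = int(theta // sector) % M          # index of the lower adjacent phase
    a = m * sector                        # angle of phase m
    b = a + sector                        # angle of phase m + 1
    det = math.sin(sector)                # determinant of the 2x2 basis matrix
    n1 = (i * math.sin(b) - q * math.cos(b)) / det
    n2 = (q * math.cos(a) - i * math.sin(a)) / det
    return m, n1, n2

def mp_to_cartesian(m, n1, n2, M=16):
    """Vector sum of the two weighted adjacent phase vectors."""
    sector = 2.0 * math.pi / M
    a = m * sector
    b = a + sector
    return (n1 * math.cos(a) + n2 * math.cos(b),
            n1 * math.sin(a) + n2 * math.sin(b))

m, n1, n2 = cartesian_to_mp(0.6, 0.3)
print(m, round(n1, 4), round(n2, 4))
print(mp_to_cartesian(m, n1, n2))
```

Because the two selected phases bracket the desired output phase, both weights come out nonnegative, which is what allows them to be realized as cell counts.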
B. Unit Cascoded SCPA
The switch is a cascoded inverter (Fig. 9) that allows for operation with a doubled supply voltage to increase the output power and reduce losses in the output matching network, by reducing the magnitude of the impedance transform required [25]-[27]. For each cell, a level shifter and separate buffer chains are implemented to drive the high side and low side of the switch. To avoid conduction loss, nonoverlapping signals are generated for the switch drivers [24].
C. Phase Selector and Amplitude Decoder Logic
The phase selector comprises a decoder and a MUX tree. The 16-phase clocks are buffered on chip and then input to a pass-transistor MUX. A synthesized decoder controls the MUX and selects two clocks with adjacent phases (φ A and φ B ) to the output. A well-matched pair of inverter chains buffers and drives the two selected clocks to each cell of the MP-SCPA, where the amplitude decoder can select whether the capacitor cells are to be driven by either φ A or φ B or to be held at ground.
The amplitude decoder logic consists of two sets of thermometer decoders. The first 7-b thermometer decoder decides how many cells n 1 are switched by phase φ A . The second 7-b thermometer decoder decides whether the balance of the cells n 2 are switched by phase φ B or held at ground. The amplitude decoder is implemented in Verilog and synthesized.
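One possible behavioral model of the two thermometer decoders is sketched below. It assumes 127 unit cells for the 7-b unary array and assumes that the second decoder thermometer-codes the running total n 1 + n 2 , so that the cells following the φ A group are assigned to φ B and the rest are grounded. The cell count, this arrangement, and the function names are illustrative assumptions rather than the synthesized logic.

```python
def thermometer(code, n_cells):
    """Unary (thermometer) code: the first `code` entries are 1, the rest 0."""
    if not 0 <= code <= n_cells:
        raise ValueError("code out of range")
    return [1] * code + [0] * (n_cells - code)

def cell_assignments(n1, n2, n_cells=127):
    """Assign each cell to phase 'A', phase 'B', or 'GND' (held at ground)."""
    if n1 + n2 > n_cells:
        raise ValueError("n1 + n2 exceeds the number of cells")
    sel_a = thermometer(n1, n_cells)         # cells clocked by phase A
    sel_ab = thermometer(n1 + n2, n_cells)   # cells clocked by A or B
    return ["A" if a else ("B" if ab else "GND")
            for a, ab in zip(sel_a, sel_ab)]

cells = cell_assignments(2, 1, n_cells=5)
print(cells)
```

Sharing a single array in this way means a cell is never reserved for one phase, which reflects the cell reuse described for the top-level architecture.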
The phase and amplitude decoders are both clocked at the same sampling frequency (e.g., 200 Msamples/s) and their data paths are closely matched through careful signal routing. The data are latched after being input to the chip and again before they are input to the decoders. Because of this, the delay mismatch in the paths is minimal, compared with polar DPAs, where the phase signal propagates at the RF carrier frequency and the amplitude signal propagates at the sampling rate, resulting in a delay mismatch. This is the primary benefit of MP architectures. Another way of viewing this is that the amplitude selector selects only the relative phases for combination.
IV. EXPERIMENTAL RESULTS
The prototype MP-SCPA is fabricated in a 130-nm RF CMOS process with ultrathick top metal for high-quality passives. The chip microphotograph is shown in Fig. 10. It occupies a total area of 2.1 mm × 1.8 mm, dominated by the I/O pads for supply and data; SoC implementations would reduce the size. A 4-b phase selection decoder (m) selects the two clock phases adjacent to the desired output, whereas two 7-b decoders (n 1 , n 2 ) independently control the capacitors that are switched or held at ground. Most circuitry operates from a supply voltage of 1.5 V, with the exception of the cascoded output switches and their drivers, which operate from 1.5 to 3 V.
A. Static Measurements
Shown in Fig. 11 are the measured static output power P out and total SE versus frequency. It is noted that the total SE is the ratio of the delivered output power to all input (dc and RF) power coming onto the chip. The PA delivers a maximum output power of 26 dBm with an SE of 24.9% at 1.82 GHz. The measured −3 dB bandwidth of the PA is around 750 MHz, which is consistent with the loaded quality factor of the bandpass matching network (f 0 /BW ≈ 1.82 GHz/750 MHz ≈ 2.4, close to the simulated Q NW ≈ 2.5).
The output power is plotted as a function of the digital input code in Fig. 12. The digital input codes n 1 , n 2 , and m are mapped to the IQ plane using (19) and (20). The nonlinearity observed at higher output power is caused by the bondwire inductance and the phase offsets of the adjacent clocks. Excess bondwire inductance causes supply and ground bounce and affects the dynamic response at the output in the form of memory effects. Additionally, differences in the clocks' duty cycles and rise/fall times can result in mismatch in the output amplitudes along different phase vectors.
The measured phase and amplitude distortions (AM-PM and AM-AM) are shown in Fig. 13 . It is observed that a relatively larger distortion occurs at high power, which can be explained by the interaction between supply/ground bounce and clock duty cycle mismatch. When the output power is high, the effect of the clock differences enlarges the supply and ground bounce, resulting in a larger difference in output amplitudes switching at different clocks.
B. Digital Predistortion
Nonlinearity in PAs causes spectral mask violations, increased bit error rate resulting from the in-band distortion, and adjacent channel interference due to spectral regrowth. Owing to the low cost and low power of CMOS digital circuitry, it is cost effective and advantageous to linearize the PA using digital predistortion (DPD) at the baseband. In low inductance packages (e.g., flip-chip), the need for DPD in SCPAs can be almost entirely mitigated. However, in packages where supply is provided through a bondwire, the settling behavior of the supply can create distortion; this is true of almost all PAs. In the presented design, the wirebond packaging dictates the need for DPD.
Similar to quadrature DPAs [16], [18], [28], [29], the MP-SCPA requires a 2-D DPD. This can be done in the Cartesian or polar domain. In the presented MP-SCPA, the digital input codes (i.e., n 1 , n 2 , and m) are mapped to the measured output amplitude and phase. To visualize the mapping, both the digital input code and the output amplitude and phase are converted into Cartesian coordinates and plotted in the complex plane. The measured output for all codes using two adjacent phases is plotted in Fig. 13(a), whereas all possible combinations are plotted in Fig. 13(b).
These measurements are the basis of a lookup table (LUT) that is used to build a digital predistorter. It is noted that such an LUT-based DPD can be very large for all possible combinations of digital input codes. The predistorted input codes are converted into the Cartesian domain and several examples of the predistorted input and linearized output are plotted in Fig. 14(a) and (b) , respectively.
A large LUT occupies significant memory, and hence die area, especially when the LUT must be found for different frequencies and operating temperatures. To overcome this disadvantage, a 2-D polynomial surface is fit to the measured data. Assuming that I DPD and Q DPD are independent of each other, they can each be expressed as a polynomial function of the in-phase and quadrature inputs
Equations (34) and (35) represent two different continuous surfaces in a 2-D coordinate system, as shown in Fig. 15. By fitting the surface to the measured data, the desired coefficients a_{k,j−k} and b_{k,j−k} are obtained. A third-order polynomial expression is determined and it is plotted as the red surface in Fig. 15. Predistorted codes from the 2-D surface fit and the 2-D LUT are compared and plotted in Fig. 16(a) and (b) for I DPD and Q DPD , respectively. There is a close fit since the surface fit polynomial is constructed from the LUT.
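A least-squares surface fit of this kind can be sketched in a few lines. The example assumes each surface has the form of a sum of terms a_{k,j−k} I^k Q^{j−k} for j = 0 to 3 and k = 0 to j, which is what the coefficient indices suggest, and it fits synthetic data rather than measured LUT entries. The function names, the sample grid, and the target surface are all hypothetical.

```python
def poly_terms(i, q, order=3):
    """Monomials I^k * Q^(j-k) for j = 0..order and k = 0..j."""
    return [i ** k * q ** (j - k) for j in range(order + 1) for k in range(j + 1)]

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x

def fit_surface(points, values, order=3):
    """Least-squares coefficients from the normal equations (X^T X) c = X^T y."""
    X = [poly_terms(i, q, order) for i, q in points]
    n = len(X[0])
    xtx = [[sum(row[r] * row[c] for row in X) for c in range(n)] for r in range(n)]
    xty = [sum(row[r] * y for row, y in zip(X, values)) for r in range(n)]
    return solve(xtx, xty)

def evaluate(coeffs, i, q, order=3):
    """Evaluate the fitted polynomial surface at (i, q)."""
    return sum(c * t for c, t in zip(coeffs, poly_terms(i, q, order)))

def target(i, q):
    """Synthetic stand-in for one predistortion surface (not measured data)."""
    return i + 0.1 * i ** 3 - 0.05 * i * q * q

grid = [(-1 + 0.25 * a, -1 + 0.25 * b) for a in range(9) for b in range(9)]
coeffs = fit_surface(grid, [target(i, q) for i, q in grid])
print(len(coeffs), evaluate(coeffs, 0.3, -0.4), target(0.3, -0.4))
```

A third-order fit needs only ten coefficients per surface, compared with one LUT entry per input code combination, which is the source of the memory saving.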
The polynomial DPD saves die area and complexity compared with the LUT. It can also accommodate the effects of operating temperature and frequency, as well as memory effects, through changes to the polynomial coefficients. To demonstrate the effectiveness of the 2-D surface fit DPD, the output amplitude is plotted as a contour after DPD in Fig. 17(a). It shows a linear response across all possible codewords and phases. The corresponding SE contours are plotted in Fig. 17(b).
C. Dynamic Measurements
To verify the MP-SCPA's performance with modulated signals, it is tested with a 10-MHz, 64-QAM LTE signal. Without DPD, the ACLR is ≈ −20 dBc as shown in Fig. 18(a). By applying the DPD to linearize the PA, the measured ACLR is less than the specified −30 dBc, as shown in Fig. 18(b). The average output power is 20.9 dBm with a total SE (including pad buffers and all internal circuitry) of 15.2%. The measured EVM is ≈3.5% rms after DPD, whereas the EVM is >10% rms without DPD. The signal constellation is plotted in Fig. 19.
The far-out OOB noise of the MP-SCPA is dominated by signal quantization. The far-out OOB noise for the 7-b Q-SCPA when transmitting a 10-MHz, 64-QAM LTE signal is plotted in Fig. 20. The OOB noise can be suppressed with higher resolution. The alias at 2.02 GHz is due to the 200-MHz sampling rate of the input LTE signal and it can be further suppressed with a higher sampling rate. The sampling rate and resolution in our design were limited by the instruments available for testing and the pad count available on the die.
V. CONCLUSION
The concept of MP modulation is introduced and implemented in a prototype MP-SCPA in 130-nm CMOS. It leverages the advantages of digital PAs while not requiring a wideband phase modulator (as in polar DPAs) or incurring high combining loss (as in quadrature DPAs). This PA delivers a peak P out of 26 dBm at 1.82 GHz with 24.9% SE. The performance is validated by static and modulated measurements using a 10-MHz, 64-QAM LTE signal. A 2-D surface fit DPD method is proposed to save the die area occupied by a large LUT and to accommodate temperature coefficients and frequency compensation in the future. With DPD, the ACLR is below the −30 dBc required by the E-UTRA (LTE) standard and the measured EVM is 3.5% rms. A comparison with the prior art is provided in Table I. The MP-SCPA is compared with prior quadrature DPAs and a complete polar transmitter, since the MP-SCPA is a digital transmitter frontend. Though this circuit was implemented in 130-nm technology, it achieves a higher average P out and a higher SE than a quadrature variant in a finer line technology [17]. It should be noted that for the Q-SCPA and MP-SCPA, the SE accounts for all power sources input to the chip, excluding the local oscillator generation.
