## A CMOS Broadband Power Amplifier With a Transformer-Based High-Order Output Matching Network

Hua Wang, Constantine Sideris, Member, IEEE, and Ali Hajimiri, Fellow, IEEE

Abstract—A transformer-based high-order output matching network is proposed for broadband power amplifier design, which provides optimum load impedance for maximum output power within a wide operating frequency range. A design methodology to convert a canonical bandpass network to the proposed matching configuration is also presented in detail. As a design example, a push-pull deep class-AB PA is implemented with a third-order output network in a standard 90 nm CMOS process. The leakage inductances of the on-chip 2:1 transformer are absorbed into the output matching to realize the third-order network with only two inductor footprints for area conservation. The amplifier achieves a 3 dB bandwidth from 5.2 to 13 GHz with +25.2 dBm peak $P_{\rm sat}$ and 21.6% peak PAE. The EVMs for QPSK and 16-QAM signals, both at 5 Msample/s, are below 3.6% and 5.9%, respectively, at the output 1 dB compression point. This verifies the PA's capability of amplifying a narrowband modulated signal whose center tone can be programmed across a large frequency range. The measured BER for transmitting a truly broadband PRBS signal up to 7.5 Gb/s is less than $10^{-13}$, demonstrating the PA's support for a wide instantaneous operation bandwidth.

Index Terms—Broadband, CMOS, high-order output matching, impedance transformation, instantaneous bandwidth, optimum load impedance, power amplifier, transformer.

#### I. INTRODUCTION

CMOS technology offers a powerful platform for realizing a full radio system on a single chip with its unparalleled integration level and extensive digital processing capability. However, the power amplifier, which greatly affects the entire transmitter's power efficiency and output signal quality, remains one of the most challenging blocks for full transceiver implementation in CMOS.
Manuscript received April 27, 2010; revised August 11, 2010; accepted August 12, 2010. Date of current version December 03, 2010. This paper was approved by Guest Editor Kari Halonen. The authors are with the California Institute of Technology, Pasadena, CA 91125 USA (e-mail: hwang@caltech.edu). Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/JSSC.2010.2077171

Due to the limited device breakdown voltage, the output matching network for a CMOS PA requires a large impedance transformation ratio to generate high output power. This often results in prohibitive passive loss and complicates the design process. In addition, besides conventional PA metrics, such as output power and efficiency, various emerging applications may pose further requirements on the PA's operation bandwidth. Modern communication systems often demand a large instantaneous bandwidth to support ultrahigh data rate modulation. Applications such as smart antennas and cognitive radios require transmitting at programmable center frequencies within a large bandwidth to achieve frequency multiplicity [1], [2]. Furthermore, advanced radar imaging and biomedical sensing/imaging systems need the power stage to amplify truly broadband signals, such as pulses for target detection [3], [4]. For example, a chirped radar imaging system utilizes the correlation between the incident and the reflected pulses. The system's spatial resolution is determined by the correlation function's temporal width, which is inversely proportional to the total bandwidth of the chirped signal [4]. In summary, all these applications require significant extension of a PA's operating bandwidth well beyond the conventional narrowband practices.

Common approaches for broadband PAs include distributed amplifiers and balanced amplifiers.
In traditional distributed amplifiers, because the voltage waveforms from each stage are added in-phase along the output transmission line, the final amplifier stage generally experiences the maximum voltage swing and enters saturation first, limiting the total output power and posing reliability challenges. Moreover, a significant amount of power will be dissipated at the output termination, leading to a poor overall efficiency. The balanced amplifier, on the other hand, requires 90° couplers at both its input and output, which are generally difficult to realize for a broadband and low-loss on-chip implementation. Therefore, it is desirable to develop new design approaches for broadband power amplification with high power efficiency and good signal fidelity.

This paper presents a broadband PA topology with a transformer-based high-order output matching network. In addition to performing the differential to single-ended power combining, the network converts the 50 $\Omega$ load to the optimum PA load impedance over a large bandwidth to maximize the output power. A design method to convert a canonical third-order bandpass network to the proposed output matching is described in detail. The design space constraints for the matching network implementation, trade-offs between design specifications, and the extension to a high-order network are also presented.

Fig. 1. Optimum load impedance to maximize the output power for an ideal Class-A PA.

The paper is organized as follows. Section II presents a simplified model to describe the behavior of the optimum load impedance for a power device across a broad frequency range. Section III introduces the broadband transformer-based high-order output matching topology and presents the design method starting from a canonical bandpass network. A 5.2–13 GHz deep class-AB PA implemented in a standard 90 nm CMOS process will be demonstrated as a design example in Section IV with its measurement results shown in Section V.
To the authors' best knowledge, the presented PA design achieves the highest output power and PAE with a state-of-the-art instantaneous bandwidth among all the CMOS broadband PAs reported to date.

#### II. THE BROADBAND OPTIMUM LOAD IMPEDANCE

For a given power device and technology, the maximum output voltage ($V_{max}$) is constrained by the device's breakdown and degradation voltage, while the maximum output current is limited by the device size and the input voltage drive. To maximize the output power from a given device, a specific load impedance should be presented at the device output (Fig. 1) [5]. Therefore, the output matching network transforms the standard 50 $\Omega$ load to this desired complex impedance. Assuming class-A operation, for maximum output power, this impedance presents an equivalent inductance $L_P$ to resonate with the device nonlinear output capacitance $C_{device}$ at the operating frequency $\omega$ and a parallel resistive part $R_L$ as the optimum load, determined by the following two equations:

$$j\omega L_P = \frac{-1}{j\omega C_{device}} \tag{1}$$

$$R_L = R_{opt} \approx \frac{V_{max} - V_{knee}}{I_{max}} \tag{2}$$

where $V_{knee}$ represents the finite knee voltage of the power device. The optimum load impedance is thus dependent on the operating frequency. Assuming zero knee voltage, the three time-domain plots in Fig. 1 demonstrate the voltage-current waveforms for different $R_L$ values. The voltage swing and the current swing are simultaneously maximized for maximum output power only when $R_L$ equals $R_{opt}$. This optimum load impedance is normally determined through large-signal load-pull simulations or measurements.

For a broadband power amplifier design, it is desirable to provide this optimum load impedance across the entire bandwidth to harvest the maximum device output power.
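As a rough numerical illustration of (1) and (2), the sketch below computes the optimum load resistance and the resonating inductance for an ideal class-A stage. The function name and all device values are hypothetical and chosen only for illustration; they are not taken from the design in this paper.

```python
import math

def optimum_load(v_max, v_knee, i_max, c_device, freq_hz):
    """Return (R_opt, L_P) following (1) and (2) for an ideal class-A stage."""
    omega = 2.0 * math.pi * freq_hz
    r_opt = (v_max - v_knee) / i_max        # (2): resistive part of the load
    l_p = 1.0 / (omega ** 2 * c_device)     # (1): L_P resonates with C_device
    return r_opt, l_p

# Hypothetical device values (assumptions, not measured data).
V_MAX, V_KNEE, I_MAX = 2.8, 0.3, 0.25       # volts, volts, amperes
C_DEVICE, F_OP = 0.5e-12, 9e9               # farads, hertz

r_opt, l_p = optimum_load(V_MAX, V_KNEE, I_MAX, C_DEVICE, F_OP)
print(f"R_opt = {r_opt:.2f} ohm, L_P = {l_p * 1e12:.0f} pH")
```

Because $L_P$ scales as $1/\omega^2$ for a fixed $C_{device}$, the required inductance changes with the operating frequency, which is why a single fixed inductor cannot provide the optimum load over a wide band.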
Since the device $R_{opt}$ is determined by the breakdown voltage, knee voltage, and maximum output current, while $C_{device}$ is based on the device parasitic capacitance and the operation mode, to the first order they can be assumed to be independent of the operating frequency [6]:

$$R_{opt}(\omega) = R_{const} \tag{3}$$

$$C_{device}(\omega) = C_{const}. \tag{4}$$

The broadband optimum load impedance is then given by a constant load resistance in parallel with an equivalent negative capacitance, as

$$Z_{opt}(\omega) = R_{const} \,\|\, \left( \frac{-1}{j\omega C_{const}} \right). \tag{5}$$

To verify this simplified model, a large-signal load-pull simulation for maximum output power is performed for a 90 nm CMOS cascode stage from 5 GHz to 14 GHz (Fig. 2). The resulting optimum impedances follow a constant conductance trajectory on the Smith chart. Therefore, to achieve a high-power broadband PA, the main challenge is to design the output matching network, which transforms the standard 50 $\Omega$ to the device's optimum load impedance across the operating frequency range.

Fig. 2. Broadband behavior of the optimum load impedance for a cascode device based on the large-signal load-pull simulation.

#### III. BROADBAND OUTPUT MATCHING NETWORK REALIZATION

In this section, we will present a design procedure, which starts from a canonical bandpass network and leads to a transformer-based PA output matching network to provide the desired load impedance across a wide frequency range. Instead of performing numerical optimizations [7] or direct syntheses [8], the proposed procedure places emphasis on physical design intuition.

Considering a generic third-order doubly-terminated bandpass network (Fig. 3(a)), at plane A, the impedance to its right $Z_A(\omega)$ must be the complex conjugate of its left-side impedance, as

$$Z_A(\omega) \approx R_0 \,\|\, \left( \frac{-1}{j\omega C_1} \right) \tag{6}$$

where the "approximately equal to" is due to the Bode-Fano limit [12]. Comparing (5) and (6), the network highlighted in Fig. 3(a) can potentially function as the desired matching network. The generic normalized third-order low-pass network prototype and its bandpass configuration are shown in Fig. 3(b) with the coefficients $g_1$, $g_2$, and $g_3$ set by the specific network type [9].

Fig. 3. (a) A potential third-order realization of the desired output matching network. (b) The standard transformation from a normalized low-pass prototype to a bandpass network.

However, a high-power CMOS PA demands an output network with a large impedance transformation ratio and efficient power combining, which cannot be met by a generic $50\,\Omega$ doubly-terminated design. To satisfy these requirements, a transformer-based high-order output network is proposed, with its third-order implementation shown in Fig. 4. The proposed network incorporates a physical transformer with non-ideal magnetic coupling and finite winding inductances. A design methodology to convert a canonical third-order bandpass network prototype (Fig. 3(b)) into the proposed matching configuration will be introduced in the following subsections. Design extension towards an Nth-order network is further presented in the Appendix.

Fig. 4. The proposed transformer-based output matching network with its third-order implementation.

Compared with the proposed transformer-based approach, the design method in [10], which uses only Norton transformations, provides a moderate impedance conversion ratio that is insufficient for high-power CMOS PA designs. Moreover, it will be shown in the following subsections that the impedance conversion by Norton transformation directly links the network quality factor with the transformation ratio, thus incurring significant passive losses for designs with large conversion ratios. Furthermore, for a broadband single-ended PA implementation as in [10], the second harmonic falls directly in band, leading to excessive harmonic leakage. These issues are addressed by the proposed transformer-based output matching network topology.

Fig. 5. Design procedures to convert a canonical third-order bandpass network to the proposed output matching network configuration.

#### A. Proposed Output Matching Network Design

Starting with a canonical third-order bandpass network (Fig. 3(b)), the design procedure to arrive at the proposed output matching network (Fig. 4) is presented as follows and demonstrated in Fig. 5.

Step 1: The capacitor $C_3$ is first split into $C_{3a}$ and $C_{load}$, with the latter representing the total parasitic capacitance from the pad and the inductor $L_3$. The inductor $L_2$ is then split into $L_{2a}$ and $L'_{2}$, with the following relationship:

$$\frac{L_{2a}}{L_1} = \frac{1 - k_m^2}{k_m^2} \tag{7}$$

where $k_m$ is the magnetic coupling coefficient of the on-chip transformer to be used in the design.

Step 2: Two Norton transformations are performed on the shunt-series inductors ($L_1$ and $L_{2a}$) and the series-shunt capacitors ($C_2$ and $C_{3a}$). Based on [15], the two transformation ratios can be calculated as

$$n_C = \frac{C_{3a} + C_2}{C_2} \tag{8}$$

$$n_L = \frac{L_1 + L_{2a}}{L_1}. \tag{9}$$

Since the capacitive Norton transform used here down-converts the impedance on its left, while the inductive Norton transform up-converts, the impedances for $C_1$ and $R_0$ are now both scaled by $n_L^2/n_C^2$.

Step 3: An ideal transformer with a turn ratio of $1:(n/k_m)$ is then inserted into the network. This results in further downscaling for all the impedances to the left of the transformer by a factor of $(n/k_m)^2$ to maintain impedance matching.
If the following two equations are satisfied,

$$L_2'' = L_{2a} \cdot \frac{n_L}{n_C^2} \cdot \left(\frac{k_m}{n}\right)^2 = \left(1 - k_m^2\right) L_{pr} \tag{10}$$

$$L_1'' = L_1 \cdot \frac{n_L}{n_C^2} \cdot \left(\frac{k_m}{n}\right)^2 = k_m^2 L_{pr} \tag{11}$$

the network highlighted in grey directly represents a nonideal transformer with both leakage inductances shifted to the primary side [14] and can be replaced by a physical transformer design with an actual turn ratio of $n$ and a coupling coefficient of $k_m$. The design equation of the transformer's primary inductance $L_{pr}$ is given as

$$L_{pr} = L_1 \cdot \frac{n_L}{n_C^2} \cdot \left(\frac{1}{n}\right)^2. \tag{12}$$

Therefore, the proposed network achieves a total impedance down-transformation of $(n/k_m)^2 n_C^2/n_L^2$. Instead of trying to minimize or directly cancel the transformer's leakage inductances, the above design method naturally utilizes them in the matching network to achieve a third-order bandpass function with both impedance conversion and differential to single-ended signal combining.

For a third-order bandpass PA output matching, the design space constraints on the realizable networks and trade-offs between design parameters are presented in the following subsections.

### B. Design Space Limitation Due to the Quality Factor of the Optimum Load

The quality factor of the optimum load impedance presents the first design limitation on the achievable matching networks. For a given power device at a certain input drive, $R_{opt}$ is inversely proportional to the device current and thus the device width, while $C_{device}$ is directly proportional to the device width. Therefore, the product $R_{opt}C_{device}$ is, to the first order, independent of the device sizing and is determined only by the process technology.
The quality factor of the optimum load impedance at $\omega_0$ is given as

$$\begin{aligned} Q_{load} &= \omega_0 R_{opt} C_{device} \approx \omega_0 \frac{V_{max} - V_{knee}}{I_{max}} C_{device} \\ &= \omega_0 \frac{V_{swing}}{I_W W} \cdot C_W W = \omega_0 \frac{V_{swing}}{I_W} C_W \end{aligned} \tag{13}$$

where $V_{swing}$ is the maximum device output voltage swing. The quantities $I_W$ and $C_W$ are the saturation output current and the nonlinear output capacitance per unit device width, respectively. Moreover, $I_W$ can be further expressed in terms of the input drive $V_{\rm in}$ and the per-unit-width large-signal equivalent transconductance $Gm_W$ as

$$I_W = Gm_W \cdot V_{\rm in} \tag{14}$$

where $V_{\rm in}$ ensures that the device outputs its saturation current and satisfies the reliability requirements. Then, a frequency quantity $f_{sat}$ can be defined for the given device technology, which associates the device output capacitance $C_W$ and its large-signal transconductance $Gm_W$ by

$$f_{sat} = \frac{\omega_{sat}}{2\pi} = \frac{1}{2\pi} \cdot \frac{Gm_W}{C_W}. \tag{15}$$

Note that this quantity $f_{sat}$ (or $\omega_{sat}$) benchmarks the device technology's speed in large-signal operation with saturated output current. Based on (14) and (15), (13) can be simplified to

$$Q_{load} = \frac{V_{swing}}{V_{\rm in}} \cdot \frac{\omega_0}{\omega_{sat}}. \tag{16}$$

This indicates that a faster device (a higher $\omega_{sat}$), a lower operating frequency $\omega_0$, and a smaller allowed output voltage $V_{swing}$ lead to a lower quality factor of the optimum load when the device delivers its maximum output power.

On the other hand, for a generic third-order bandpass matching network [9] shown in Fig.
3(b), the loaded quality factor $Q_1$ for the first parallel section ($L_1$ and $C_1$) is

$$Q_1 = \frac{Z_0/2}{\omega_0 L_1} = \frac{Z_0}{2} \omega_0 C_1 = \frac{Z_0/2}{\omega_0(\Delta Z_0/\omega_0 g_1)} = \frac{g_1}{2\Delta} \tag{17}$$

where $\Delta$ is the fractional bandwidth $(\omega_H - \omega_L)/\omega_0$ with the center frequency $\omega_0$ as the geometric mean of $\omega_H$ and $\omega_L$. Since the $g_1$ coefficient is from the canonical network prototype, it gives an indication of the intrinsic load quality factor for the specific network type at a given $\Delta$.

The left side of the matching network shown in Fig. 3(b) is assumed to be connected to the device output. Because one can always add extra capacitance at the power device's output to accommodate a high-Q matching network, the process technology and the center operating frequency determine the minimal load quality factor for the achievable broadband output matching network. This design space constraint, in terms of the network $g_1$ coefficient, can be derived as the following viability criterion:

$$Q_{load} = \frac{V_{swing}}{V_{\rm in}} \cdot \frac{\omega_0}{\omega_{sat}} \le 2Q_1 = \frac{g_1}{\Delta}. \tag{18}$$

Note that impedance transformation does not change the quality factor of the network. Therefore, for a given device technology, output voltage swing limit, and target network type, the maximum achievable bandwidth $(\omega_H - \omega_L)$ is given by

$$\omega_H - \omega_L \le \omega_{sat} \cdot \frac{V_{\rm in}}{V_{swing}} \cdot g_1. \tag{19}$$

For a target network type (a fixed $g_1$), a faster process (a larger $\omega_{sat}$) directly enables implementing a larger absolute bandwidth ($\omega_H - \omega_L$). Moreover, realizing a given fractional bandwidth $\Delta$ becomes increasingly difficult at a higher operating frequency $\omega_0$, since this requires a larger absolute bandwidth.
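The relations (16), (18), and (19) can be evaluated with a few lines of code. The following is a minimal sketch; the function names and every numerical value (including the assumed $f_{sat}$, voltage swing, and input drive) are hypothetical placeholders rather than parameters of the process used in this work.

```python
import math

def q_load(v_swing, v_in, f0, f_sat):
    """Quality factor of the optimum load, per (16)."""
    return (v_swing / v_in) * (f0 / f_sat)

def is_viable(v_swing, v_in, f0, f_sat, g1, delta):
    """Viability criterion (18): Q_load must not exceed g1 / delta."""
    return q_load(v_swing, v_in, f0, f_sat) <= g1 / delta

def max_bandwidth_hz(v_swing, v_in, f_sat, g1):
    """Upper bound on f_H - f_L, per (19) divided by 2*pi."""
    return f_sat * (v_in / v_swing) * g1

# Hypothetical technology and design numbers (assumptions for illustration).
V_SWING, V_IN = 4.0, 1.0            # volts
F_SAT, F0 = 50e9, 9e9               # hertz
G1 = 0.63                           # first prototype coefficient
DELTA = 1.0 / math.sqrt(2.0)        # octave fractional bandwidth

q = q_load(V_SWING, V_IN, F0, F_SAT)
bw_max = max_bandwidth_hz(V_SWING, V_IN, F_SAT, G1)
print(f"Q_load = {q:.2f}, viable = {is_viable(V_SWING, V_IN, F0, F_SAT, G1, DELTA)}")
print(f"max absolute bandwidth = {bw_max / 1e9:.3f} GHz")
```

Doubling `V_SWING` in this sketch halves the bandwidth bound, which is the swing-versus-bandwidth trade-off implied by (19).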
Most importantly, although the device output voltage swing, i.e., $V_{swing}$, can be increased by various design techniques such as cascode or breakdown voltage multiplier, it presents a direct trade-off with the maximum achievable matching bandwidth, posing challenges for PA designs targeting both high output power and broad bandwidth simultaneously.

This design limitation is shown in Fig. 6 with a cascode stage implemented in a standard 90 nm CMOS process as an example. At a target absolute bandwidth of 7 GHz, this example can realize a third-order network with the smallest $g_1$ of 0.63, i.e., a Chebyshev network of 0.01 dB ripple.

### C. Design Space Limitation Due to the Finite Transformer Magnetic Coupling

Besides the load quality factor, the practical transformer implementation also presents a design constraint on the broadband output matching network due to its non-ideal magnetic coupling. At the center frequency $\omega_0$, the loaded quality factor $Q_2$ for the series section ($L_2$ and $C_2$) in the bandpass network [Fig. 3(b)] is

$$Q_2 = \frac{\omega_0 L_2}{2Z_0} = \frac{\omega_0 (Z_0 g_2 / \Delta \omega_0)}{2Z_0} = \frac{g_2}{2\Delta}. \tag{20}$$

Based on (17) and (20), for a given fractional bandwidth $\Delta$, the coefficient product $g_1g_2$ is proportional to the product of the loaded quality factors $Q_1Q_2$. This product $g_1g_2$ also indicates the intrinsic quality factors for the network type at a given $\Delta$. In the proposed output network design methodology (Section III-A), since $L_2'$ is non-negative (Fig. 5), $L_1$ and $L_2$ must satisfy the following condition:

$$\frac{L_2}{L_1} = \frac{Z_0 g_2 / \Delta \omega_0}{\Delta Z_0 / \omega_0 g_1} = \frac{g_1 g_2}{\Delta^2} = 4Q_1 Q_2 \ge \frac{L_{2a}}{L_1}. \tag{21}$$

Fig. 6. The design space constraint of a third-order network due to the quality factor of the optimum load for the example cascode device in a 90 nm CMOS technology.
Based on (7) and (21), the network design-space limit due to the non-ideal magnetic coupling coefficient $k_m$ can be derived as

$$g_1 g_2 \ge \frac{1 - k_m^2}{k_m^2} \Delta^2. \tag{22}$$

This is shown in Fig. 7(a) with an octave bandwidth ($f_{max}/f_{min}$ of 2 and $\Delta^2$ of 1/2) as the design target. The curve $(1-k_m^2)\Delta^2/k_m^2$ thus sets the boundary for the achievable network types (indicated by the product $g_1g_2$) given the physical transformer implementation. In addition, any network design located on the curve $(1 - k_m^2)\Delta^2/k_m^2$ leads to a zero $L'_2$, which means the inductors $L_1$ and $L_2$ can be completely absorbed as the leakage inductances into the physical transformer after the impedance transformation. The different boundary curves for various target fractional bandwidths are shown in Fig. 7(b).

With a $\Delta^2$ of 1/2, the network design space limitation by the finite magnetic coupling $k_m$ is further demonstrated in Fig. 7(c). For example, to realize a third-order Bessel network with an octave bandwidth, the transformer coupling coefficient $k_m$ must be greater than 0.65. Given a target fractional bandwidth $\Delta$, to extend the achievable network designs to low-Q network configurations, a high coupling coefficient $k_m$ is required. Moreover, at a given $k_m$, a larger bandwidth design directly leads to a higher Q for the achievable network, resulting in more mismatch and passive loss. Therefore, a large transformer coupling coefficient is essential in realizing a broadband network, particularly for PA output matching.

### D. The Proposed Network's Passive Efficiency and Its Trade-Off With Bandwidth

The previous two subsections demonstrate the practical design limitations due to the optimum load quality factor and the finite transformer coupling, constraining the realizable broadband PA output matching network designs.
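The coupling constraint in (22) can be rearranged into a minimum required $k_m$ for a chosen prototype, $k_m \ge \Delta/\sqrt{g_1 g_2 + \Delta^2}$. The sketch below evaluates this bound; as an assumption for illustration it uses the standard third-order Butterworth prototype coefficients ($g_1 = 1$, $g_2 = 2$), which are not the coefficients of the network designed in this paper, and all function names are hypothetical.

```python
import math

def fractional_bandwidth(f_low, f_high):
    """Delta = (f_H - f_L) / f_0 with f_0 the geometric mean of the band edges."""
    return (f_high - f_low) / math.sqrt(f_low * f_high)

def min_coupling(g1, g2, delta):
    """Smallest k_m satisfying (22): g1*g2 >= (1 - k^2) / k^2 * delta^2."""
    return delta / math.sqrt(g1 * g2 + delta ** 2)

def residual_l2_over_l1(g1, g2, delta, k_m):
    """L2'/L1 = L2/L1 - L2a/L1 from (7) and (21); must be non-negative."""
    return g1 * g2 / delta ** 2 - (1.0 - k_m ** 2) / k_m ** 2

# Assumed prototype: third-order Butterworth, octave bandwidth.
G1, G2 = 1.0, 2.0
delta = fractional_bandwidth(1.0, 2.0)      # f_max / f_min = 2
k_min = min_coupling(G1, G2, delta)
print(f"delta^2 = {delta ** 2:.3f}, minimum k_m = {k_min:.3f}")
print(f"L2'/L1 at k_m = 0.7: {residual_l2_over_l1(G1, G2, delta, 0.7):.3f}")
```

At exactly `k_min` the residual inductance $L_2'$ vanishes, which corresponds to a design sitting on the boundary curve where the inductors are fully absorbed into the transformer.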
This subsection will investigate the theoretical passive efficiency of the proposed output network. The total impedance down-scaling ratio achieved by the proposed network is $(n/k_m)^2 n_C^2/n_L^2$. Assuming $C_{load}$ is ignored, i.e., $C_3$ is not split in Step 1 (Fig. 5), the maximum impedance transformation ratio $r$ is then given as

$$\begin{aligned} r &= n_C^2 \cdot \frac{1}{n_L^2} \left(\frac{n}{k_m}\right)^2 \\ &= \left(\frac{C_3 + C_2}{C_2}\right)^2 \cdot \left[\left(\frac{L_1}{L_1 + L_{2a}}\right)^2 \left(\frac{n}{k_m}\right)^2\right] \\ &= \left(1 + \frac{g_2 g_3}{\Delta^2}\right)^2 \cdot (nk_m)^2 \\ &= (1 + 4Q_2 Q_3)^2 \cdot (nk_m)^2. \end{aligned} \tag{23}$$

This impedance transformation ratio $r$ is the product of two factors. The first factor $(1+4Q_2Q_3)^2$ is the impedance down-scaling by the capacitive Norton transformation. This factor shows the challenge of realizing a broadband PA output network by only using Norton transformation, since it presents a direct trade-off between the impedance transformation ratio and the network quality factors. For a given network type (a fixed $g_2g_3$), a larger $r$ requires a smaller fractional bandwidth, and therefore a larger $Q_2Q_3$. This incurs more passive loss and dissipates the extra power generation gained from a larger impedance transformation ratio. The Power Enhancement Ratio (PER) defined in [13], which takes this passive loss into account and characterizes the actual output power boosting by the matching network, is given as

$$PER = r \cdot \eta \tag{24}$$

where $\eta$ is the passive efficiency of the output matching. Fig. 8 demonstrates the simulation results for a third-order Chebyshev matching network ($f_0 = 9~\mathrm{GHz}$) with all the inductors having an unloaded quality factor of 15. The trade-offs among PER, passive efficiency $\eta$, and bandwidth are clearly shown.

On the other hand, the second factor $(nk_m)^2$ represents the impedance down-scaling by the transformer.
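To make the two factors in (23) concrete, the following sketch evaluates $r$ both in the simplified form and directly from the Norton ratios, using $n_L = 1/k_m^2$ from (7) and (9). The prototype coefficients (third-order Butterworth, $g_2 = 2$, $g_3 = 1$), the turn ratio, the coupling coefficient, and the passive efficiency are assumptions picked for illustration, and the function names are hypothetical.

```python
import math

def ratio_simplified(g2, g3, delta, n, k_m):
    """Last-but-one line of (23): (1 + g2*g3/delta^2)^2 * (n*k_m)^2."""
    return (1.0 + g2 * g3 / delta ** 2) ** 2 * (n * k_m) ** 2

def ratio_from_norton(g2, g3, delta, n, k_m):
    """First line of (23): n_C^2 / n_L^2 * (n/k_m)^2 with C_load ignored."""
    n_c = 1.0 + g2 * g3 / delta ** 2     # (C3 + C2) / C2 in the bandpass network
    n_l = 1.0 / k_m ** 2                 # (L1 + L2a) / L1 after applying (7)
    return n_c ** 2 / n_l ** 2 * (n / k_m) ** 2

# Assumed example values, not the fabricated design.
G2, G3 = 2.0, 1.0
DELTA = 1.0 / math.sqrt(2.0)             # octave bandwidth
N_TURNS, K_M = 2.0, 0.7
ETA = 0.6                                # hypothetical passive efficiency

r = ratio_simplified(G2, G3, DELTA, N_TURNS, K_M)
per = r * ETA                            # (24)
print(f"r = {r:.2f}, PER = {per:.2f}")
```

Since $C_{load}$ is ignored here, the value of `r` is an upper bound; any practical pad and inductor parasitic capacitance reduces the capacitive Norton factor.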
Since the transformer operation is fundamentally independent of the network quality factors, it provides an extra degree of freedom for impedance transformation in addition to the capacitive Norton transform. Furthermore, the $(nk_m)^2$ factor indicates that the magnetic coupling coefficient $k_m$ plays a crucial role in the effective impedance transformation ratio by the transformer, since a larger $k_m$ directly enables a larger impedance transformation ratio $r$. This matches the widely known transformer design intuition that prefers a high magnetic coupling. In a practical PA output matching design, $C_{load}$ is generally non-negligible and sometimes even comparable to $C_3$ (Fig. 5), which significantly compromises the down-scaling ratio by the capacitive Norton transformation. This makes the transformer's contribution more important in realizing a large impedance transformation for high-power CMOS PA designs.

Fig. 7. (a) The limited design space of the third-order network for a given transformer coupling coefficient $k_m$ at an octave bandwidth. (b) The limited design spaces of the third-order network for different target bandwidth ratios. (c) Realizable network designs ($g_1g_2$) for a given $k_m$ at an octave bandwidth.

Fig. 8. The maximum PER by the capacitive Norton transformation and the passive efficiency at $f_0 = 9$ GHz versus different bandwidth ratios in the proposed third-order output matching network. The output networks shown here are of Chebyshev configurations with 0.01 dB and 0.1 dB ripples.

#### E. Design Process Summary

The design procedures to achieve the proposed transformer-based broadband PA output network can be summarized as follows. First, with a given process technology and the target absolute bandwidth $\omega_H - \omega_L$, the design space limitation on the network coefficient $g_1$ can be calculated (Fig. 6).
The practically achievable transformer coupling coefficient $k_m$ and the target fractional bandwidth $\Delta$ further limit the network design space in terms of the product $g_1g_2$ (Fig. 7). Next, the required PER can be derived from the target output power and the maximum output voltage swing of the process, which in turn determines the network type, given the target fractional bandwidth and the transformer specifications. A low-Q network is generally preferred, since it provides less load mismatch and a higher passive efficiency. The target network should be within the accessible design space set by the two calculated design constraints. If these design steps do not result in a feasible solution, the PA specifications have to be relaxed for a lower output power and/or a smaller bandwidth. Finally, with the output network configuration determined, its component values can be directly calculated based on Fig. 5. Optimizations may be required to finalize the design and accommodate additional parasitic effects.

#### IV. A CMOS PA IMPLEMENTATION EXAMPLE

In this section, a CMOS broadband PA will be presented as an implementation example [16]. The design goal is to achieve both narrowband communication with a programmable center frequency [17] and signal amplification with a large instantaneous bandwidth [4]. The PA can be used for wideband signal amplification as well as ultra-wideband radar. The target bandwidth has an $f_{max}/f_{min}$ ratio greater than 2 (more than an octave) with a center tone at 9 GHz.

Fig. 9. Complete PA circuit architecture with all the stages highlighted.

#### A. The PA Architecture

The PA architecture is shown in Fig. 9. Operating in a deep class-AB mode, the output stage is a pseudo-differential cascode to enhance the output power capability and reverse isolation. The driver stage operates in class-A mode to improve linearity. Both the output and the inter-stage networks implement third-order bandpass configurations.
Termination resistors at both the PA's and the driver's inputs decrease their loaded quality factors for bandwidth extension and also reduce the non-linearity due to the bias-dependent $C_{gs}$ capacitances. Overall, the PA receives a balanced input and generates a single-ended output. This configuration is conducive to system integration with a differential on-chip driver and a single-ended off-chip antenna. Moreover, since all the supply and biasing voltages are fed from the center-tap of the inductors, no broadband chokes are required, which simplifies the design.

#### B. The Output Matching Network

The output matching network implements the proposed transformer-based third-order bandpass configuration based on the above design analysis and the octave bandwidth requirement. Although a higher-order network can realize a larger bandwidth, besides demanding a large chip area, it incurs excessive passive loss and degrades the PA efficiency significantly.

The output network is implemented mainly with two top metal layers (a 1.5 $\mu$m aluminum layer and a 1.3 $\mu$m copper layer). The 2:1 transformer coupling coefficient is approximately 0.7. After carefully selecting the network coefficients for a low-Q network implementation, the inductors $L_1''$ and $L_2''$ in Fig. 5 are fully absorbed into the transformer with a zero $L'_2$. Therefore, the third-order output network is realized by only two inductor footprints, saving considerable chip area. The simulated primary inductance value is 350 pH, resulting in the effective leakage inductances $L_2''$ and $L_1''$ both around 175 pH. The network layout and the simulated differential load impedance (after absorbing the device output capacitances $C_{device}$) are shown in Fig. 10. Patterned ground shields are used for the transformer and the matching inductor to reduce substrate losses [18].
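As a quick consistency check on the values quoted above, (10) and (11) can be applied to the stated primary inductance of 350 pH and coupling coefficient of approximately 0.7. The helper name below is hypothetical; only the two input numbers come from the text.

```python
def primary_side_split(l_pr, k_m):
    """Return (L2'', L1'') per (10) and (11) for a given L_pr and k_m."""
    l2_pp = (1.0 - k_m ** 2) * l_pr    # (10): (1 - k_m^2) * L_pr
    l1_pp = k_m ** 2 * l_pr            # (11): k_m^2 * L_pr
    return l2_pp, l1_pp

L_PR = 350e-12    # simulated primary inductance stated in the text (henries)
K_M = 0.7         # approximate coupling coefficient stated in the text

l2_pp, l1_pp = primary_side_split(L_PR, K_M)
print(f"L2'' = {l2_pp * 1e12:.1f} pH, L1'' = {l1_pp * 1e12:.1f} pH")
```

With $k_m^2 \approx 0.49$ the primary inductance splits almost evenly, giving roughly 178.5 pH and 171.5 pH, which agrees with the "both around 175 pH" figure.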
After impedance transformation by the output network, the real part of the differential load impedance is centered at 18 $\Omega$ and the series imaginary part is kept below 4 $\Omega$ to meet the desired optimum load value across a large bandwidth. The simulated total passive efficiency peaks at approximately 8 GHz with a value of 58.6%. The passive efficiency drops at higher frequencies mainly due to metal and substrate loss.

Fig. 10. Simulated output matching network performance and layout. The $Z_L$ is defined as the load impedance after absorbing the device output capacitance (Fig. 5).

#### C. The Inter-Stage Matching Network

Since the class-A driver can be approximated as a current source, the inter-stage matching is designed to achieve a third-order constant transimpedance transfer function, which provides an equal driving power across the operation bandwidth (Fig. 11). However, with termination resistors at the PA input side for a low loaded Q, a generic double-balanced third-order bandpass network results in the same small impedances as the driver's load. This causes the driver to operate in the current-limited regime and significantly compromises its output power and efficiency. To mitigate this issue, an inductive Norton transformation is performed to boost the load impedance at the driver side and increase the driver's output power. To further enhance the PA total bandwidth, the inter-stage matching network is designed to present a moderate driver gain peaking at around 6 GHz.

Fig. 11. Inter-stage matching network.

#### V. MEASUREMENT RESULTS

The PA is fully implemented in a standard 90 nm one-poly eight-metal (1P8M) CMOS process with a supply voltage of 2.8 V. Fig. 12 shows the chip microphotograph. Occupying a small core area of $0.45 \times 1.55 \ \text{mm}^2$, the PA is conducive to further integration with additional transceiver circuits to form a complete broadband radio system on-chip.
The CMOS PA chip is mounted on a brass substrate using silver epoxy for sufficient electrical ground contact and a good thermal sink (Fig. 12). This configuration is crucial to minimize the chip temperature during full power operation. In this section, the experimental results will be presented to characterize the PA's performance in its various operation modes.

Fig. 12. Chip microphotograph and photo of the PA testing module.

#### A. Small Signal Performance

Because the PA has a differential input and a single-ended output, a full three-port S-parameter measurement is performed to characterize its small signal performance. The resulting differential-mode S-parameters are plotted in Fig. 13. The small signal gain $S_{21}$ peaks at 9.6 GHz at a value of 18.5 dB with a 3 dB bandwidth from 5.2 GHz to 13 GHz. This bandwidth of more than an octave is achieved through both the broadband PA output matching network and the driver's gain peaking at around 6 GHz. The $S_{11}$ is better than -10 dB below 14 GHz, and the $S_{22}$ is lower than 0 dB across the entire measurement frequency range. With the cascode configuration for both driver and power stages, the overall reverse isolation is more than 53.4 dB at frequencies below 14 GHz.

The group delay of the entire PA derived from the measured $S_{21}$ phase is also shown in Fig. 13, which achieves an average value of 161 ps with a maximum $\pm 20$ ps variation from 6.6 GHz to 16.5 GHz. This flat in-band gain together with the constant group delay indicates that the PA is capable of amplifying a truly broadband signal with little distortion. The stability factors are also derived from the S-parameter measurement to verify that the PA is unconditionally stable.

Fig. 13. Measured differential-mode S-parameters and the group delay of the PA.

#### B. Large Signal Performance

The output power and PAE at both saturation and $-1~\mathrm{dB}$ compression modes are measured and plotted in Fig. 14.
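The group delay reported in the small-signal results is derived from the phase of $S_{21}$ as $\tau_g = -\mathrm{d}\phi/\mathrm{d}\omega$. The sketch below shows one way such a derivation could be carried out on sampled data, using phase unwrapping followed by central differences. The helper names and the synthetic 161 ps delay line are illustrative assumptions; the paper does not describe the exact post-processing used.

```python
import cmath
import math


def unwrap(phases):
    """Remove 2*pi jumps so the phase is continuous from sample to sample."""
    out = [phases[0]]
    for p in phases[1:]:
        delta = p - out[-1]
        delta -= 2.0 * math.pi * round(delta / (2.0 * math.pi))
        out.append(out[-1] + delta)
    return out


def group_delay(freq_hz, s21):
    """Group delay tau = -d(phase)/d(omega) from complex S21 samples.

    Central differences are used at interior points, so the two end
    points are dropped from the returned list.
    """
    phase = unwrap([cmath.phase(s) for s in s21])
    delays = []
    for i in range(1, len(freq_hz) - 1):
        d_phase = phase[i + 1] - phase[i - 1]
        d_omega = 2.0 * math.pi * (freq_hz[i + 1] - freq_hz[i - 1])
        delays.append(-d_phase / d_omega)
    return delays


if __name__ == "__main__":
    # Synthetic check: an ideal 161 ps delay sampled from 6.6 GHz to 16.5 GHz.
    tau = 161e-12
    freqs = [6.6e9 + i * 50e6 for i in range(199)]
    s21 = [cmath.exp(-1j * 2.0 * math.pi * f * tau) for f in freqs]
    gd = group_delay(freqs, s21)
    print(f"mean group delay = {sum(gd) / len(gd) * 1e12:.1f} ps")
```

With measured data, the frequency step must be fine enough that the phase changes by less than $\pi$ between adjacent samples; otherwise the unwrapping step becomes ambiguous.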
$P_{\mathrm{sat}}$ achieves a peak value of $+25.2~\mathrm{dBm}$ at 8 GHz with a PAE of 21.6%, and drops by 3 dB at 5.75 GHz and 13 GHz. To quantify the harmonic leakage at the -1 dB operating mode, both the 2nd and the 3rd harmonic contents at the PA's output are measured with respect to the fundamental tone and shown in Fig. 15. The 2nd harmonic is below -20 dBc and the 3rd harmonic is below -25 dBc without any off-chip filtering. Although the 2nd harmonic is generally significant for deep class-AB operation and falls in band for the frequency range of this PA design, it is largely attenuated through the common-mode rejection of the on-chip output transformer balun. The third-order harmonic is mainly filtered out by the inter-stage and the output matching networks.

Fig. 14. Measured output power and PAE versus operating frequency.

Fig. 15. Measured output 2nd and third-order harmonic levels at 1 dB compression point.

#### C. Amplifying a Narrowband Modulated Signal

Next, the PA's performance is characterized with a narrowband modulated signal, generated by a vector signal generator. An off-chip 180° hybrid coupler provides single-ended to differential conversion for the input signal. After down-conversion and low-pass filtering, the EVM of the PA's output is measured by the vector signal analyzer.

Fig. 16. Measured QPSK and 16QAM EVM results versus input power at 8 GHz carrier frequency.

Fig. 17. Summary of measured EVM results for different operating frequencies.

Fig. 18. Measurement setup for BER and eye-diagram testing on broadband PRBS signals.

The EVM results for QPSK and 16QAM modulations, both at 5 Ms/s, versus different input powers are shown in Fig. 16. The power gain remains flat versus the input power. At the +5 dBm input 1 dB compression point, the EVMs for a QPSK signal and for a 16QAM signal are 2.5% and 5%, respectively.
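For readers less familiar with the EVM metric quoted above, the following sketch computes an RMS EVM from received and ideal constellation points. It uses one common definition, in which the error power is normalized to the RMS power of the reference symbols; the vector signal analyzer used in the measurement may apply a different normalization, and the function name and test data are hypothetical.

```python
import math


def evm_percent(received, reference):
    """RMS EVM in percent, normalized to the RMS power of the reference symbols."""
    if len(received) != len(reference) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    error_power = sum(abs(r - s) ** 2 for r, s in zip(received, reference))
    reference_power = sum(abs(s) ** 2 for s in reference)
    return 100.0 * math.sqrt(error_power / reference_power)


if __name__ == "__main__":
    # Ideal QPSK constellation and a copy with a 3% amplitude error on every symbol.
    ideal = [1 + 1j, -1 + 1j, -1 - 1j, 1 - 1j]
    measured = [1.03 * s for s in ideal]
    print(f"EVM = {evm_percent(measured, ideal):.2f} %")
```

A pure 3% gain error on every symbol yields an EVM of 3% under this definition, which gives a feel for the magnitude of the 2.5% and 5% figures reported at the 1 dB compression point.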
This difference in EVM is mainly because the 16QAM signal is more susceptible to AM-AM and AM-PM distortion generated by the PA. Fig. 17 shows the EVM performance summary at the PA's 1 dB compression point across the band. The QPSK EVM and 16QAM EVM are below 3.6% and 6%, respectively, for the high frequency band (below 13 GHz). At the low frequency range, a smaller output power together with a lower EVM is possibly due to the saturation of the driver prior to the power stage. Other linearization techniques can be superimposed onto the proposed PA architecture to further improve the EVM [19]. These EVM results demonstrate that the PA is capable of transmitting a narrowband modulated signal with a programmable center-tone across a large frequency range, which is suitable for applications such as smart antennas or cognitive radio systems.

#### D. Transmitting a Broadband Signal

To evaluate the PA's performance when amplifying a truly broadband signal, a bit-error rate (BER) based measurement setup is used (Fig. 18). A truly broadband pseudo-random-bit-sequence (PRBS) signal generated by a pulse pattern generator (Anritsu MP1763C) is low-pass filtered, up-converted to 9 GHz using an off-chip mixer, and then fed into the PA. The PA output signal is down-converted to the baseband using the same LO frequency and the corresponding BER is measured by an error detector (Anritsu MP1764C). The PA's output power is monitored simultaneously. This BER-based setup directly characterizes the waveform distortion introduced by the broadband PA, and is preferred over the pulse-based measurement approach in [20], since the former provides a measurement more sensitive to accurate timing and quantization resolution.

The BER results versus the PRBS data rate are summarized in Fig. 19. Continuously monitored over a 14 hour period at +21.5 dBm output power for each measurement point, a BER better than $10^{-13}$ is achieved up to a 7.5 Gb/s data rate.
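The 14 hour observation window can be related to the quoted $10^{-13}$ BER level with a short calculation. The sketch below counts the bits transmitted during the window and evaluates the standard zero-error confidence bound, $\mathrm{BER} \le -\ln(1-\mathrm{CL})/N$. The paper does not state how many errors were actually observed, so the zero-error case and the 95% confidence level are assumptions made purely for illustration; the function names are hypothetical.

```python
import math


def bits_observed(data_rate_bps: float, hours: float) -> float:
    """Number of bits transmitted at a constant data rate over the given time."""
    return data_rate_bps * hours * 3600.0


def ber_upper_bound_zero_errors(n_bits: float, confidence: float = 0.95) -> float:
    """Upper bound on the BER when no errors are seen in n_bits.

    Uses (1 - BER)^N = 1 - CL, approximated for small BER as
    BER <= -ln(1 - CL) / N.
    """
    if n_bits <= 0 or not 0.0 < confidence < 1.0:
        raise ValueError("need n_bits > 0 and 0 < confidence < 1")
    return -math.log(1.0 - confidence) / n_bits


if __name__ == "__main__":
    n = bits_observed(7.5e9, 14.0)  # 7.5 Gb/s monitored for 14 hours
    bound = ber_upper_bound_zero_errors(n)
    print(f"bits observed = {n:.3e}, zero-error BER bound (95%) = {bound:.2e}")
```

At 7.5 Gb/s the window covers about $3.8 \times 10^{14}$ bits, so an error-free run of that length would be more than sufficient to support a BER claim at the $10^{-13}$ level.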
The data rate can be potentially doubled by employing quadrature modulation. The eye-diagram of the down-converted signal is also observed using a digital oscilloscope as shown in Fig. 19. At a data rate of 5 Gb/s, the eye-diagram with the PA shows negligible degradation at an output power of +21.5 dBm and a PAE of 11.2%. The sine-wave shape of the eye is mainly because of the low-pass filtering effects of the up- and down-conversions. This good waveform fidelity is due to the flat in-band gain and group delay of the PA. The BER and eye-diagram measurements fully verify the PA's capability of amplifying truly broadband signals for applications such as advanced RF imaging and biomedical sensing.

Fig. 19. Measured BER for different PRBS data rates and the measured eye-diagram at 5 Gb/s.

A performance comparison with recently reported silicon-based broadband power amplifiers is listed in Table I. The PA presented in this paper achieves the highest maximum $P_{\rm sat}$ and PAE with a state-of-the-art instantaneous bandwidth among all the CMOS PA designs.

TABLE I. COMPARISON WITH PUBLISHED SILICON-BASED BROADBAND PA

| Reference | Process | Technique | Frequency | Max P<sub>sat</sub> | Max PAE | Max P<sub>1dB</sub> |
|---|---|---|---|---|---|---|
| This work | 90 nm CMOS | Broadband Load / Push-Pull / Class-AB | 5.2-13 GHz | 25.2 dBm | 21.6% | 22.6 dBm |
| J. Roderick [20] | 130 nm CMOS | Distributed / Class-A | 0.6-2.8 GHz | 21 dBm | 16% (Drain Efficiency) | 17 dBm |
| A. Vasylyev [21] | 130 nm CMOS | Push-Pull / Class-AB | 20.5-31 GHz | 13 dBm | 13.2% | N.A. |
| W. Bakalski [22] | 350 nm SiGe Bipolar | Push-Pull / Class-AB | 7-18 GHz | 17.5 dBm | 10.1% | N.A. |
| A. Vasylyev [23] | 350 nm SiGe Bipolar | Push-Pull / Class-AB | 1.5-2.9 GHz | 32.3 dBm | 30% | N.A. |

#### VI. CONCLUSION

In this paper, a broadband PA topology with a transformer-based high-order output matching network is proposed to achieve the optimum load impedance across a large bandwidth for the maximum output power. A design method is presented to convert a canonical doubly-terminated bandpass network to the proposed output matching topology, which realizes both impedance conversion and differential to single-ended power combining. As a design example, a deep class-AB PA is implemented in a standard 90 nm CMOS process. Operating from 5.2 GHz to 13 GHz, the PA achieves a maximum $P_{\rm sat}$ of +25.2 dBm with a peak PAE of 21.6%. The EVM measurement results verify that the presented PA is capable of transmitting a narrowband modulated signal with a programmable center tone. Moreover, the PA's functionality of amplifying a truly broadband signal is confirmed by BER and eye-diagram measurements.

#### APPENDIX: EXTENDING THE PROPOSED MATCHING NETWORK TO THE NTH-ORDER CONFIGURATION

The proposed transformation method for a third-order bandpass matching network can be extended to a high-order configuration (Fig. 20). For the odd order case ($N=2n+1$), the total network conversion requires $n$ inductive Norton transformations/transformer mappings and $n$ capacitive Norton transformations. For the even order case ($N=2n$), the conversion demands $n$ inductive Norton transformations/transformer mappings and $n-1$ capacitive Norton transformations.

Fig. 20. Extending the proposed PA output network design methodology to an Nth-order configuration.

The total impedance transformation ratio $r$ for both cases can be obtained as

$$\begin{aligned} r_{Odd} &= \prod_{i=1}^{n} \left( 1 + \frac{g_{2i}g_{2i+1}}{\Delta^2} \right)^2 \cdot \prod_{i=1}^{n} (n_i k_{m,i})^2 \\ &= \prod_{i=1}^{n} (1 + 4Q_{2i}Q_{2i+1})^2 \cdot \prod_{i=1}^{n} (n_i k_{m,i})^2 \end{aligned} \tag{A3}$$
$$\begin{aligned} r_{Even} &= \prod_{i=1}^{n-1} \left( 1 + \frac{g_{2i}g_{2i+1}}{\Delta^2} \right)^2 \cdot \prod_{i=1}^{n} (n_i k_{m,i})^2 \\ &= \prod_{i=1}^{n-1} (1 + 4Q_{2i}Q_{2i+1})^2 \cdot \prod_{i=1}^{n} (n_i k_{m,i})^2 \end{aligned} \tag{A4}$$

where $n_i$ and $k_{m,i}$ represent the turn-ratio and the magnetic coupling coefficient for the $i$th physical transformer design used in the network. The network design space constraints due to the finite magnetic coupling coefficient $k_{m,i}$ of the $i$th transformer and the power device's optimum load quality factor are given as

$$\frac{g_{2i-1}g_{2i}}{\Delta^2} \ge \frac{1 - k_{m,i}^2}{k_{m,i}^2}, \tag{A5}$$

$$\frac{g_1}{\Delta} \ge \frac{V_{swing}}{V_{in}} \cdot \frac{\omega_0}{\omega_{sat}}. \tag{A6}$$

#### ACKNOWLEDGMENT

The authors would like to thank Prof. A. Emami and Dr. S. Weinreb at Caltech for their technical advice. The authors also acknowledge Mr. Ta-Shun Chu at USC, Prof. Y. J. Wang at NCTU, Mr. S. Kousai at Toshiba, the members of Caltech CHIC group for their numerous suggestions, and Toshiba Corporation for chip fabrication.

#### REFERENCES

- [1] R. Janaswamy, *Radiowave Propagation and Smart Antennas for Wireless Communications*. New York: Springer, Nov. 2000.
- [2] B. A. Fette, *Cognitive Radio Technology*. Boston, MA: Academic Press, Apr. 2009.
- [3] F. F. Sabins, *Remote Sensing: Principles and Interpretation*, 3rd ed. Long Grove, IL: Waveland Press, Apr. 2007.
- [4] B. Allen, M. Dohler, E. Okon, and W. Malik, *Ultra Wideband Antennas and Propagation for Communications, Radar and Imaging*. New York: Wiley, Dec. 2006.
- [5] S. Cripps, *RF Power Amplifiers for Wireless Communications*, 2nd ed. Norwood, MA: Artech House, 2002.
- [6] S. Cripps, *Advanced Techniques in RF Power Amplifier Design*. Norwood, MA: Artech House, 2002.
- [7] B. S. Yarman and A. Fettweis, "Computer-aided double matching via parametric representation of Brune function," *IEEE Trans. Circuits Syst.*, vol. 37, no. 2, pp. 212–222, Feb. 1990.
- [8] B. J. Minnis, *Design Microwave Circuits by Exact Synthesis*. Norwood, MA: Artech House, 1996.
- [9] G. Matthaei, L. Young, and E. M. T. Jones, *Microwave Filters, Impedance-Matching Networks, and Coupling Structures*. Norwood, MA: Artech House, Nov. 1985.
- [10] Y.-J. E. Chen, L.-Y. Yang, and W.-C. Yeh, "An integrated wideband power amplifier for cognitive radio," *IEEE Trans. Microwave Theory Tech.*, vol. 55, no. 10, pp. 2053–2058, Oct. 2007.
- [11] R. Schaumann and M. E. Van Valkenburg, *Design of Analog Filters*. Oxford, U.K.: Oxford Univ. Press, 2001.
- [12] D. M. Pozar, *Microwave Engineering*, 2nd ed. New York: Wiley, Jul. 1997.
- [13] I. Aoki et al., "Distributed active transformer – A new power-combining and impedance-transformation technique," *IEEE Trans. Microwave Theory Tech.*, vol. 50, no. 1, pp. 316–331, Jan. 2002.
- [14] J. Long, "Monolithic transformers for silicon RF IC design," *IEEE J. Solid-State Circuits*, vol. 35, no. 9, pp. 1368–1382, Sep. 2000.
- [15] L. Besser and R. Gilmore, *Practical RF Circuits for Modern Wireless Systems, Vol. 1, Passive Circuits and Systems*. Norwood, MA: Artech House, 2003.
- [16] H. Wang, C. Sideris, and A. Hajimiri, "A 5.2-to-13 GHz class-AB CMOS power amplifier with a 25.2 dBm peak output power at 21.6% PAE," in *IEEE ISSCC Dig. Tech. Papers*, Feb. 2010, pp. 12–14.
- [17] S. Jeon, Y.-J. Wang, H. Wang, F. Bohn, A. Natarajan, A. Babakhani, and A. Hajimiri, "A scalable 6-to-18 GHz concurrent dual-band quad-beam phased-array receiver in CMOS," *IEEE J. Solid-State Circuits*, vol. 43, no. 12, pp. 2660–2673, Dec. 2008.
- [18] C. P. Yu and S. S. Wong, "On-chip spiral inductors with patterned ground shields for Si-based RF IC's," *IEEE J. Solid-State Circuits*, vol. 33, no. 5, pp. 743–752, May 1998.
- [19] J. Kang, D. Yu, Y. Yang, and B. Kim, "Highly linear 0.18 μm CMOS power amplifier with deep n-well structure," *IEEE J. Solid-State Circuits*, vol. 41, no. 5, pp. 1073–1080, May 2006.
- [20] J. Roderick and H. Hashemi, "A 0.13 μm CMOS power amplifier with ultra-wide instantaneous bandwidth for imaging applications," in *IEEE ISSCC Dig. Tech. Papers*, Feb. 2009, pp. 374–375.
- [21] A. Vasylyev et al., "Ultra-broadband 20.5–31 GHz monolithically-integrated CMOS power amplifier," *Electronics Lett.*, vol. 41, no. 23, pp. 1281–1282, Nov. 2005.
- [22] W. Bakalski et al., "A fully integrated 7–18 GHz power amplifier with on-chip output balun in 75 GHz-fT SiGe-bipolar," in *Proc. IEEE Bipolar/BiCMOS Circuits and Technology Meeting*, Sep. 2003, pp. 61–64.
- [23] A. Vasylyev et al., "Fully-integrated 32 dBm, 1.5–2.9 GHz SiGe-bipolar power amplifier using power combining transformer," *Electronics Lett.*, vol. 41, no. 16, pp. 35–36, Aug. 2005.

**Hua Wang** received the B.S. degree from Tsinghua University, Beijing, China, in 2003, and the M.S. and Ph.D. degrees in electrical engineering from the California Institute of Technology, Pasadena, in 2007 and 2009, respectively. During the summer of 2004, he was an engineering intern with Guidant Corporation (later acquired by Boston Scientific), where he worked on posture monitoring systems for implantable devices. His current research interests are RF and millimeter-wave CMOS integrated circuits for communication and imaging systems, integrated bioelectronics and biosensors, and noise modeling in high-precision measurements. Dr. Wang was the recipient of Analog Devices Inc. Outstanding Student Designer Award in 2008, and the DAC/ISSCC Student Design Contest Winner in 2009 for his work on CMOS magnetic biosensors for Point-of-Care (PoC) microarray applications.

**Constantine Sideris** (M'10) received the B.S. degree in electrical engineering with honors from the California Institute of Technology, Pasadena, in 2010. He was the recipient of a NSF Graduate Research Fellowship in 2010. He is currently pursuing the Ph.D. degree in electrical engineering at the California Institute of Technology.

**Ali Hajimiri** (F'10) received the B.S.
degree in electronics engineering from Sharif University of Technology, Tehran, Iran, and the M.S. and Ph.D. degrees in electrical engineering from Stanford University, Stanford, CA, in 1996 and 1998, respectively. He was a Design Engineer with Philips Semiconductors, where he worked on a BiCMOS chipset for GSM and cellular units from 1993 to 1994. In 1995, he was with Sun Microsystems working on the UltraSPARC microprocessor's cache RAM design methodology. During the summer of 1997, he was with Lucent Technologies (Bell Labs), Murray Hill, NJ, where he investigated low-phase-noise integrated oscillators. In 1998, he joined the Faculty of the California Institute of Technology, Pasadena, where he is Thomas G. Myers Professor of Electrical Engineering and the director of Microelectronics Laboratory. His research interests are high-speed and RF integrated circuits for applications in sensors, biomedical devices, and communication systems. Dr. Hajimiri is the author of *The Design of Low Noise Oscillators* (Springer, 1999) and has authored or coauthored more than 100 refereed journal and conference technical articles. He holds more than 50 U.S. and European patents. He has served on the Technical Program Committee of the International Solid-State Circuits Conference (ISSCC), and as an Associate Editor of the IEEE JOURNAL OF SOLID-STATE CIRCUITS (JSSC), an Associate Editor of IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS (TCAS) PART II, a member of the Technical Program Committees of the International Conference on Computer Aided Design (ICCAD), Guest Editor of the IEEE TRANSACTIONS ON MICROWAVE THEORY AND TECHNIQUES, and on the Guest Editorial Board of Transactions of Institute of Electronics, Information and Communication Engineers of Japan (IEICE). Dr. Hajimiri was selected to the top 100 innovators (TR100) list in 2004. He has served as a Distinguished Lecturer of the IEEE Solid-State and Microwave Societies. 
He is the recipient of Caltech's Graduate Students Council Teaching and Mentoring Award as well as the Associated Students of Caltech Undergraduate Excellence in Teaching Award. He was the Gold Medal winner of the National Physics Competition and the Bronze Medal winner of the 21st International Physics Olympiad, Groningen, Netherlands. He was a co-recipient of the IEEE JOURNAL OF SOLID-STATE CIRCUITS Best Paper Award of 2004, the International Solid-State Circuits Conference (ISSCC) Jack Kilby Outstanding Paper Award, a two-time co-recipient of the CICC Best Paper Award, and a three-time winner of the IBM Faculty Partnership Award as well as the National Science Foundation CAREER Award and the Okawa Foundation Award. He co-founded Axiom Microdevices Inc. in 2002, whose fully-integrated CMOS PA has shipped more than one hundred million units, and which was acquired by Skyworks Inc. in 2009.