This paper proposes a novel inter-stage load-pull characterization method to enhance the linearity of millimeter-wave integrated power amplifiers (PAs) by minimizing their amplitude-to-phase (AM-PM) distortion without worsening their AM-AM or efficiency performance. The proposed method identifies the optimal solution for the inter-stage matching network, which enables the synthesis of a driver stage AM-PM characteristic that is complementary to that of the power stage, thereby reducing the overall AM-PM distortion of the PA. The proposed technique is applied to design a proof-of-concept 28-31 GHz PA demonstrator using 45 nm silicon-on-insulator CMOS technology. The measurement results obtained under continuous-wave excitation at 29 GHz demonstrate an excellent AM-PM characteristic, with phase distortion as low as 0.2° at the 1-dB compression power level of 13.9 dBm and less than 1° at output power levels of up to 16 dBm (very close to the saturation power of 16.6 dBm). This enhanced AM-PM linearity improves the linearizability of the PA, as confirmed by testing the PA with a 64-quadrature amplitude modulated (64-QAM) test signal with an instantaneous bandwidth of 800 MHz. Without applying any digital pre-distortion (DPD) technique, the PA delivers a power added efficiency (PAE) of 8.7% at an average output power ($P_{avg}$) of 9.4 dBm while maintaining an error vector magnitude (EVM) of −25 dB. After applying a very simple memoryless DPD function with only four coefficients, the PA can operate at a higher $P_{avg}$ of 11.1 dBm with a much better PAE of 12.2% while still maintaining an acceptable EVM of −25.2 dB. Thanks to the proposed technique, the PAE of the proposed PA can be improved by about 40% in relative terms (from 8.7% to 12.2%) with a low-cost, low-complexity DPD technique.
I. INTRODUCTION
CMOS technology is highly favored for emerging 5G millimeter-wave (mm-wave) communication systems as it offers a high level of integration that enables low-cost and high-yield system-on-chip transceiver solutions [1]-[15]. However, the implementation of high-performance CMOS mm-wave power amplifiers (PAs) that meet 5G system requirements is a challenging task. For instance, to maximize the data rate, 5G networks adopt high-order modulation schemes [e.g. 64-quadrature amplitude modulation (QAM)] that yield transmitted signals with high peak-to-average power ratios (PAPRs). To attain acceptable signal quality at the PA's output when amplifying such signals, it is imperative for the PA to operate at a low power level that is largely backed off from its peak power. This requirement significantly degrades the average efficiency of the PA and can contribute to lower battery life in user terminals or excessive heat dissipation in base station applications.
The associate editor coordinating the review of this manuscript and approving it for publication was Yuh-Shyan Hwang.
To improve the PA's energy efficiency, it is critical to reduce the power back-off required to attain suitable PA linearity. This means the PA designer must reduce the PA nonlinearity, typically characterized by non-flat amplitude-to-amplitude and amplitude-to-phase modulations (AM-AM; AM-PM). Although both of these mechanisms contribute to in-band and out-of-band distortions, recent investigations have demonstrated that high-order QAM signals are extremely sensitive to AM-PM distortion [11], [12]. Moreover, as identified in [16], the PA's short-term memory effects manifest in the baseband AM-PM distortion. Hence, the amount of phase distortion in the PA is directly related to the amount of short-term memory effects it exhibits, which is commonly considered a serious obstacle to achieving high PA linearity. Various techniques have therefore been developed to reduce AM-PM distortion and improve the linearity of PAs. These techniques have mainly focused on modifying the architecture of a basic single-stage differential amplifier cell such that the nonlinearity of the single-stage cell is substantially reduced. Such topologies include the push-pull PMOS-NMOS differential amplifier architecture [11], NMOS transistor cells with added parallel PMOS varactors [12], and differential amplifier architectures that incorporate low-impedance second-harmonic paths [13] or source degeneration inductors [14], [15]. While the above methods developed for single-stage CMOS differential amplifier cells are effective and scalable, linearization techniques developed specifically for multi-stage CMOS PAs would also be of value, given that in practice several single-stage cells must be cascaded to obtain sufficiently high gain.
VOLUME 8, 2020. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see http://creativecommons.org/licenses/by/4.0/
Such techniques have recently been investigated for gallium nitride (GaN)-based monolithic microwave integrated circuit (MMIC) PAs. For instance, the authors of [17] demonstrate that, in a two-stage GaN PA, the AM-PM of the power stage can be minimized by choosing a source impedance that is substantially mismatched to the conjugate of the transistor input impedance. This improves the linearity of the power stage at the cost of a reduction in its gain, so a high-gain, high-linearity driver stage is then designed to compensate for the gain loss. However, since a two-stage PA's overall power added efficiency (PAE) is dominated by its power stage, this method substantially degrades the overall PAE. Consequently, it is not very attractive for CMOS PAs, as their power efficiency is generally lower than that of GaN PAs [18]. In [19], the AM-PM nonlinearity of a two-stage GaN MMIC Doherty amplifier is mitigated by exploiting the driver stage as a pre-distorter for the power stage. This is achieved by strategically designing an inter-stage matching network that enables the cancellation of the AM-PM components of the power and driver stages. Compared to [17], this method improves the AM-PM linearity without significantly worsening the gain performance. Consequently, with its improved PA efficiency-linearity trade-off, this approach may be of more benefit to mm-wave multi-stage CMOS PA design.
However, the method proposed in [19] was developed exclusively for Doherty amplifiers. Since fixed load amplifiers and Doherty amplifiers have different sources of nonlinearity, this method would not be as effective for a typical multi-stage CMOS PA. For instance, in a Doherty amplifier the modulation of the drain load versus output power is the main contributor to AM-PM distortion. Hence, as demonstrated in [19] , by using a simple inter-stage matching network that exhibits minimal phase shifting, the driver stage can see a load modulation that is the reverse of the power stage's load modulation. This creates an AM-PM in the driver stage that is complementary to that of the power stage. However, in a fixed-load CMOS PA, the AM-PM distortion is mainly due to transistor nonlinearities [20] , such as the nonlinear parasitic capacitance, the Miller effect, and the second harmonic impedance. These mechanisms are all at play at large signal levels, making it extremely difficult to predict the optimal solution for an inter-stage matching network that yields the minimal overall AM-PM nonlinearity, if only following the linear circuit-based analysis method as adopted in [19] .
Since it is difficult to predict transistor behavior at large signal levels using merely a linear circuit-based approximation, the load-pull technique has been widely used to find a single-stage PA's optimal source or drain impedance that allows the transistor to operate efficiently and linearly at large signal levels. For instance, in [21]-[23], novel load-pull algorithms are presented to search for the PA's optimum drain impedance and the parameters of its output matching network to optimize its efficiency and linearity performance. However, while the authors of [21]-[23] exploit the load-pull concept to deduce the optimal design of the PA's output matching network, in this work we present a new inter-stage load-pull characterization method to search for the optimal solution of the inter-stage matching network of a two-stage mm-wave CMOS amplifier, ultimately producing a two-stage PA with the best possible efficiency-linearity trade-off. While the previous load-pull methodologies only evaluated the performance of a single-stage PA, the proposed method allows us to characterize the overall performance of a two-stage PA. To do this, we use a simple and low-loss transformer (TF) as the inter-stage matching network of the two-stage PA. By judiciously varying the parameters of this inter-stage TF, multiple versions of the PA can be created whose driver stages see different load impedances, which facilitates the evaluation of the PA performance versus its driver stage load impedance. The proposed method is able to produce two types of contours: (i) contours of the overall AM-PM, AM-AM and PAE of the two-stage PA; and (ii) contours of the constituent coupling coefficient and primary inductance of the inter-stage TF. The former contours characterize the PA's efficiency and linearity performance under various driver stage load impedances; the latter illustrate the practicality of these driver stage load impedances when realizing them on a high-loss substrate.
The joint application of these contours helps to determine the optimal solution of an inter-stage matching network that is of good realizability and capable of producing the best possible trade-off between PA efficiency and linearity.
Below, Section II starts with an analysis that highlights the possibility of shaping the AM-PM characteristic of the driver stage through the selection of its load impedance. Section III describes the proposed inter-stage load-pull characterization method and applies this method to the design of a two-stage mm-wave CMOS PA with enhanced linearity-efficiency trade-off in a 45 nm silicon-on-insulator (SOI)-CMOS process. Section IV discusses the measurement results obtained while driving the PA demonstrator with continuous-wave (CW) and modulated signals.
II. ANALYSIS OF AM-PM DISTORTION IN CMOS PA
As identified in [20], there are multiple sources of nonlinearity that can contribute to AM-PM distortion in a CMOS PA. Since this work focuses on controlling the driver stage's AM-PM by properly setting its drain impedance through the design of the inter-stage matching network, only those sources of nonlinearity that correlate with the transistor's drain impedance are examined here. According to [20], the major ones are: (i) the feedback effect of the transistor's gate-drain capacitor and (ii) the nonlinear dependency of the transistor's drain current on the drain and gate voltages. These sources are briefly reviewed below to gain insight into the mechanisms controlling the driver stage's AM-PM.
Fig. 1(a) depicts the schematic of a basic single-stage pseudo-differential amplifier cell that includes two single-ended NMOS transistors ($M_{N1}$ and $M_{N2}$) and two neutralization capacitors ($C_N$). Fig. 1(b) shows the amplifier's half-circuit small-signal diagram, including the transistor's trans-conductance ($g_m$); its gate-to-source ($C_{gs}$), gate-to-drain ($C_{gd}$) and drain-to-source ($C_{ds}$) parasitic capacitances; its gate-to-source ($R_{gs}$) and drain-to-source ($R_{ds}$) parasitic resistances; and its differential source ($Z_{S,diff}$) and load ($Z_{L,diff}$) impedances. The inclusion of the neutralization capacitors ($C_N$) leads to the total gate-to-source ($C'_{gs}$), gate-to-drain ($C'_{gd}$) and drain-to-source ($C'_{ds}$) capacitances illustrated in Fig. 1(b). Furthermore, $Z_{L,i}$ denotes the impedance seen by the transistor at the intrinsic current source node. According to Fig. 1(b), the total capacitance at the input of the amplifier, $C_{in}$, can be expressed as:
$$C_{in} = C'_{gs} + C_m \qquad (1)$$
$$C'_{gs} = C_{gs} + 2C_N \qquad (2)$$
FIGURE 2. (a) Extracted $C_{gs}$, $C_{gd}$, $C_{ds}$ and $g_m$ of the single-ended 150 µm NMOS transistor; (b) calculated $C_{in}$ versus $V_{gs}$ for three settings of $Z_{L,i}$: 0, $R_{opt}$ and $2R_{opt}$.
Here, $C_m$ denotes the Miller capacitance. Equations (1)-(3) reveal that the value of $C_{in}$ depends on the value of the drain impedance $Z_{L,i}$ through the feedback effect of $C_{gd}$. The degree of this dependency is set by the magnitude of $C_{gd} - C_N$; hence, it can be minimized through a proper selection of the value of $C_N$. However, since the value of $C_{gd}$ in practice varies with the gate-source voltage ($V_{gs}$), perfect neutralization (i.e. $C_{gd} - C_N = 0$) can only be achieved at one particular level of $V_{gs}$. At other levels of $V_{gs}$, as a consequence of the non-zero $C_{gd} - C_N$, the value of $C_{in}$ becomes a function of $Z_{L,i}$. An example of such dependency in the NMOS differential cell shown in Fig. 1 is summarized in Fig. 2, where Fig. 2(a) depicts the values of $C_{gs}$, $C_{gd}$, $C_{ds}$ and $g_m$ of the 150 µm wide single-ended NMOS transistor at various values of $V_{gs}$. These values were extracted from the transistor's Y-parameters by simulating its large-signal model obtained from the GlobalFoundries 45 nm SOI-CMOS process. These values were then used to calculate the profiles of $C_{in}$ versus $V_{gs}$ based on (1)-(3) for three settings of $Z_{L,i}$, namely $Z_{L,i} = 0$, $R_{opt}$ and $2R_{opt}$, where $R_{opt}$ denotes the transistor's optimum load, found to be 35 Ω for the 150 µm NMOS transistor. In this calculation, the value of $C_N$ is set to 38.7 fF, which allows for perfect neutralization at $V_{gs} = 0.3$ V to ensure the stability of the transistor at small signal levels. Fig. 2(b) depicts the calculation results, which confirm the impact of $Z_{L,i}$ on the nonlinear behavior of $C_{in}$. Given that significant AM-PM distortion arises from the nonlinear behavior of $C_{in}$ [20], it is possible to partially control the transistor AM-PM by deliberately setting $Z_{L,i}$.
Another important mechanism that links the transistor AM-PM to its drain impedance takes effect through the coupling between the nonlinear $I_{ds}$ versus $V_{gs}$ and $I_{ds}$ versus $V_{ds}$ characteristics.
This mechanism has been extensively studied in [20], where the authors modelled the transistor with a sophisticated EKV model [24] instead of the oversimplified small-signal model so that the nonlinear behavior of the transistor under large-signal conditions can be analyzed. According to [20], this mechanism can cause excessive AM-PM distortion at large signal power levels, and the amount of AM-PM distortion generated depends on the reactive component of the transistor drain impedance. In practice, the observed AM-PM distortion of a CMOS amplifier is a composite result of multiple mechanisms. To demonstrate the dependency of the PA's AM-PM distortion on its $Z_{L,i}$, a load-pull simulation was conducted on the differential cell shown in Fig. 1(a). Table 1 gives the circuit parameter settings used in the simulation and Fig. 3 depicts the simulation results. According to Fig. 3(a), at an input power of −2 dBm, the amplifier can produce various AM-PM distortions when loaded with different $Z_{L,i}$. By varying the value of $Z_{L,i}$ from case 1 to case 5, the AM-PM distortion can be altered from −6° to 6° without affecting the PA's output power. A clearer demonstration of this transition is depicted in Fig. 3(b). These results confirm the possibility of controlling an amplifier's AM-PM by properly selecting its drain impedance.
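To make the first mechanism reviewed in this section more concrete, the following minimal sketch evaluates $C_{in}$ for three load settings. It assumes the textbook Miller approximation $C_m = (C_{gd} - C_N)(1 + g_m Z_{L,i})$ with a purely resistive $Z_{L,i}$, which may differ from the exact form of (3). Only $C_N = 38.7$ fF and $R_{opt} = 35$ Ω come from the text; the values of `C_GS`, `G_M` and the swept `c_gd` are hypothetical placeholders, not the extracted data of Fig. 2(a).

```python
def input_capacitance(c_gs, c_gd, c_n, g_m, z_l):
    """Input capacitance of the neutralized half circuit (illustrative model)."""
    c_gs_total = c_gs + 2.0 * c_n                 # equation (2)
    # Assumed Miller term: the residual feedback capacitance (c_gd - c_n)
    # is multiplied by (1 + voltage gain) for a resistive load z_l.
    c_miller = (c_gd - c_n) * (1.0 + g_m * z_l)
    return c_gs_total + c_miller


C_N = 38.7e-15      # neutralization capacitor, from the text
R_OPT = 35.0        # optimum load, from the text
C_GS = 100e-15      # hypothetical gate-source capacitance
G_M = 0.1           # hypothetical trans-conductance in siemens

# Hypothetical C_gd values below, at, and above perfect neutralization.
for c_gd in (34e-15, 38.7e-15, 44e-15):
    c_in = [input_capacitance(C_GS, c_gd, C_N, G_M, z) for z in (0.0, R_OPT, 2 * R_OPT)]
    print(f"C_gd = {c_gd * 1e15:.1f} fF:",
          ", ".join(f"{c * 1e15:.1f} fF" for c in c_in))
```

Under this assumed model, $C_{in}$ is independent of the load only when $C_{gd} = C_N$; as soon as $C_{gd}$ drifts away from $C_N$, the three load settings give different input capacitances, which is the qualitative behavior the text attributes to Fig. 2(b).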
III. THEORY AND IMPLEMENTATION OF PROPOSED INTER-STAGE LOAD-PULL CHARACTERIZATION METHOD
The previous section identifies the dependency of the PA's AM-PM distortion on its drain impedance. This dependency makes it possible to improve the AM-PM linearity of a two-stage PA by judiciously designing an inter-stage matching network that produces a driver stage AM-PM characteristic complementary to that of the power stage. In this section, a new inter-stage load-pull characterization method is presented to search for the solution of the inter-stage matching network that minimizes the AM-PM distortion of the overall two-stage PA. As a proof-of-concept demonstration, the proposed method is applied to the design of a two-stage CMOS PA in GlobalFoundries 45 nm SOI-CMOS technology at 28 GHz to provide a target saturation power ($P_{sat}$) of 16 dBm with good efficiency and enhanced linearity. Fig. 4 shows the schematic and procedure used to implement the proposed inter-stage load-pull characterization methodology. According to Fig. 4, before carrying out the proposed load-pull, the power and driver stage cells and their corresponding matching networks must be properly designed. This is described in Sections III-A and III-B. Then, the fundamental theory and the implementation procedure of the proposed inter-stage load-pull characterization method are given in Sections III-C and III-D.
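The idea of a complementary driver stage can be illustrated with a small numerical sketch. It relies only on the fact that the complex gains of cascaded stages multiply, so their phases add; it neglects inter-stage mismatch and assumes each stage's gain is taken at its own operating drive level. The quadratic AM-PM curves and the 10 dB stage gains are hypothetical and are not taken from the simulated PA.

```python
import cmath
import math


def stage_gain(gain_db, phase_deg):
    """Complex voltage gain from a magnitude in dB and a phase in degrees."""
    return cmath.rect(10.0 ** (gain_db / 20.0), math.radians(phase_deg))


def cascade_phase_deg(driver_gain, power_gain):
    """Phase of the cascaded gain; the phases of the two stages add."""
    return math.degrees(cmath.phase(driver_gain * power_gain))


# Hypothetical AM-PM curves versus normalized drive level x in [0, 1]:
# the power stage lags by up to 6 degrees, the driver leads by up to 5.
for x in (0.0, 0.5, 1.0):
    driver = stage_gain(10.0, 5.0 * x ** 2)
    power = stage_gain(10.0, -6.0 * x ** 2)
    print(f"x={x:.1f}: overall AM-PM = {cascade_phase_deg(driver, power):+.2f} deg")
```

The closer the driver's phase curve mirrors that of the power stage, the smaller the residual overall AM-PM, which is what the inter-stage load-pull search is meant to achieve.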
A. POWER STAGE DESIGN
As suggested by the foundry, the nominal drain-to-source supply voltage (V DD ) of the NMOS transistor in the 45 nm SOI-CMOS technology is 1 V. However, an overdrive V DD of 1.2 V has been frequently reported in the literature [25] - [30] to improve the RF performance of the amplifier and no reliability issue has been reported. Given this, we selected the V DD of the power stage to be 1.2 V to achieve a higher P sat and PAE.
As depicted in Fig. 4(a), the power stage active circuit is realized with two neutralized pseudo-differential NMOS cells. The width of each single-ended NMOS transistor was selected to be 150 µm. This width was found to be sufficient for the power stage to produce an overall $P_{sat}$ of 16 dBm under a Class AB bias with the gate supply voltage ($V^{P}_{G}$) set to 0.3 V. This corresponds to a quiescent current density of 105 µA/µm. Moreover, the neutralization capacitor [$C_{N1}$, see Fig. 4(a)] was selected to be 38.7 fF. Based on the load-pull simulation, the optimum impedance ($Z_{opt,diff}$) of the NMOS differential cell is equal to $2 \times (15 + j21)$ Ω. To present this impedance to the NMOS differential cells as well as to realize efficient power combining, a two-way stacked-transformer (STF) parallel combiner [8] was adopted as the output combiner of the proposed PA. Fig. 5 depicts the circuit model and layout structure of the adopted output combiner. As can be seen in Fig. 5, this combiner is realized with three vertically coupled coils implemented in the three topmost metal layers. The top and bottom coils, implemented in the LB and OA layers, respectively, are connected to the two NMOS differential cells. The middle coil is implemented in the OB layer and connected to the RF output port. By using such an overlaid topology, this output combiner eliminates the need for additional lossy direct combining circuitry. Hence, it has a very compact size of 0.02 mm² and is capable of providing high passive efficiency. Fig. 6 shows the simulated S-parameters of this combiner when its top and bottom coils are connected in parallel to a differential termination with impedance equal to the conjugate of $Z_{opt,diff}/2$. According to Fig. 6, the adopted combiner achieves an excellent insertion loss of only 0.7 dB and a good match, with input and output return losses around or better than 15 dB over 28-31 GHz.
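As a quick sanity check on these figures, the 0.7 dB insertion loss can be converted into a passive efficiency. The sketch below assumes the loss is purely dissipative (mismatch loss being negligible, as the roughly 15 dB return loss suggests); $\eta_{comb}$ and $P_{cells}$ are symbols introduced here for illustration only.

```latex
\eta_{comb} = 10^{-IL/10} = 10^{-0.7/10} \approx 0.85,
\qquad
P_{cells} \approx P_{sat} + IL = 16\ \text{dBm} + 0.7\ \text{dB} = 16.7\ \text{dBm}
```

Under that assumption, roughly 85% of the power generated by the two NMOS differential cells reaches the RF output, so the cells together would need to deliver about 16.7 dBm to meet the 16 dBm target.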
B. DRIVER STAGE DESIGN
In deep-submicron CMOS processes, the RF performance of PMOS transistors is comparable to that of NMOS transistors [27]. In fact, the authors of [28], [29] demonstrated that PMOS-transistor-based PAs could achieve superior performance. Moreover, since the $C_{gs}$ versus $V_{gs}$ and $C_{ds}$ versus $V_{ds}$ characteristics of a PMOS are complementary to those of an NMOS, the PMOS device has the potential to be used as a natural "analog pre-distorter" for the NMOS device. Hence, in this work, we empirically choose the PMOS transistor to realize the driver stage. As depicted in Fig. 4(a), a neutralized pseudo-differential PMOS cell was used as the driver stage active device. Its source-to-drain supply voltage ($V_{SS}$) was selected to be 1 V. As the trans-conductance of a PMOS is around 0.8 times that of an NMOS in 45 nm SOI-CMOS technology, the width of each single-ended PMOS transistor was selected to be 180 µm. Hence, the driver stage is capable of producing approximately one half of the drain current of the power stage. Moreover, the gate supply voltage of the driver stage ($V^{D}_{G}$) was set to 0.7 V, yielding a quiescent current density of 51 µA/µm. The value of the neutralization capacitor ($C_{N2}$) was selected to be 39.5 fF to compensate for the $C_{gd}$ of the PMOS transistor.
Through simulation, the small-signal value of the PMOS differential cell input impedance, $Z_{in,diff}$, was found to be $2 \times (6 - j47)$ Ω. The input matching network was then designed to match $Z_{in,diff}$ to the single-ended 50 Ω source impedance. Fig. 7 depicts the circuit model and layout structure of the designed input matching network, in which a capacitor ($C_1$) is connected in parallel with the secondary coil of a single-to-differential TF ($TF_1$). To achieve the target impedance transformation, the value of the parallel capacitor was found to be 10 fF, and the primary inductance ($L_{P1}$), coupling factor ($k_1$) and turn ratio ($n_1$) of $TF_1$ were determined to be 204 pH, 0.54 and 1.6, respectively. Fig. 8 shows the simulated S-parameters of the input network layout structure. According to Fig. 8, the designed input network provides an insertion loss of about 1.8 dB and good input and output return losses (better than 15 dB) around the targeted 28 GHz band.
C. PROPOSED INTER-STAGE LOAD-PULL CHARACTERIZATION METHOD
The proposed load-pull characterization methodology targets the overall performance of the two-stage PA, not merely that of the driver stage or the power stage alone. Hence, prior to the proposed load-pull, a network synthesis approach must be developed such that the parameters of the inter-stage matching network can be fully determined from the variable that is being "load-pulled". Such an approach is presented in this subsection. It utilizes a simple differential TF with primary inductance $L_p$, coupling factor $k_m$, quality factor $Q$ and turn ratio $n$ as the inter-stage matching network, which is highlighted by the red dashed box in Fig. 4(a). During the proposed load-pull, the inter-stage TF needs to transform the input impedance of the power stage, $2Z^{P}_{gs,s.e.}$, to a wide variety of driver stage load impedances, $2Z^{D}_{ds,s.e.}$ [see Fig. 4(a)]. This is achieved by varying the values of $L_p$ and $k_m$ of the inter-stage TF. Based on Fig. 4(a), the Z-parameters of the half-circuit equivalent of the differential inter-stage TF can be expressed as:
$$X_p = \omega L_p / 2 \qquad (5)$$
$$X_m = \omega L_p k_m n / 2 \qquad (6)$$
where $X_p$ and $X_m$ are the primary reactance and mutual reactance of this single-ended TF, respectively. This single-ended TF is intended to transform a given $Z^{P}_{gs,s.e.}$ to a given $Z^{D}_{ds,s.e.}$ under given values of $Q$ and $n$. Hence, $X_p$ and $X_m$ must satisfy the following equation:
Equation (7) can be further translated into (8) and (9):
where $R^{P}_{gs}$ and $X^{P}_{gs}$ are the real and imaginary parts of $Z^{P}_{gs,s.e.}$, respectively, and $R^{D}_{ds}$ and $X^{D}_{ds}$ are the real and imaginary parts of $Z^{D}_{ds,s.e.}$, respectively. Dividing (8) by (9), we have:
In a practical design, $Q$ is a finite number; consequently, (10) is a quadratic equation in $X_p$. Based on (5)-(6) and (9)-(10), the solution for $L_p$ and $k_m$ for a given $Z^{P}_{gs,s.e.}$, $Z^{D}_{ds,s.e.}$, $n$ and $Q$ can be written as:
where $T$ is an internal variable, expressed as:
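The synthesis step described in this subsection can be sketched numerically as follows. This is not a transcription of (7)-(13); it is a minimal sketch built on an assumed lossy coupled-inductor model in which each coil has a series loss of $X/Q$, the secondary reactance is $n^2 X_p$, and the mutual reactance follows (6). The function names `tf_input_impedance` and `solve_tf` are hypothetical. Requiring $X_m^2$ to be real yields a quadratic in $X_p$, consistent with the remark above about finite $Q$.

```python
import math


def tf_input_impedance(l_p, k_m, n, q, z_load, freq_hz):
    """Impedance seen at the primary of the half-circuit TF (assumed model)."""
    omega = 2.0 * math.pi * freq_hz
    x_p = omega * l_p / 2.0                # equation (5)
    x_m = omega * l_p * k_m * n / 2.0      # equation (6)
    loss = complex(1.0 / q, 1.0)           # assumed: series loss X/Q per coil
    z11 = loss * x_p
    z22 = loss * n * n * x_p               # assumed: secondary is n^2 times primary
    # Two-port identity Zin = Z11 - Z12*Z21/(Z22 + ZL) with Z12 = Z21 = j*x_m.
    return z11 + x_m ** 2 / (z22 + z_load)


def solve_tf(z_target, z_load, n, q, freq_hz):
    """Return every (l_p, k_m) pair realizing z_target, smallest l_p first.

    Requires a finite q; pairs with k_m > 1 are returned but not realizable.
    """
    omega = 2.0 * math.pi * freq_hz
    loss = complex(1.0 / q, 1.0)
    # x_m^2 = (z_target - loss*x_p) * (n^2*loss*x_p + z_load) must be real,
    # so its imaginary part gives a*x_p^2 + b*x_p + c = 0.
    a = -(n * n) * (loss * loss).imag
    b = (loss * (n * n * z_target - z_load)).imag
    c = (z_target * z_load).imag
    disc = b * b - 4.0 * a * c
    if disc < 0.0:
        return []
    solutions = []
    for sign in (1.0, -1.0):
        x_p = (-b + sign * math.sqrt(disc)) / (2.0 * a)
        if x_p <= 0.0:
            continue
        x_m_sq = ((z_target - loss * x_p) * (n * n * loss * x_p + z_load)).real
        if x_m_sq <= 0.0:
            continue
        solutions.append((2.0 * x_p / omega, math.sqrt(x_m_sq) / (n * x_p)))
    return sorted(solutions)


# Round trip: analyse a TF, then recover its parameters from the impedance.
z_load = complex(2.5, -23.7)
z_seen = tf_input_impedance(200e-12, 0.66, 1.1, 15.0, z_load, 28e9)
for l_p, k_m in solve_tf(z_seen, z_load, 1.1, 15.0, 28e9):
    print(f"L_p = {l_p * 1e12:.1f} pH, k_m = {k_m:.3f}")
```

When two positive solutions exist, the list is ordered so that the smaller primary inductance comes first. Because the loss model here is an assumption, the numbers it produces should not be expected to match those reported for the designed PA.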
D. APPLICATION OF THE PROPOSED INTER-STAGE LOAD-PULL CHARACTERIZATION METHOD
Based on the circuits designed in Sections III-A and III-B and the equations derived in Section III-C, the proposed inter-stage load-pull characterization method can be performed. As per Fig. 4(b), this method starts with setting the power stage input impedance $2Z^{P}_{gs,s.e.}$, the inter-stage TF quality factor $Q$ and turn ratio $n$, and the two output power levels, $P_{out,H}$ and $P_{out,L}$. These are used in (14)-(15) to evaluate the two-point AM and PM distortions of the overall two-stage PA, $\mathrm{dB}(G_{err})$ and $\angle G_{err}$, which are defined as the amplitude and phase differences between the overall gain at $P_{out,H}$ and $P_{out,L}$, respectively.
$$\mathrm{dB}(G_{err}) = \mathrm{dB}\big(G(P_{out} = P_{out,H})\big) - \mathrm{dB}\big(G(P_{out} = P_{out,L})\big) \qquad (14)$$
$$\angle G_{err}\,(^\circ) = \angle G(P_{out} = P_{out,H}) - \angle G(P_{out} = P_{out,L}) \qquad (15)$$
To demonstrate the effectiveness of the proposed method in the high-power region, $P_{out,H}$ was set equal to the target $P_{sat}$ of the PA, 16 dBm. In addition, $P_{out,L}$ was set to 9 dBm such that the difference between $P_{out,H}$ and $P_{out,L}$ is close to the typical PAPR value of commercial modulated signals. Moreover, to account for the additional inductance of the routes between the inter-stage TF and the two NMOS differential cells, the inter-stage TF turn ratio $n$ was set equal to 1.1. Furthermore, $Q$ was set to 15, a typical value obtained from simulating the thick metal layers in the 45 nm SOI process. Lastly, the value of $Z^{P}_{gs,s.e.}$ was set equal to $2.5 - j23.7$ Ω, which is the small-signal value of the power stage input impedance deduced from post-layout simulation of the power stage and the output combiner.
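The two-point metrics can be computed directly from the complex gain at the two power levels. The sketch below follows the definition given in the text (amplitude and phase difference between the gain at $P_{out,H}$ and at $P_{out,L}$); the function name `two_point_distortion` and the sample gain values are hypothetical and do not come from the simulated PA.

```python
import cmath
import math


def two_point_distortion(gain_high, gain_low):
    """Two-point AM (dB) and PM (degrees) distortion.

    gain_high and gain_low are the complex voltage gains of the overall PA
    at P_out,H and P_out,L, respectively.
    """
    am_db = 20.0 * math.log10(abs(gain_high)) - 20.0 * math.log10(abs(gain_low))
    # Taking the phase of the ratio keeps the result within (-180, 180] degrees.
    pm_deg = math.degrees(cmath.phase(gain_high / gain_low))
    return am_db, pm_deg


# Hypothetical gains: 20 dB at 30 degrees at P_out,L (9 dBm) and
# 17.3 dB at 31 degrees at P_out,H (16 dBm).
g_low = cmath.rect(10.0 ** (20.0 / 20.0), math.radians(30.0))
g_high = cmath.rect(10.0 ** (17.3 / 20.0), math.radians(31.0))
am_db, pm_deg = two_point_distortion(g_high, g_low)
print(f"two-point AM = {am_db:+.2f} dB, two-point PM = {pm_deg:+.2f} deg")
```

A negative AM value indicates gain compression between the two power levels, and a PM value near zero corresponds to the condition the load-pull search looks for.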
Under the above settings, the inter-stage load-pull simulation was carried out for a reasonable range of $Z^{D}_{ds,s.e.}$, with both $R^{D}_{ds}$ and $X^{D}_{ds}$ being swept from 5 Ω to 50 Ω at 28 GHz. For a given $Z^{D}_{ds,s.e.}$, $L_p$ and $k_m$ in Fig. 4(a) were set according to (11)-(13) so that the inter-stage TF presents twice the given $Z^{D}_{ds,s.e.}$ to the driver stage cell. It is worth noting that when both solutions of $L_p$ in (11) are positive, the solution with the lower magnitude is selected for lower loss. Once all parameters of the two-stage PA in Fig. 4(a) were determined, the PAE and gain of the PA versus its output power were simulated. Then, the two-point AM and PM distortions were computed from the values of the PA gain at $P_{out,H}$ and $P_{out,L}$ based on (14)-(15). The above calculations and simulations were repeated for various $Z^{D}_{ds,s.e.}$, and the resulting PAE and two-point AM and PM distortions versus $Z^{D}_{ds,s.e.}$ were used to identify the optimal value of $Z^{D}_{ds,s.e.}$ and the corresponding inter-stage TF parameters.
Fig. 9(a)-(c) illustrate the simulated two-point AM and PM distortion contours, evaluated between $P_{out}$ = 16 dBm and $P_{out}$ = 9 dBm, and the PAE contour at $P_{out}$ = 16 dBm. The contours of the values of $k_m$ and $L_p$ used to synthesize the desired $Z^{D}_{ds,s.e.}$ are plotted in Fig. 9(d)-(e). Fig. 9(a) confirms the existence of multiple values of $Z^{D}_{ds,s.e.}$ that yield a minimal AM-PM distortion of 0°. As depicted in Fig. 9(b)-(e), these $Z^{D}_{ds,s.e.}$ values require various TF designs and yield different amplitude distortion and PAE values. Therefore, the choice of $Z^{D}_{ds,s.e.}$ must account for the PA linearity, the PAE value, and the practical realizability of the TF. Fig. 9 highlights the selected value of $Z^{D}_{ds,s.e.}$, found to be $14 + j25$ Ω. This impedance yields a 0° two-point phase distortion, a low amplitude distortion of 2.7 dB, and a good PAE (>38%). The TF design yielding this $Z^{D}_{ds,s.e.}$ is considered the optimal solution of the inter-stage matching network. It has reasonable values for the TF parameters, namely $L_p$ = 200 pH, $k_m$ = 0.66, $n$ = 1.1 and $Q$ = 15, which are relatively easy to implement in practice. These parameter values were used to guide the physical realization of the inter-stage matching network.
As can be seen from Fig. 9(d), there are values of $Z^{D}_{ds,s.e.}$ that require very high values of $k_m$ (higher than 1) and consequently cannot be realized with a simple TF. These impedances could be realized if additional elements such as shunt capacitances were allowed at the input/output port of the TF. However, in such cases, the AM-PM characteristic of the associated PA would need to be re-evaluated, as the phase shift and quality factor of the inter-stage network can affect the PA's AM-PM characteristic [19], [20]. Below, for the purpose of demonstrating the impact of $Z^{D}_{ds,s.e.}$, we neglect the realizability issue of these $Z^{D}_{ds,s.e.}$ and only investigate the RF performance of the PA. To do this, two two-stage PA designs with different values of $Z^{D}_{ds,s.e.}$ were simulated. The first adopts the optimal TF parameters that yield the proposed $Z^{D}_{ds,s.e.}$ (denoted as the proposed PA). The second design (denoted as the conventional PA) uses the $L_p$ and $k_m$ values that produce the conventional $Z^{D}_{ds,s.e.}$, represented in Fig. 9 with the triangle symbol. This value of $Z^{D}_{ds,s.e.}$ maximizes the overall gain and PAE of the two-stage PA. Consequently, it reflects the design choice of the driver load impedance in a conventional nonlinear multi-stage PA [31]. Table 2 summarizes the design parameters and key simulation results for the two PAs. Fig. 10 depicts their simulated PAE and their AM and PM distortions. According to Fig. 10, the proposed PA delivers better AM-AM and AM-PM performance than the conventional PA, with a slight sacrifice of PAE. In particular, the proposed PA achieves a two-point PM distortion of 0° between $P_{out}$ = 16 dBm and $P_{out}$ = 9 dBm. This enhanced linearity is further examined in Fig. 10(b), which also includes the AM-PM characteristics of the driver and power stages of the proposed PA. Based on Fig. 10(b), it is clear that the driver stage produces an AM-PM response complementary to that of the power stage.
This improves the overall AM-PM linearity of the proposed two-stage PA.
Based on the identified optimal TF parameters, the inter-stage matching network was designed. Fig. 11 shows the circuit model and layout structure of the designed inter-stage network. According to Fig. 11(a), the network consists of a differential TF ($TF_2$) and four series inductors ($L_c$) that account for the inductances of the routes between the TF and the NMOS differential cells. Through layout simulation, the value of $L_c$ was found to be 37 pH. Then, the parameters of $TF_2$ were determined as $L_{p2}$ = 190 pH, $k_{m2}$ = 0.65 and $n_2$ = 0.98 so that the combination of $TF_2$ and $L_c$ emulates the behavior of the optimal inter-stage TF as closely as possible. To compare the designed inter-stage network to the optimal network required by the proposed $Z^{D}_{ds,s.e.}$, the Z-parameters of the layout structure in Fig. 11(b) were simulated using the configuration highlighted in Fig. 11(a), which absorbs the route inductances into $TF_2$. Subsequently, these Z-parameters were used to extract an equivalent set of TF parameters for the overall designed inter-stage network. Fig. 12 summarizes these extracted equivalent TF parameters. Based on Fig. 12, at 28 GHz, the designed inter-stage network exhibits an equivalent primary inductance of 190 pH, a coupling factor of 0.59, a turn ratio of 1.06, and primary and secondary coil quality factors of 16 and 18, respectively. These values are very close to the optimal TF parameters required by the proposed $Z^{D}_{ds,s.e.}$. Fig. 13 shows the simulated S-parameters of the inter-stage network. According to Fig. 13, the designed inter-stage network realizes a good input and output match with an insertion loss of 1.2 dB at 28 GHz. Of particular note, the simulated $S_{11}$ is around −25 dB at 28 GHz, indicating that the load presented to the driver stage is very close to the optimal value found by the proposed load-pull. Finally, Fig. 14 summarizes the complete schematic of the proposed two-stage mm-wave PA and the input and load impedances of the NMOS and PMOS differential cells.
IV. EXPERIMENTAL VALIDATION
The proposed two-stage mm-wave PA was fabricated in the GlobalFoundries 45 nm SOI-CMOS process. Fig. 15 shows a micrograph of the fabricated PA chip, which occupies a core area of 0.11 mm². The chips were measured via on-wafer probing. Each DC pad is connected to decoupling capacitors with a combined value of 16 pF (32 × 0.5 pF), realized with vertical natural capacitors (VNCAPs), each occupying an area of 130 µm by 78 µm. In the measurements, the source supply of the driver stage ($V_{SS}$) and the drain supply of the power stage ($V_{DD}$) were set to 1 V and 1.2 V, respectively. To achieve an optimal AM-PM characteristic, the gate supply voltages of the driver and power stages, $V^{D}_{G}$ and $V^{P}_{G}$, were set to 0.73 V and 0.39 V, respectively. These bias conditions yield quiescent current densities of 50 µA/µm and 113 µA/µm for the driver and power stages, respectively, which are close to the values adopted in the simulation (51 µA/µm and 105 µA/µm).
A. CW MEASUREMENT RESULTS
Fig. 16 shows the measured S-parameters of the fabricated chip obtained using a Keysight N5247A network analyzer. Based on Fig. 16, the PA demonstrated a measured small-signal gain of about 20 dB at 29 GHz and a 3-dB bandwidth of 6 GHz (between 26 and 32 GHz). Both the input and output return losses (S11 and S22) were kept below −10 dB within 27.2-30.3 GHz. Excellent agreement was observed between the measurement and simulation results.
Large-signal measurements were taken at different carrier frequencies under CW stimulus and these results are summarized in Fig. 17 . In Fig. 17(a) , the measured DE, PAE and AM-AM characteristics are shown at carrier frequencies from 28 to 31 GHz. The best PAE result was obtained at 29 GHz with a measured peak PAE equal to 34.2%.
In terms of AM-AM, since the NMOS differential cells in the proposed PA do not incorporate any second harmonic control circuitry, the single-ended transistors see very large second harmonic impedances, which could considerably degrade the PA AM-AM characteristic at high power levels [13]. Consequently, at 29 GHz, a large two-point amplitude distortion of −3.8 dB was observed between P out = 16 dBm and P out = 9 dBm. Although this measured two-point amplitude distortion is 1.1 dB stronger than the simulated result [−2.7 dB, as revealed in Fig. 10(a)], the measured P 1dB (13.9 dBm) is relatively close to the simulated value (14.3 dBm). This indicates that the major discrepancy between the simulated and measured AM-AM lies in the high power region (e.g. from P 1dB to P out = 16 dBm) rather than at low to medium power levels. This can be attributed to issues such as transistor model inaccuracy at large signals and EM simulator inaccuracies in predicting harmonic impedances and, consequently, their impact on the large-signal behavior.
Despite the relatively poor AM-AM characteristic, the proposed PA achieves an excellent AM-PM linearity over a broad frequency band. As highlighted in Fig. 17(b), over 28-31 GHz, the PA maintained < 1° AM-PM distortion at up to 16 dBm of output power, only slightly lower than P SAT (16.2 to 16.7 dBm). The best AM-PM linearity was attained at a carrier frequency of 30 GHz; only 0.2° of AM-PM distortion was measured at its P 1dB (13.9 dBm) and this was maintained at < 1° up to 16.3 dBm of output power. This result indicates the enhanced linearity and linearizability of the proposed PA. Large-signal performances across wider bandwidths are summarized in Fig. 17(c). This figure shows a relatively broad bandwidth of 6 GHz (from 26 to 32 GHz) where the saturation PAE and output power are maintained at >30% and >16 dBm, respectively.
The sensitivity of the AM-PM linearity to the PA's gate bias conditions was also investigated. For this, the AM-PM distortion of the PA versus its P out was measured under various settings of V D G and V P G . For each response, the value of P out,H at which the two-point phase distortion (with reference to P out,L = 9 dBm) reached 1° was extracted. Fig. 18 shows the contour plot generated from the extracted values of P out,H . According to Fig. 18, the proposed PA allows V D G and V P G to vary within ±40 mV and ±30 mV, respectively, around the selected gate bias point while still attaining the targeted output power of 16 dBm with a two-point phase distortion of < 1°.
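The extraction of P out,H from a single AM-PM response can be sketched as follows. This is a minimal illustration, not the authors' processing script: it assumes the response is available as sampled (P out , phase) pairs with strictly increasing P out , takes the first crossing of the 1° limit, and interpolates linearly between samples. The function names and the sample sweep are hypothetical.

```python
def interp(x, xs, ys):
    """Piecewise-linear interpolation; xs must be strictly increasing."""
    for i in range(len(xs) - 1):
        if xs[i] <= x <= xs[i + 1]:
            t = (x - xs[i]) / (xs[i + 1] - xs[i])
            return ys[i] + t * (ys[i + 1] - ys[i])
    raise ValueError("x lies outside the swept range")

def pout_high(pout_dbm, phase_deg, pout_low_dbm=9.0, limit_deg=1.0):
    """First P_out above pout_low_dbm where the two-point phase distortion
    (relative to the phase at pout_low_dbm) reaches limit_deg, or None."""
    ref = interp(pout_low_dbm, pout_dbm, phase_deg)
    dev = [abs(p - ref) for p in phase_deg]
    for i in range(1, len(pout_dbm)):
        if pout_dbm[i] <= pout_low_dbm or dev[i] < limit_deg:
            continue
        # Start of the crossing segment: the previous sample, or the
        # reference point itself (zero deviation) if that sample lies below it.
        if pout_dbm[i - 1] <= pout_low_dbm:
            x0, d0 = pout_low_dbm, 0.0
        else:
            x0, d0 = pout_dbm[i - 1], dev[i - 1]
        # Linear interpolation of the deviation magnitude (an approximation).
        return x0 + (limit_deg - d0) * (pout_dbm[i] - x0) / (dev[i] - d0)
    return None  # limit never reached within the sweep

# Hypothetical AM-PM sweep (values are illustrative, not measured data)
sweep_pout = [9.0, 11.0, 13.0, 15.0, 16.0, 17.0]
sweep_phase = [0.0, 0.1, 0.2, 0.5, 0.9, 1.9]
p_high = pout_high(sweep_pout, sweep_phase)
print(p_high)
```

Repeating this extraction over a grid of gate bias settings would give the surface from which a contour plot such as Fig. 18 can be drawn.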
B. MODULATED SIGNAL MEASUREMENT RESULTS
To assess the fabricated PA's performance under modulated signal stimuli, it was tested using a 64-QAM signal with an instantaneous bandwidth of 800 MHz and an input PAPR of 8.3 dB. In addition, a low cost and low complexity digital pre-distortion (DPD) technique was applied to the PA to examine its linearizability. The following two approaches were used to reduce the DPD cost and complexity:
(i) a very simple memoryless odd-only polynomial function with a maximum order of 7 (only four coefficients) was used;
(ii) to further reduce the sampling speed required in the DPD observation path, the memoryless DPD function was first trained by driving the PA with a 10 MHz narrowband 64-QAM signal. The trained DPD function was then applied to the 800 MHz 64-QAM signal to study the potential improvement in PA performance due to its enhanced linearizability.
Fig. 19 shows the block diagram of the DPD measurement setup. The input baseband I/Q signal was generated by a high speed arbitrary waveform generator (AWG) and subsequently up-converted to a carrier frequency of 30 GHz before being fed to the input of the PA. The output signal of the PA was first down-converted to an intermediate frequency (IF) of 2 GHz and digitized by a high speed oscilloscope. The digitized signal was then processed in Matlab to train the DPD function and generate the pre-distorted input signal.
Fig. 20 shows the measured average PAE, error vector magnitude (EVM) and adjacent channel leakage ratio (ACLR) before and after DPD. It is worth noting that the measured EVM here is normalized to the RMS value of the constellation points [32]. According to Fig. 20, when DPD is not applied, the proposed PA can deliver an average PAE of 8.7% at an average output power (P avg ) of 9.4 dBm while maintaining an acceptable EVM [33] of −25 dB with an ACLR of −34 dBc. Applying DPD slightly backs off the value of P avg . Nevertheless, this does not affect the efficiency behavior of the proposed PA, as shown in Fig. 20(a). Moreover, as depicted in Fig. 20(b)-(c), due to the improved linearizability of the proposed PA, its nonlinearity can be easily mitigated by using the low cost and low complexity memoryless DPD. Consequently, after DPD, the PA can operate at a high P avg of 11.1 dBm with a good PAE of 12.2%, while still maintaining an acceptable EVM of −25.2 dB.
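The memoryless odd-only polynomial DPD of order 7 and the RMS-normalized EVM can be illustrated with a minimal sketch. The training procedure is not detailed in the text, so the sketch assumes an indirect-learning least-squares fit (a post-inverse fitted from the gain-normalized PA output to the PA input, then reused as the pre-distorter). The PA itself is replaced by a hypothetical toy model `toy_pa` with mild third-order compression and phase rotation, and random complex samples stand in for the 64-QAM waveform; none of these come from the measurement setup.

```python
import cmath
import math
import random

ORDERS = (1, 3, 5, 7)  # odd-only terms up to order 7: four coefficients

def basis(x):
    """Memoryless odd-order basis functions x*|x|^(p-1) for p in ORDERS."""
    mag = abs(x)
    return [x * mag ** (p - 1) for p in ORDERS]

def solve(a, b):
    """Solve a small complex linear system by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    c = [0j] * n
    for i in reversed(range(n)):
        s = m[i][n] - sum(m[i][k] * c[k] for k in range(i + 1, n))
        c[i] = s / m[i][i]
    return c

def fit_dpd(pa_in, pa_out_norm):
    """Indirect learning: least-squares fit of the mapping pa_out_norm -> pa_in."""
    rows = [basis(z) for z in pa_out_norm]
    n = len(ORDERS)
    ata = [[sum(r[i].conjugate() * r[j] for r in rows) for j in range(n)]
           for i in range(n)]
    atb = [sum(r[i].conjugate() * x for r, x in zip(rows, pa_in))
           for i in range(n)]
    return solve(ata, atb)  # normal equations (A^H A) c = A^H b

def apply_dpd(coeffs, x):
    """Pre-distort one complex baseband sample."""
    return sum(c * phi for c, phi in zip(coeffs, basis(x)))

def toy_pa(x):
    """Hypothetical gain-normalized PA model (not the measured PA)."""
    return x * (1 - (0.10 + 0.02j) * abs(x) ** 2)

def evm_db(rx, ref):
    """EVM in dB, normalized to the RMS value of the reference samples."""
    err = sum(abs(r - d) ** 2 for r, d in zip(rx, ref))
    pwr = sum(abs(d) ** 2 for d in ref)
    return 10 * math.log10(err / pwr)

rng = random.Random(0)

def rand_signal(n, max_mag):
    """Random complex samples standing in for a modulated waveform."""
    return [cmath.rect(max_mag * rng.random(), 2 * math.pi * rng.random())
            for _ in range(n)]

train_in = rand_signal(500, 1.0)
coeffs = fit_dpd(train_in, [toy_pa(x) for x in train_in])
test_ref = rand_signal(500, 0.85)
evm_before = evm_db([toy_pa(x) for x in test_ref], test_ref)
evm_after = evm_db([toy_pa(apply_dpd(coeffs, x)) for x in test_ref], test_ref)
print(evm_before, evm_after)
```

Because the basis has no memory terms, the coefficients depend only on the static AM-AM and AM-PM curves, which is consistent with training on a narrowband signal and then applying the same function to a wideband one.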
Fig. 21 takes a closer look at the dynamic AM-AM and AM-PM distortions before and after DPD, together with the output constellation after DPD at P avg = 11.1 dBm. According to Fig. 21(a), the application of the memoryless DPD allows for an improvement of about 3 dB and 9.5 dB in the measured EVM and ACLR, respectively. Moreover, based on Fig. 21(b), while the measured AM-PM response is very flat and does not need any correction, the AM-AM compression is effectively reduced from 7 dB to 2 dB after DPD. These results confirm the ease of linearization of the proposed PA, a feature attributed to the proposed inter-stage load-pull characterization method.
Table 3 compares the performance of the proposed PA to that of other PAs operating in a similar frequency band reported in the literature. In terms of CW measurement results, the proposed PA demonstrates the lowest AM-PM distortion. Moreover, while delivering a similar level of saturation power to the other reported PAs, the proposed PA occupies the smallest physical area. These results demonstrate the effectiveness of the proposed method. In addition, this work is the only one that uses a very simple low cost and low complexity memoryless DPD technique to evaluate the PA's linearizability. Thanks to the enhanced linearizability offered by the proposed method, applying a memoryless DPD technique can increase the average PAE by 40% in relative terms, i.e. from 8.7% to 12.2%, while maintaining a similar EVM (lower than −25 dB).
V. CONCLUSION
In this contribution, a new inter-stage load-pull technique was developed to search for the optimal solution of the inter-stage matching network that enhances the AM-PM linearity of a two-stage mm-wave PA. This technique relies on judiciously selecting the parameters of the PA's inter-stage matching network to synthesize a driver-stage AM-PM characteristic that is complementary to that of the power stage. This technique was applied to the design of a proof-of-concept 28-31 GHz PA demonstrator in 45 nm SOI-CMOS technology. The CW measurement results revealed an excellent AM-PM characteristic with maximum phase distortion lower than 1° for output power levels of up to 16 dBm, over a broad band of 28-31 GHz. The linearity and linearizability enhancement of the proposed PA was further confirmed under a 64-QAM modulated signal with a modulation bandwidth of 800 MHz. For instance, at a P avg of 9.4 dBm, the PA demonstrator maintained an EVM of −25 dB, an ACLR of −34 dBc, and a PAE of 8.7% without applying any DPD technique. Moreover, after using a simple low cost and low complexity memoryless polynomial DPD function with only four coefficients, the proposed PA was able to deliver a similar EVM (−25.2 dB) and a better ACLR (−37 dBc) at a higher P avg (11.1 dBm) with a much better PAE (12.2%), corresponding to a 40% relative improvement over the PAE achieved without DPD.
