Abstract: Series stacking of multiple devices in power amplifiers is a promising technique that has been explored recently at millimetre-wave frequencies to overcome some of the fundamental limitations of complementary metal-oxide-semiconductor (CMOS) technology. Stacking multiple devices improves the output power and efficiency by increasing the achievable output voltage swing. Switching power amplifiers (PAs), such as Class-E PAs, are capable of high-efficiency operation and can benefit from device stacking. This study presents a new topology for stacked Class-E-like PAs. In this technique, an appropriate Class-E load network is used for each stacked device, which imparts true Class-E behaviour to all the devices in the stack. In addition, output power is available from multiple corresponding output nodes. The resulting topology is called the multi-output stacked Class-E PA. Two Q-band prototypes (a unit cell with two devices stacked, and a power-combined version employing two such unit cells) have been fabricated in IBM's 45 nm silicon-on-insulator (SOI) CMOS technology using the 56 nm body-contacted devices. Measurements yield a peak PAE of 25.5% for the unit cell with a saturated output power of 17.9 dBm, and a peak PAE >16% for the power-combined version with a saturated output power >19.1 dBm. Owing to the proposed technique, the performance metrics are on par with the current state of the art despite the higher ON-resistance and poor $f_{\max}$ of the body-contacted devices.
Introduction
The limited breakdown voltage of fine-line complementary metal-oxide-semiconductor (CMOS) technology and the poor quality of on-chip passives pose challenges for the design of high-performance analogue, radio-frequency (RF), and millimetre-wave (mmWave) components. Long-range applications, such as satellite communication in the frequency band around 45 GHz, require a power amplifier with high output power as well as high energy efficiency. The power amplifier (PA) has consequently emerged as one of the most challenging blocks, and III-V compound-semiconductor technologies have traditionally dominated the market for high-power applications.
Series stacking of multiple devices (Fig. 1) is a promising technique that can help overcome some of the fundamental challenges associated with CMOS PA design. Stacking may be employed to increase the output power of a PA, as it increases the effective voltage swing at the load. Since the increased voltage stress is shared by the devices in the stack, the output voltage swing for a stack of n devices can be n times higher than that of a single device. For a given output power requirement, stacking improves the efficiency because the required load impedance transformation is eased or eliminated. Recent works involving device stacking for linear PAs (Fig. 1a) [1] as well as switching-class PAs (Fig. 1b) [2, 3, 4] have demonstrated the feasibility of implementing efficient stacked PAs in CMOS at mmWave frequencies with high output power.
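To illustrate why stacking eases the output match, the short Python sketch below computes the load resistance that absorbs a target power from a stack of n devices. It assumes an ideal sinusoidal fundamental swing whose amplitude is n times a hypothetical per-device amplitude, and it ignores all losses. The power and amplitude values are placeholders, not values taken from the prototypes.

```python
def optimal_load_ohms(p_out_w: float, v_amp_single: float, n_stack: int) -> float:
    """Load resistance that absorbs p_out_w from a sinusoidal swing.

    Assumes the fundamental amplitude at the load is n_stack * v_amp_single,
    i.e. each stacked device adds one (hypothetical) per-device amplitude.
    P = V^2 / (2 R)  ->  R = V^2 / (2 P).
    """
    v_amp = n_stack * v_amp_single
    return v_amp ** 2 / (2.0 * p_out_w)


if __name__ == "__main__":
    p_target = 0.1   # 100 mW (20 dBm), illustrative target
    v_single = 1.0   # hypothetical per-device fundamental amplitude in volts
    for n in range(1, 5):
        r_opt = optimal_load_ohms(p_target, v_single, n)
        # Ratio of the 50 ohm system impedance to the required load
        print(f"n = {n}: R_opt = {r_opt:5.1f} ohm, 50/R_opt = {50.0 / r_opt:5.2f}")
```

With these placeholder numbers, a single device needs a 10:1 transformation from 50 Ω (R_opt = 5 Ω). A 2-stack needs only 2.5:1, and a 3-stack (45 Ω) is already close to 50 Ω.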
Switching PAs are theoretically capable of higher-efficiency operation than their linear/quasi-linear counterparts owing to non-overlapping voltage and current waveforms, and hence are used extensively at low RF frequencies. However, owing to the various non-idealities at mmWave frequencies, a 'switch-like' PA is more practical [2]. This work presents a new means of achieving appropriate voltage swing(s) at the intermediary node(s) of Class-E PAs employing device stacking. A 'Class-E load network' (which consists of a DC-feed inductor to the power supply in parallel with a series resonant filter connected to the appropriate Class-E load impedance) is connected at each intermediary node. The resulting topology is referred to as the multi-output stacked Class-E PA. It amounts to stacking multiple single-device Class-E PAs while retaining their individual characteristics (Fig. 1c). A unique feature of this scheme is that output power is available at all the intermediary nodes. Based on this idea, two Q-band prototypes have been fabricated in IBM's 45 nm SOI CMOS technology. The first is a unit cell with two devices stacked (referred to as the dual-output Class-E PA), where the output power available from the intermediary node is combined with that from the top drain node. The second prototype current-combines two such unit cells to increase the overall output power.
Challenges associated with device stacking in Class-E PAs
Fig. 1b depicts the concept of a stacked CMOS Class-E-like PA [2, 5]. A Class-E load network is connected to the topmost drain terminal and shapes the voltage waveform to have a Class-E-like profile. For an n-stacked PA, a peak output swing of $\simeq n \times V_{\mathrm{reliability}}$ can be sustained, where $V_{\mathrm{reliability}}$ is the peak voltage swing across any two terminals of a device for long-term reliable operation. $V_{\mathrm{reliability}}$ is typically twice the nominal supply voltage of the technology [6]. This is depicted in Fig. 1b along with the appropriate intermediary node and gate swings. The swing at each gate is induced through capacitive coupling from the corresponding source and drain nodes via $C_{gs}$ and $C_{gd}$, respectively, and is controlled through the gate capacitor $C_n$.
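The role of the gate capacitor can be seen with a simple capacitive divider. The Python sketch below computes the gate swing of a stacked device from its source and drain swings through C_gs and C_gd, with the external gate capacitor (C_n in the text, called c_gate here) to AC ground. It assumes in-phase swings and linear capacitances, and it neglects the gate bias network. The capacitance values are hypothetical.

```python
def gate_swing(v_source: float, v_drain: float,
               c_gs: float, c_gd: float, c_gate: float) -> float:
    """Gate voltage amplitude set by capacitive coupling.

    Charge conservation at the gate node (C_gs to the source, C_gd to the
    drain, external C_gate to AC ground, bias network neglected) gives
        v_g = (C_gs * v_s + C_gd * v_d) / (C_gs + C_gd + C_gate).
    All swings are assumed to be in phase, so amplitudes add directly.
    """
    return (c_gs * v_source + c_gd * v_drain) / (c_gs + c_gd + c_gate)


if __name__ == "__main__":
    v_rel = 2.5                       # assumed V_reliability in volts
    v_s, v_d = v_rel, 2 * v_rel       # source/drain swings of the 2nd device
    c_gs, c_gd = 100e-15, 50e-15      # hypothetical device capacitances
    for c_gate in (50e-15, 150e-15, 400e-15):
        v_g = gate_swing(v_s, v_d, c_gs, c_gd, c_gate)
        print(f"C_gate = {c_gate * 1e15:4.0f} fF: v_g = {v_g:.2f} V, "
              f"v_gs = {v_g - v_s:+.2f} V, v_dg = {v_d - v_g:.2f} V")
```

A larger gate capacitor lowers the gate swing, which reduces the gate-source swing but raises the drain-gate stress. The gate capacitor is therefore a knob for balancing the two against $V_{\mathrm{reliability}}$.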
IET Microwaves, Antennas & Propagation

Research Article

A key requirement for true Class-E behaviour of the stack is for the intermediary drain nodes to sustain Class-E-like voltage swings with appropriately scaled amplitudes. This also ensures that the voltage stress is shared equally among all devices. The critical issue pertaining to appropriate intermediary voltage swings is illustrated by means of a 2-stacked topology in [5]. To preserve input power and improve PAE at mmWave frequencies, where devices have poor gain, usually only the bottom device is driven in a stacked configuration [2, 4, 7]. Consequently, we rely on the voltage swing of the lower device(s) to turn off the device(s) higher up the stack. Once the stacked device turns off, the voltage of the intermediary node ceases to increase, as the stacked device no longer conducts current to charge the parasitic capacitance at the intermediary node. This deviation of the intermediate node waveform from the desired voltage profile results in unequal voltage stress across the devices. It also deteriorates the overall efficiency owing to conduction loss during the initial period of the OFF half-cycle [8].
This fundamental shortcoming of device stacking in Class-E PAs has motivated research into circuit techniques capable of mitigating the problem. The two most popular techniques are (i) the inductive tuning technique, namely placing a shunt inductor ($L_{\mathrm{mid}}$) at the intermediary node(s) [6], and (ii) the charging acceleration technique [8], which utilises feed-forward capacitive coupling. The inductive tuning approach suffers from several shortcomings at mmWave frequencies. Firstly, a series DC-blocking capacitor is required, which contributes loss owing to the poor quality factor of on-chip capacitors at mmWave frequencies. Secondly, the tuning inductor can consume considerable die area unless special design techniques, such as transformer-based charging acceleration [9], are utilised. In addition, the finite quality factor of the tuning inductor contributes to power loss. Furthermore, the circuit is quite sensitive to the choice of the tuning inductor, as discussed in [1], where laser trimming was employed to optimise its value. The alternative charging-acceleration technique works well at low RF frequencies. However, the poor quality and self-resonance of the on-chip MIM/interdigitated capacitors used to implement the feed-forward capacitor would degrade efficiency at mmWave frequencies.
3 Multi-output stacked Class-E PA
Principle of operation
The proposed multi-output stacked Class-E PA topology is based on the key observation that the drain voltage profile in a Class-E PA is facilitated by the presence of the 'Class-E load network'. Extending this idea to the case where several devices are stacked, incorporating an appropriately tuned Class-E load network at each intermediary node should result in Class-E-like voltage swings for all devices (Fig. 1c). Another important characteristic of the proposed topology is that output power is available from each intermediary node, which was previously used only to turn off devices higher up in the stack. The multiple output nodes can be power-combined internally to drive a single load with increased power, or they can be used to drive other circuit blocks, making the proposed topology useful as an active power splitter.
To facilitate theoretical analysis, we resort to the simplified schematic in Fig. 1d, with the drain voltage swings for lossless operation annotated. The devices are represented by switches with output capacitance $C_{\mathrm{out},i}$ and corresponding ON-resistance $R_{\mathrm{ON},i}$ ($i = 1, 2, \ldots, n$), each driven by a square-wave input with a 50% duty cycle. Calculating the output capacitance $C_{\mathrm{out},i}$ at the $i$th intermediary node is not straightforward owing to the complex capacitive network formed by the device capacitances. However, one might use approximate expressions as follows:

$$C_{\mathrm{out},i} \approx C_{d0,i} + C_{gd,i} + C_{gs,i+1}, \qquad i = 1, 2, \ldots, n-1$$

and

$$C_{\mathrm{out},n} \approx C_{d0,n} + C_{gd,n}$$
where $C_{d0,i}$, $C_{gd,i}$, and $C_{gs,i}$ are, respectively, the drain-to-ground, gate-drain, and gate-source capacitances of the $i$th device. The above approximation is based on the observation that at the drain terminal of the $i$th device, in addition to $C_{d0,i}$ (the drain-to-ground capacitance), the capacitance seen looking up the stack is $C_{gs,i+1}$ in the worst case (assuming that the externally added gate capacitor $C_{g,i}$ is relatively large). Similarly, the capacitance seen looking down the stack has a worst-case value of $C_{gd,i}$.

The ideal operation of the multi-output Class-E topology can be understood by analysing the ON and OFF states in the absence of losses, i.e. $R_{\mathrm{ON},i} = 0$ ($i = 1, 2, \ldots, n$). As shown in Fig. 2a, in the ON state the drain terminal of each switch is pulled down to ground, so that in effect we have n independent Class-E PAs, each operating in its ON state. Note that in this ON state, the switches lower in the stack must support the currents of the Class-E PAs higher in the stack. This potentially increases their conduction loss when a finite ON-resistance is considered. In the OFF state (Fig. 2b), each switch is 'open', so that once again we have n independent Class-E PAs operating in their respective OFF states. Consequently, we have true Class-E behaviour for the overall stacked topology. The individual Class-E load networks can thus be designed for global waveform shaping to optimise the output power and/or efficiency of the stacked configuration, as discussed subsequently. In a practical implementation, the finite switch loss introduces interaction between the stacked devices in the ON state, resulting in a deviation from independent Class-E behaviour. However, assuming low-loss operation and designing the Class-E load networks accordingly provides an excellent starting point for subsequent simulation-based optimisation.
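The worst-case output-capacitance approximation described above can be tabulated for any stack in a few lines. In this approximation, each drain node sees its own drain-to-ground and gate-drain capacitances plus the gate-source capacitance of the device above it, with the external gate capacitors acting as AC ground. The Python sketch below applies that bookkeeping. The capacitance values are hypothetical placeholders, not extracted device data.

```python
def output_capacitances(c_d0, c_gd, c_gs):
    """Worst-case output capacitance at each drain node of an n-device stack.

    Lists are ordered bottom (index 0) to top (index n-1). Node i sees its
    own C_d0,i and C_gd,i, plus C_gs,i+1 of the device above it (if any);
    the external gate capacitors are assumed to act as AC ground.
    """
    n = len(c_d0)
    c_out = []
    for i in range(n):
        c = c_d0[i] + c_gd[i]
        if i + 1 < n:          # the top device has nothing stacked above it
            c += c_gs[i + 1]
        c_out.append(c)
    return c_out


if __name__ == "__main__":
    # Hypothetical per-device capacitances in fF; the bottom device is twice
    # as wide as the top one, so its capacitances are doubled.
    c_d0 = [60.0, 30.0]
    c_gd = [40.0, 20.0]
    c_gs = [120.0, 60.0]
    print(output_capacitances(c_d0, c_gd, c_gs))  # -> [160.0, 50.0]
```

The bottom node picks up the gate-source capacitance of the device above it. Its shunt capacitance, and hence the tuning of its Class-E load network, therefore cannot be taken from the bottom device alone.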
Depending on the 'tuning' of the Class-E load network [10] of each stacked device, the corresponding supply voltage should be chosen so that the maximum instantaneous drain-source voltage swing of each device is $V_{\mathrm{reliability}}$ [6]. As discussed in [10], the waveform figure-of-merit

$$F_V = \frac{V_{\mathrm{peak}}}{V_{DD}}$$

relates the peak drain voltage swing $V_{\mathrm{peak}}$ to the DC supply voltage $V_{DD}$ of a single-device Class-E PA. The peak drain voltage swings in a stacked configuration increase linearly up the stack, so that the total voltage stress is evenly distributed across the devices. The supply voltage $V_{DD,i}$ for the $i$th stacked device must be chosen accordingly, i.e.

$$V_{DD,i} = \frac{i \times V_{\mathrm{reliability}}}{F_{V,i}}, \qquad i = 1, 2, \ldots, n$$

where $F_{V,i}$ is the figure-of-merit corresponding to the tuning of the $i$th Class-E load network.
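The supply-voltage rule described above (drain node i peaking at i times V_reliability, scaled by the peak-to-supply ratio of its load network) can be sketched in Python. Two assumptions are made here. First, every load network is given the textbook lossless Class-E ratio of about 3.56. Second, V_reliability is taken as twice the 1.25 V maximum recommended supply quoted later for this process. Practical designs tune away from the nominal case, so each entry of f_v would normally differ.

```python
NOMINAL_CLASS_E_FV = 3.56  # approx. V_peak / V_DD of the textbook lossless Class-E PA


def stack_supply_voltages(v_reliability, f_v):
    """Supply for each load network, bottom (i = 1) to top (i = n).

    The drain node of the i-th device should peak at i * V_reliability, and
    its load network's waveform figure-of-merit is F_V,i = V_peak,i / V_DD,i,
    so V_DD,i = i * V_reliability / F_V,i.
    """
    return [(i + 1) * v_reliability / fv for i, fv in enumerate(f_v)]


if __name__ == "__main__":
    v_rel = 2 * 1.25  # assumed: twice the 1.25 V maximum recommended V_DD
    for n in (2, 3):
        v_dd = stack_supply_voltages(v_rel, [NOMINAL_CLASS_E_FV] * n)
        print(n, [round(v, 3) for v in v_dd])
```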
Efficiency analysis
The presence of device conduction loss, modelled by the corresponding switch ON-resistance $R_{\mathrm{ON}}$, causes the proposed multi-output Class-E topology to deviate from ideal operation. A comprehensive analysis of optimal tuning for single-device Class-E PAs in the presence of significant conduction loss was presented in [11]. This analysis could be extended to the multi-output Class-E PA. However, the interaction between the switches under significant conduction loss (the switches lower in the stack support the current of those above) makes the resulting equations too complex to provide practical design guidelines. A simplified analysis is therefore performed here to gain an intuitive understanding of the important factors affecting overall efficiency. Referring to Fig. 1b, we use the notations $I_{L,k}$ and $i_k \cos(\omega_0 t + \phi_k)$ to denote the DC-feed inductor current and the load network current, respectively, for the $k$th switch in the stack. Let
The current through the kth switch during the ON half-cycle is then given by
Equation (5) leads to several important observations. Firstly, a switch supports the load and DC-feed inductor currents of all the switches higher up in the stack, in addition to its own load network current. Thus, the bottom device supports the largest current, and it is imperative to minimise its ON-resistance to maximise efficiency. However, too large a device would increase the input power and degrade PAE, resulting in a trade-off in device size. Secondly, the current flowing through the switches decreases up the stack, so the device size can be tapered progressively. Finally, the different Class-E load networks (and consequently their currents) can potentially be tuned to shape the switch currents and further minimise conduction loss. The conduction loss of the $k$th switch is given by

$$P_{\mathrm{cond},k} = R_{\mathrm{ON},k}\, i_{s,k,\mathrm{RMS}}^2$$

where $i_{s,k,\mathrm{RMS}}$ is the RMS current flowing through the $k$th switch. The drain efficiency can therefore be expressed as
where $T_s$ is the switching period, the ON half-cycle is assumed to extend from $t = 0$ to $t = T_s/2$, and $I_{DC,k}$ is the steady-state DC current drawn by the $k$th switch from its supply voltage $V_{DD,k}$. To gain better insight into the design trade-offs, we resort to the waveform figures-of-merit discussed in [10]
where $\omega_0$ is the operating frequency in rad/s. The metric $F_{I,k}$ is related to the shape of the current waveform. Unlike in a single-device PA, it depends on the tuning of all devices above the $k$th device. $Z_{C,k}$ is a device-size-dependent parameter, while $F_{C,k}$ depends on the tuning of the $k$th device. Using these, (8) can be rewritten as

Since $P_{DC,k} = V_{DD,k} \times I_{DC,k}$, we can rewrite (11) as
Substituting in (12), we get
From the foregoing expression, the overall efficiency is determined by the relative tunings of the Class-E load networks (represented by the $F_{I,k}$ and $F_{C,k}$ values) and by the device-size tapering (represented by the $Z_{C,k}$ values). Consequently, there are multiple optimisation variables, which can be chosen to tailor the output powers from the individual load networks while ensuring the best possible efficiency. This possibility of global waveform engineering contrasts with a single-device Class-E PA, where simply minimising $F_I$ and $F_C$ is desirable. In the remainder of the paper, we focus on a special case of the multi-output Class-E PA with two devices stacked, referred to as the dual-output Class-E PA. Switch-based simulations were conducted at 45 GHz, based on the theoretical results, to observe how relative device sizing and the tuning of the Class-E load networks affect the dual-output Class-E PA. The width of the top device, denoted by $W_2$, was fixed at 100 μm, while that of the bottom device ($W_1$) was varied along with the tunings of the respective Class-E load networks (given by
The tuning-dependent load impedances for each Class-E PA in the stack were determined from the theoretically optimal load impedance that ensures zero-voltage switching and a zero voltage derivative at switching under lossless operation [12]. The following parameters, obtained from device characterisation in IBM 45 nm SOI CMOS using body-contacted devices, were used for switch-based simulations:
The respective supply voltages were adjusted to ensure that the overall voltage stress is evenly distributed between the devices. Ideal internal power combining was also assumed in these simulations. Fig. 3a shows that, for a given ratio of device sizes, there exists an optimal tuning ratio for the respective Class-E load networks that maximises drain efficiency. As expected, drain efficiency keeps improving with increasing size of the bottom device owing to the reduction in conduction loss, though the incremental benefits diminish when $W_1/W_2 \geq 4$. However, PAE is the more relevant metric at mmWave frequencies, and device-based simulations are used to evaluate the impact of these trade-offs on PAE.
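The trend of efficiency versus bottom-device width can be reproduced qualitatively with a conduction-loss-only model. The Python sketch below makes several assumptions that are not taken from the paper:

- Each switch carries, during its ON half-cycle, the DC-feed current minus the load-network current of itself and of every device above it. The sign convention is illustrative.
- R_ON scales inversely with width through a hypothetical R_ON × W product.
- Conduction is the only loss mechanism, so the drain efficiency is 1 − ΣP_cond / ΣP_DC.
- All currents, phases and supplies are placeholders.

It is not a reproduction of the switch-based simulations behind Fig. 3a.

```python
import math


def switch_current(t, k, i_dc, i_ac, phi, omega):
    """ON-state current of switch k (index 0 = bottom of the stack).

    Assumption (illustrative sign convention): switch k carries the DC-feed
    current minus the load-network current of itself and of every device
    above it, i.e. sum over j >= k of (I_L,j - i_j * cos(omega*t + phi_j)).
    """
    return sum(i_dc[j] - i_ac[j] * math.cos(omega * t + phi[j])
               for j in range(k, len(i_dc)))


def conduction_losses(i_dc, i_ac, phi, r_on, f0, n_steps=2000):
    """P_cond,k = R_ON,k * i_RMS,k^2, with the RMS taken over one period T_s
    and the switch conducting only for 0 <= t < T_s / 2."""
    omega = 2.0 * math.pi * f0
    t_s = 1.0 / f0
    dt = 0.5 * t_s / n_steps
    losses = []
    for k, r in enumerate(r_on):
        # Midpoint-rule integral of i_s,k(t)^2 over the ON half-cycle
        acc = sum(switch_current((m + 0.5) * dt, k, i_dc, i_ac, phi, omega) ** 2
                  for m in range(n_steps))
        losses.append(r * acc * dt / t_s)
    return losses


def drain_efficiency(losses, v_dd, i_dc):
    """Energy balance when switch conduction is the only loss mechanism."""
    p_dc = sum(v * i for v, i in zip(v_dd, i_dc))
    return 1.0 - sum(losses) / p_dc


if __name__ == "__main__":
    RON_TIMES_W = 1000.0      # hypothetical R_ON * W product in ohm*um
    i_dc = [0.02, 0.02]       # DC-feed currents, bottom/top (A), placeholders
    i_ac = [0.037, 0.037]     # load-network current amplitudes (A), placeholders
    phi = [-0.57, -0.57]      # load-network phases (rad), placeholders
    v_dd = [0.70, 1.40]       # supplies, bottom/top (V), placeholders
    w_top = 100.0             # top-device width in um, as in the text
    for ratio in (1, 2, 4, 8):
        r_on = [RON_TIMES_W / (ratio * w_top), RON_TIMES_W / w_top]
        losses = conduction_losses(i_dc, i_ac, phi, r_on, f0=45e9)
        eta = drain_efficiency(losses, v_dd, i_dc)
        print(f"W1/W2 = {ratio}: drain efficiency = {100 * eta:.1f} %")
```

With these placeholder values, each doubling of W1 buys less efficiency than the last, because the fixed loss of the top switch starts to dominate. This mirrors the diminishing returns noted above. The sketch deliberately omits input power, so it says nothing about PAE.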
Device-based simulations were conducted (with lossless passives) using body-contacted devices at 45 GHz in 45 nm SOI CMOS as a function of the device size ratio $W_1/W_2$, with $W_2 = 100$ μm and the optimal tuning (i.e. the tuning for highest PAE) in each case. Lossless power combining is assumed as before, and the load impedances for the simulations are determined as in [12]. The drain efficiency and PAE from the device-based simulations are shown in Fig. 3b. The absolute value of drain efficiency differs from Fig. 3a owing to various non-idealities that are not accounted for in switch-based simulations. Fig. 3b shows that PAE is maximum for device size ratios from 1:1 to 2:1, because input power increases for larger ratios. Fig. 3c compares the output powers generated by the top and bottom devices in the switch-based and device-based simulations, and good agreement is observed. PAE is practically the same (and maximum) for device size ratios of 1:1 and 2:1. A sizing of 2:1 was nevertheless chosen for the prototypes implemented in this work, since its simulated output power was about 1.5 times higher. Fig. 3d depicts the load impedances for the top and bottom devices as a function of the device size ratio. Device size ratios higher than 2:1 were avoided for two reasons. They give lower PAE (due to high input power requirements), and they require steep impedance transformations that would further degrade the overall PAE.
These results provide design guidelines for a desired output power, together with the associated impedance transformation considerations. The analysis above did not take the loss in the passive components into account. Incorporating passive losses, even in a perturbative fashion, would make the theoretical analysis intractable and is best left to the simulation-based design/optimisation stage. Nevertheless, the theoretical results provide a good starting point for simulations.
Internal power combining
As mentioned earlier, the multi-output topology can serve as a high-power, high-efficiency active power splitter with unequal division ratios, which can be incorporated into the design procedure described earlier. Alternatively, the output powers available from the different intermediary nodes can be power-combined internally to drive a single load. In this work, we investigate internal power combining and the ensuing design challenges and trade-offs for the dual-output Class-E PA. The concept of internal power combining is illustrated in Fig. 4 and can be understood by traversing the figure from left to right. At each drain node, there is an optimal load network, with corresponding output powers $P_{\mathrm{out},1}$ and $P_{\mathrm{out},2}$ for the bottom and top devices, respectively (Fig. 4a). The single load (chosen to be 50 Ω here) is split into two parts, $R_A$ and $R_B$, for the bottom and top devices, respectively. The split is made in the inverse ratio of the respective output powers, i.e.

$$\frac{R_A}{R_B} = \frac{P_{\mathrm{out},2}}{P_{\mathrm{out},1}}, \qquad R_A \parallel R_B = 50\,\Omega$$
Impedance transformation networks $M_1$ and $M_2$ are then used to transform the optimal load impedances $R_1 + jX_1$ and $R_2 + jX_2$ to $R_A$ and $R_B$, respectively (Fig. 4b). If the matching network loss is ignored, the voltage amplitudes across $R_A$ and $R_B$ are the same, because the load resistances are chosen inversely proportional to the output powers. Equal phases can be ensured by choosing matching networks with similar topologies and numbers of passive components. A further degree of freedom that helps ensure equal phases is that the transformed impedances ($R_A$ and $R_B$) may have parallel reactive parts, so long as these cancel out when the output nodes are connected (or, more precisely, add up to the pad capacitance). The relative output powers from the different Class-E load networks are another design degree of freedom, which can be used to optimise efficiency and ease the design of the matching networks. In other words, the foregoing analysis imposes no restriction on the relative magnitudes of the output powers from the devices that are internally power-combined, so long as (18) and (19) are satisfied.
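The load-splitting rule described above fixes R_A and R_B once the two output powers are chosen. The split is made in the inverse ratio of the output powers, and the parallel combination equals the single load, as in Fig. 4c. The Python sketch below computes both resistances and checks that the two branches then see the same voltage amplitude, V = sqrt(2PR). With equal powers and a 50 Ω load it gives the 100 Ω per-device loads used for the unit cell later in the paper. The unequal-power case is a hypothetical example.

```python
import math


def split_load(r_load, p_out_1, p_out_2):
    """Split a single load into R_A (bottom) and R_B (top) such that
    R_A / R_B = P_out,2 / P_out,1 and R_A || R_B = r_load."""
    p_total = p_out_1 + p_out_2
    r_a = r_load * p_total / p_out_1
    r_b = r_load * p_total / p_out_2
    return r_a, r_b


def parallel(r1, r2):
    """Parallel combination of two resistances."""
    return r1 * r2 / (r1 + r2)


if __name__ == "__main__":
    for p1, p2 in ((15e-3, 15e-3), (10e-3, 30e-3)):   # watts
        r_a, r_b = split_load(50.0, p1, p2)
        # Equal voltage amplitude on both branches: V = sqrt(2 * P * R)
        v_a = math.sqrt(2 * p1 * r_a)
        v_b = math.sqrt(2 * p2 * r_b)
        print(f"P1 = {p1 * 1e3:.0f} mW, P2 = {p2 * 1e3:.0f} mW -> "
              f"R_A = {r_a:.1f} ohm, R_B = {r_b:.1f} ohm, "
              f"R_A||R_B = {parallel(r_a, r_b):.1f} ohm, "
              f"V_A = {v_a:.3f} V, V_B = {v_b:.3f} V")
```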
Stability of dual-output Class-E PA versus cascode PA
Internally power-combining the different output nodes of the multi-output topology results in several closed loops containing active devices, for which stability must be ensured at the frequency of operation and at other frequencies. As shown in Fig. 5a for the dual-output Class-E PA, the matching networks $M_A$ and $M_B$, together with the device $M_2$, form a closed loop. This loop can give rise to oscillatory behaviour if the loop gain satisfies the Barkhausen criterion. This is unlike a cascode PA (Fig. 5b), where the common-gate device indeed helps to improve reverse isolation.

Fig. 4 Illustration of internal power combining for the dual-output Class-E PA (biasing details omitted): (a) optimised load networks for the top and bottom devices; (b) phase shifts $\phi_1$ and $\phi_2$ introduced by the impedance transformation networks $M_1$ and $M_2$, respectively, which should ensure phase alignment at the transformed impedances $R_A$ and $R_B$ for constructive power combining at the single output node; (c) single load $= R_A \parallel R_B$ driven by the output powers from the top and bottom devices. The single load is split between the individual load networks according to the power levels prior to internal power combining, such that an equal voltage amplitude $V_1$ is produced across $R_A$ and $R_B$.

Fig. 5 Dual-output Class-E PA versus cascode PA: (a) feedback loop formed by the matching networks $M_A$ and $M_B$ together with the device $M_2$ as a result of internal power combining in the dual-output Class-E PA; (b) cascode PA, where the common-gate device mitigates feedback through $C_{gd}$ and improves reverse isolation.
A small-signal analysis can be used to derive an expression for the gain of the loop formed by internal power combining in the dual-output PA. As shown in Fig. 6a, the input source is removed and the bottom device is assumed to behave as a current source represented by its output capacitance $C_{\mathrm{out},1}$ and output resistance $R_{\mathrm{out},1}$. The top device is modelled by its transconductance ($g_m$), output capacitance $C_{\mathrm{out},2}$, and output resistance $R_{\mathrm{out},2}$ (Fig. 6b). The load resistance $R_L$ has been ignored in order to determine stability for an open-circuit load, since the presence of a load generally improves stability owing to the loss it introduces. This results in the equivalent circuit shown in Fig. 6c. For this closed-loop system, one can derive
To ensure a stable design, the matching networks $M_A$ and $M_B$ should be chosen such that the oscillation conditions $|\text{loop gain}| \geq 1$ and $\angle(\text{loop gain}) = 0^\circ$ are avoided. The resultant circuit is a Colpitts/Hartley-like oscillator. Owing to the voltage division in its feedback loop, its start-up condition is harder to meet than that of a cross-coupled oscillator. In addition, two factors ease the stability problem of the dual-output PA to a large extent: the low available gain at mmWave frequencies, and the weak-inversion biasing of the PA devices for Class-E operation (resulting in low $g_m$). Nevertheless, the choice of the matching networks is critical not only for constructive power combining but also for ensuring unconditional stability. We do not observe any signs of instability in our prototypes, as shown later in Section 5. If potential instability is noticed at lower frequencies, where devices have higher gain, frequency-selective loss networks (such as graded capacitor-resistor pairs) can be employed at the gate bias lines and at the drain terminals (serving as supply bypass) [13].
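The two oscillation conditions above can be screened numerically once the loop gain has been evaluated over frequency, for example from the Fig. 6c model or a simulator. The Python sketch below flags every 0° phase crossing of a sampled complex loop gain at which the interpolated magnitude is at least 1. The loop-gain samples in the usage example are synthetic and hypothetical; they are not the expression derived in the paper. The function names are illustrative.

```python
import cmath


def barkhausen_crossings(freqs, loop_gain):
    """Return (frequency, |L|) at each 0-degree phase crossing of the loop gain.

    A crossing is detected when Im(L) changes sign between adjacent samples
    while Re(L) > 0; frequency and magnitude are linearly interpolated.
    """
    crossings = []
    for k in range(len(freqs) - 1):
        a, b = loop_gain[k], loop_gain[k + 1]
        if a.real > 0 and b.real > 0 and (a.imag < 0) != (b.imag < 0):
            w = a.imag / (a.imag - b.imag)   # fraction of the step to the crossing
            f_x = freqs[k] + w * (freqs[k + 1] - freqs[k])
            mag = abs(a) + w * (abs(b) - abs(a))
            crossings.append((f_x, mag))
    return crossings


def oscillation_risk(freqs, loop_gain):
    """Frequencies where both Barkhausen conditions (|L| >= 1, angle 0) hold."""
    return [f for f, mag in barkhausen_crossings(freqs, loop_gain) if mag >= 1.0]


if __name__ == "__main__":
    # Synthetic loop gain (hypothetical, NOT the expression from Fig. 6c):
    # magnitude 0.6 with a phase that passes through 0 deg at 45 GHz.
    freqs = [30e9 + 1e9 * k for k in range(31)]
    loop = [0.6 * cmath.exp(1j * (f - 45e9) / 10e9) for f in freqs]
    print(barkhausen_crossings(freqs, loop))
    print("oscillation risk at:", oscillation_risk(freqs, loop))
```

Real loop gains can have several phase crossings, so the sweep should extend well below and above the operating band, where device gain is higher.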
Dual-output Class-E PA implementation
This section explores the design of a dual-output Class-E PA unit cell and of a power-combined PA employing two such unit cells. The unit cell PA, shown in Fig. 7a, was designed for a saturated output power of ≃ 15 dBm (i.e. ≃ 30 mW). In both PA implementations, the load impedance was split equally between the load networks of the top and bottom devices. Since the output amplitude is the same, both devices deliver equal output power (≃ 12 dBm) to the individual 100 Ω loads. To account for soft switching at mmWave frequencies and the poor quality factor of the passive components in the impedance transformation networks, we assume a 3 dB design margin in output power. The design methodology described in Sections 3.2 and 3.3 was then applied. For each pairing of device sizes, the tuning of the individual Class-E load networks was varied to arrive at a global optimum for PAE, while ensuring that the top and bottom devices deliver equal output powers to their respective load networks. Finally, all the device sizes and associated component values were scaled so that each device delivers ≃ 15 dBm to its load network. PAE was found to be optimised when the bottom device is twice as large as the top device and both load networks have the same tuning
This corresponds to real load impedances of 76 and 27 Ω for the top and bottom devices, respectively. Intuitively, we would expect the bottom device to be larger than the top device and to drive a smaller load impedance, since it must deliver the same power with half the voltage swing. Fig. 7a illustrates the networks used for the top and bottom devices. They transform the respective optimal load impedances of 76 and 27 Ω to 100 Ω for power combining, while ensuring optimal phase and amplitude alignment at the final output node. In addition, the topology chosen for the impedance transformation networks conveniently absorbs the pad capacitance. The shunt transmission line used in the input matching network provides ESD protection without any performance penalty. A second PA prototype was implemented by current-combining two dual-output unit cells (with larger device sizes, Fig. 7c) to further enhance the output power, approaching ≃ 20 dBm on-chip. The impedances at pertinent nodes are marked on the circuit diagram, while the impedance transformation networks used for internal power combining are illustrated in Fig. 7d. Since the 50 Ω load is equally split between the two current-combined unit cells, the optimal load impedance for each device in the unit cell is transformed to 200 Ω for internal power combining. The load impedance doubles and the device sizes also grow by a factor of almost two compared with the unit cell, so the impedance transformation becomes four times steeper. Consequently, one can expect higher losses in the matching networks and hence lower efficiency from the power-combined PA. Alternative techniques, such as transformer-based power combining, can be exploited to boost the output power without sacrificing efficiency.

Fig. 6 Stability analysis for the dual-output Class-E PA: (a) PA without input stimulus; (b) simplified circuit for small-signal analysis, with the input device replaced by its output capacitance $C_{\mathrm{out},1}$ and output resistance $R_{\mathrm{out},1}$, and the top device modelled by its transconductance ($g_m$), output capacitance $C_{\mathrm{out},2}$, and output resistance $R_{\mathrm{out},2}$; (c) equivalent circuit for calculation of the loop gain.
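The unit-cell power budget above involves only dBm-to-milliwatt bookkeeping. The Python sketch below shows it: a 15 dBm total split equally gives about 12 dBm per device, and adding the 3 dB design margin brings each device back to about 15 dBm. The helper names are illustrative.

```python
import math


def dbm_to_mw(p_dbm: float) -> float:
    """Convert dBm to milliwatts."""
    return 10.0 ** (p_dbm / 10.0)


def mw_to_dbm(p_mw: float) -> float:
    """Convert milliwatts to dBm."""
    return 10.0 * math.log10(p_mw)


if __name__ == "__main__":
    target_dbm = 15.0                          # unit-cell saturated output target
    total_mw = dbm_to_mw(target_dbm)           # about 31.6 mW
    per_device_dbm = mw_to_dbm(total_mw / 2)   # equal split -> about 12 dBm each
    margin_db = 3.0                            # design margin from the text
    print(f"total {total_mw:.1f} mW, per device {per_device_dbm:.2f} dBm, "
          f"per device incl. margin {per_device_dbm + margin_db:.2f} dBm")
```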
Non-overlapping, harmonic-rich switch-voltage and switch-current waveforms distinguish switching PAs from their linear counterparts. In a device-based implementation, it is difficult to separate the current flowing through the device capacitances from that flowing through the 'switch', so the approach described in [5] is employed. The simulated drain-source voltage and switch-current waveforms for the dual-output PA unit cell are shown in Fig. 7b. The non-overlapping nature of the voltage and current waveforms, along with their high harmonic content, confirms switch-mode Class-E operation.
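Non-overlap can also be quantified from sampled waveforms. The Python sketch below computes the average switch dissipation mean(v_ds · i_sw) and normalises it by mean(v_ds) · mean(i_sw). It assumes the switch current has already been separated from the capacitive current (for example by the approach of [5], which is not reproduced here) and that the samples are uniform over whole periods. The normalisation is an illustrative choice, not a standard metric, and the waveforms in the usage example are idealised, not simulated data.

```python
import math


def switch_dissipation(v_ds, i_sw):
    """Average power dissipated in the switch, mean(v_ds * i_sw), for
    uniformly sampled waveforms covering an integer number of periods."""
    return sum(v * i for v, i in zip(v_ds, i_sw)) / len(v_ds)


def overlap_fraction(v_ds, i_sw):
    """Switch dissipation normalised to mean(v_ds) * mean(i_sw).

    Zero for perfectly non-overlapping waveforms; this normalisation is
    an illustrative choice, not a standard metric.
    """
    n = len(v_ds)
    return switch_dissipation(v_ds, i_sw) / ((sum(v_ds) / n) * (sum(i_sw) / n))


if __name__ == "__main__":
    n = 1000
    theta = [2 * math.pi * (k + 0.5) / n for k in range(n)]
    # Idealised switch: current only in the ON half, voltage only in the OFF half
    i_ideal = [1.0 if t < math.pi else 0.0 for t in theta]
    v_ideal = [0.0 if t < math.pi else 1.0 - math.cos(2 * t) for t in theta]
    # Linear-PA-like waveforms: overlapping, phase-opposed sinusoids
    i_lin = [1.0 + math.sin(t) for t in theta]
    v_lin = [1.0 - math.sin(t) for t in theta]
    print(overlap_fraction(v_ideal, i_ideal), overlap_fraction(v_lin, i_lin))
```

The idealised switch gives zero overlap, whereas the phase-opposed sinusoids give about 0.5. The latter corresponds to the 50% efficiency limit of an ideal class-A stage.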
Power device modelling
The layout of the body-contacted power devices comprised a large continuous array of gate fingers. Two power device test structures with dimensions [(1.5 μm × 100)/56 nm] and [(3 μm × 100)/56 nm] (used in the power-combined version of the dual-output Class-E PA fabricated in 45 nm SOI CMOS) were measured to have a peak $f_{\max}$ of 135 and 105 GHz, respectively. Two changes should improve $f_{\max}$, and hence the gain available from the device [14] and the performance of our prototypes. These are using the available 40 nm floating-body devices and splitting the overall device into several smaller devices wired appropriately in parallel.

Fig. 7 Unit cell PA: (a) dual-output Class-E PA unit cell schematic (left) and impedance transformation networks used for internally power-combining the output power available from the top and bottom devices (right), with impedance levels at pertinent nodes annotated; (b) drain-source voltage and current waveforms for the top and bottom devices, exhibiting non-overlapping characteristics that confirm Class-E-like operation ($V_{\mathrm{gate,bot}} = 0.6$ V, $V_{\mathrm{gate,top}} = 1.8$ V, $V_{DD,\mathrm{bot}} = 1.3$ V, and $V_{DD,\mathrm{top}} = 2.8$ V); (c) current-combined dual-output Class-E PA schematic; (d) impedance transformation networks used for internally power-combining the output power available from the top and bottom devices, with impedance levels at pertinent nodes annotated.
Modelling of passive components
The inductors and transmission lines used in the prototypes have been implemented using CPWs in the topmost metal layer with a continuous ground plane underneath [5]. A 66 Ω CPW used extensively in the PA designs has a measured quality factor of ≃ 18 in the Q-band (33-50 GHz) [15]. The capacitors used in the PA designs have been implemented using interdigitated capacitors called vertical natural capacitors (VNCAPs). A 280 fF VNCAP of size W = 11.42 μm × L = 12 μm has a measured quality factor that ranges from 11 to 6 across the Q-band [15].
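To put the quoted capacitor quality factors in circuit terms, the Python sketch below converts them into an equivalent series resistance using R = |X|/Q for the 280 fF VNCAP. The text gives only the range of Q across the band. Pairing Q = 11 with 33 GHz and Q = 6 with 50 GHz is an assumption made here for illustration.

```python
import math


def series_resistance_from_q(reactance_ohm: float, q: float) -> float:
    """Equivalent series resistance of a reactive component: R = |X| / Q."""
    return abs(reactance_ohm) / q


def capacitor_reactance(c_farad: float, f_hz: float) -> float:
    """Reactance of an ideal capacitor, X = -1 / (2 pi f C)."""
    return -1.0 / (2.0 * math.pi * f_hz * c_farad)


if __name__ == "__main__":
    c_vncap = 280e-15  # 280 fF VNCAP from the text
    # Q = 11 and Q = 6 are the range quoted across the Q-band; assigning
    # them to 33 GHz and 50 GHz respectively is an assumption.
    for f_hz, q in ((33e9, 11.0), (50e9, 6.0)):
        x_c = capacitor_reactance(c_vncap, f_hz)
        print(f"{f_hz / 1e9:.0f} GHz: X_C = {x_c:.2f} ohm, "
              f"ESR = {series_resistance_from_q(x_c, q):.2f} ohm")
```

Under this assumption the ESR is of the order of 1.5-2 Ω. This is a noticeable loss when such capacitors sit in series with load impedances of a few tens of ohms.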
Experimental results
The chip microphotographs of the two PAs are shown in Fig. 8. The dual-output Class-E PA unit cell and the power-combined PA occupy 0.8 mm × 0.6 mm and 1.06 mm × 0.6 mm of die area (without pads), respectively.
Figs. 9a-d show the simulated and measured small-signal S-parameters of the dual-output stacked Class-E PA unit cell and of the two-way current-combined PA implemented in 45 nm SOI CMOS. The measured peak gain of the dual-output unit cell PA is 9.8 dB at 46 GHz, with a −3 dB bandwidth extending from 41 to 57 GHz. The −1 dB bandwidth extends from 43 to 51 GHz, making it suitable for wideband applications. The measured peak gain of the power-combined PA is 8.2 dB at 51 GHz, with a −3 dB bandwidth extending from 45 to 57 GHz. The measured −1 dB bandwidth spans 48-54 GHz. As discussed in [5], the PAs exhibit small-signal gain (despite being designed for Class-E operation under large input drive) because the devices are biased in weak inversion at the DC operating point. The μ stability factor of the prototypes, calculated from the measured small-signal S-parameters, is depicted in Fig. 10b. Since μ is greater than 1 throughout the measured frequency range, the PAs are unconditionally stable.
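The μ factor quoted above is computed per frequency from the two-port S-parameters. The Python sketch below assumes the standard Edwards-Sinsky definition; the text does not state which variant was used. The S-parameter values in the usage example are hypothetical placeholders standing in for measured data (which would typically be read from a Touchstone file).

```python
def mu_stability(s11: complex, s12: complex, s21: complex, s22: complex) -> float:
    """Edwards-Sinsky single-parameter stability factor.

    mu = (1 - |S11|^2) / (|S22 - Delta * conj(S11)| + |S12 * S21|),
    with Delta = S11*S22 - S12*S21. mu > 1 at a frequency is necessary and
    sufficient for unconditional stability of the two-port at that frequency.
    """
    delta = s11 * s22 - s12 * s21
    return (1 - abs(s11) ** 2) / (abs(s22 - delta * s11.conjugate()) + abs(s12 * s21))


if __name__ == "__main__":
    # Hypothetical linear S-parameters (S11, S12, S21, S22) at two frequencies
    sweep = {
        40.0: (0.30 - 0.20j, 0.03 + 0.01j, 2.50 + 1.00j, 0.40 + 0.10j),
        46.0: (0.20 + 0.10j, 0.04 + 0.00j, 3.00 - 0.50j, 0.30 - 0.20j),
    }
    for f_ghz, s in sweep.items():
        mu = mu_stability(*s)
        print(f"{f_ghz:.0f} GHz: mu = {mu:.2f} -> {'stable' if mu > 1 else 'check'}")
```

Unconditional stability over a band requires μ > 1 at every measured frequency, which is the criterion applied to the prototypes above.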
The large-signal measurement setup and the performance of the two fabricated prototypes are shown in Figs. 10a, c, and d, respectively. The large-signal performance of both the unit cell and the power-combined PA was measured at 47.5 GHz, even though the small-signal gain of the latter peaks at ≃ 50 GHz (Fig. 9d). Large-signal measurement beyond 47.5 GHz was limited by the measurement equipment (specifically, the Quinstar PA used to drive the PAs under test). Measurements yield a peak PAE of 25.5% for the dual-output PA unit cell with a saturated output power of 17.9 dBm at 47.5 GHz, and a peak PAE of 16% for the power-combined PA with a saturated output power of 19.1 dBm at 47.5 GHz. Excellent agreement is observed between measurement and simulation as a consequence of the active and passive device modelling efforts. At 47.5 GHz, the current-combined PA achieves lower efficiency than the unit cell for three reasons: its steeper impedance transformations, its larger power devices with lower $f_{\max}$, and its 50 GHz centre frequency. Supply voltages greater than 1.25 V (the maximum recommended $V_{DD}$ in this technology) have been used in the prototypes. However, the actual DC $V_{ds}$ across the devices is always ≃ 1.1 V owing to the voltage drop across the interconnect resistances. Furthermore, under large-signal operation, the bias values and input power are chosen to ensure that the maximum voltage difference across any pair of terminals never exceeds $2V_{DD,\max} = 2 \times 1.25$ V, for long-term reliable operation. The measured performance metrics of the two designs are summarised and compared with state-of-the-art mmWave CMOS PAs in Table 1. The ITRS FoM, defined as

$$\text{ITRS FoM} = P_{\mathrm{sat}}\,(\mathrm{dBm}) + \mathrm{Gain}\,(\mathrm{dB}) + 10\log_{10}\mathrm{PAE} + 20\log_{10} f_0 \qquad (21)$$

takes into account four important performance metrics of a PA.
To incorporate technology limitations, the maximum oscillation frequency $f_{\max}$ of the technology can be included as part of a modified FoM given by [28]

$$\mathrm{FoM}_1 = P_{\mathrm{sat}}\,(\mathrm{dBm}) + \mathrm{Gain}\,(\mathrm{dB}) + 10\log_{10}\mathrm{PAE} + 20\log_{10}\left(f_0/f_{\max}\right) \qquad (22)$$

The prototypes face two handicaps. The passive networks used for internal power combining inevitably incur impedance transformation losses. In addition, the power devices have relatively low $f_{\max}$ as a consequence of using 56 nm body-contacted devices and a continuous array of gate fingers. Despite this, the PAs achieve competitive performance in both the ITRS FoM and FoM$_1$, which points to the efficacy of the multi-output Class-E design methodology. The use of the 40 nm floating-body devices, along with a better multiplicity-based device layout, is thus expected to improve absolute performance. Furthermore, using the proposed topology as an active power splitter would eliminate the passive combining networks and result in a high-efficiency power splitter, a feature not afforded by works such as [5] with comparable output power.
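Both figures of merit are simple sums of logarithmic terms. The Python sketch below implements them as written above. The text does not state the units assumed for PAE and $f_0$ (for example % and GHz), and those choices shift the absolute values. The caller must therefore match whatever convention Table 1 uses. The input values in the usage example are hypothetical and are not the prototypes' Table 1 entries.

```python
import math


def itrs_fom(p_sat_dbm, gain_db, pae, f0):
    """ITRS FoM = Psat(dBm) + Gain(dB) + 10 log10(PAE) + 20 log10(f0)."""
    return p_sat_dbm + gain_db + 10 * math.log10(pae) + 20 * math.log10(f0)


def fom1(p_sat_dbm, gain_db, pae, f0, f_max):
    """Modified FoM with the frequency term normalised to the technology f_max."""
    return p_sat_dbm + gain_db + 10 * math.log10(pae) + 20 * math.log10(f0 / f_max)


if __name__ == "__main__":
    # Hypothetical inputs: Psat in dBm, gain in dB, PAE in %, frequencies in GHz
    args = dict(p_sat_dbm=15.0, gain_db=10.0, pae=20.0, f0=45.0)
    print(f"ITRS FoM = {itrs_fom(**args):.1f}")
    print(f"FoM1     = {fom1(**args, f_max=200.0):.1f}")
```

With consistent units, FoM$_1$ differs from the ITRS FoM only by $-20\log_{10} f_{\max}$. This term is what credits designs built in technologies with lower $f_{\max}$.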
Conclusion
A novel multi-output Class-E topology for stacked switching PAs has been proposed. True Class-E behaviour for all the devices in the stack is achieved by using an appropriate Class-E load network for each stacked device. The output power available from the multiple output nodes can be used for active power splitting, or internally power-combined to implement a high-power PA. Two Q-band switch-like PAs based on the special case of two stacked devices were implemented in IBM's 45 nm SOI CMOS technology, employing body-contacted devices with 56 nm channel length. Two topics remain for future investigation. The first is the design of minimum-loss matching networks that optimally distribute the output power at the various intermediary nodes. The second is the use of the multiple outputs for power distribution in integrated systems.
