ABSTRACT A 55-65-GHz CMOS high power amplifier (PA) is designed with the help of a spirally folded 1:2 balun. With this compact folded balun, balanced-unbalanced conversion and high-power combining are accomplished concurrently. Inside the PA, differential signal pairs simplify the inter-stage matching topologies. The designed three-stage PA achieves a peak gain of 16.3 dB with a 3-dB bandwidth from 55.1 to 65.0 GHz. It delivers a 17.8-dBm output-referred 1-dB compression point (P1dB) and 22.2-dBm saturated output power (Psat) with a peak power-added efficiency of 10.9%.
I. INTRODUCTION
With the continuous advancement of transistor scaling, integrating microwave/millimeter-wave silicon PAs with digital and analog blocks has become a trend [1]-[4]. One active research topic is high-performance CMOS power amplifiers for the unlicensed band around 60 GHz [5]-[8].
Although scaling the gate length of CMOS transistors is desirable for fT/fmax and PAE, it also lowers the maximum supply voltage [9]. This poses challenges to CMOS PA design because the output ac voltage swing is constrained accordingly. At the device level, the maximum supply voltage of a PA can be increased by stacking FETs in CMOS SOI technology [8], [10]. At the architecture level, output power can be increased by combining the output signals of multiple transistor cells; in this way, Wilkinson combiners have demonstrated high output power in CMOS [11]-[13]. A difficult problem in Wilkinson PA design is arranging both the matching and the bias-feeding circuits within a single-ended architecture. Feeding dc to the transistors through quarter-wavelength transmission lines is relatively bulky. Within such PAs, the multiple inter-stage ways operate in common mode, so the feeding path may also complicate the matching topology and become a part to which circuit performance is sensitive. Compared with Wilkinson-combiner PAs, transformer-coupled PAs [5], [14]-[17] save the area of additional bias circuits by using center taps. This type of PA generally requires single-ended-to-differential signal conversion for the external I/O. Although a transformer can turn a signal into an out-of-phase pair, this solution does not always work well when the pair is not well balanced [18], [19]. If a Balun is used instead to accomplish single-ended-to-differential conversion, classical configurations implemented with multilayer routing or with two quarter-wavelength transmission lines are bulky for CMOS power amplifiers [20]-[22]. Moreover, the physical I/O ports of transformers and Baluns are hard to place near the large transistors used for high output power. Recently, a miniaturized Marchand Balun has been used to combine power from four cells in a W-band PA [23]. Multi-way combining solutions that include a Balun are still in high demand to achieve high output power in CMOS.
The associate editor coordinating the review of this manuscript and approving it for publication was Vittorio Camarchia.
In this paper, a CMOS power amplifier architecture with internal differential matching is realized with the help of a spirally folded 1:2 Balun. The proposed Balun, shown in Fig. 1, achieves wideband balanced-unbalanced conversion and is designed for high-power combining. The working principle of the folded Balun is analyzed. Using this Balun in an eight-way combiner, a 65-nm low-power (LP) CMOS PA achieves 22.2-dBm Psat with 10.9% PAE at 60 GHz. The Balun also creates virtual ground points, so long dc-feeding lines and bypass capacitors can be eliminated from the PA architecture. Inter-stage matching can then take advantage of differential circuits, as in transformer-coupled PAs, which simplifies the PA schematic and layout. This paper is organized as follows. Section II details the overall PA architecture, the inter-stage matching scheme, and the power distribution networks, together with the analysis and implementation of the spirally folded 1:2 Balun. Section III presents the measurement results of the fabricated CMOS PA. Finally, conclusions are given in Section IV.
II. CMOS PA DESIGN
A. PA ARCHITECTURE
The proposed CMOS power amplifier architecture is illustrated in Fig. 2. It employs three amplification stages, with eight power units combined in the final stage. At the input, a spirally folded 1:2 Balun splits the unbalanced input signal into two balanced signals that drive two power amplification units. Each power unit consists of a transistor pair with neutralization capacitors. The output power of the first stage is then further divided by current division, so that adjacent power units of the succeeding stage are still driven by differential signals. The inter-stage matching is realized with transmission lines and capacitors. The dc injection point is treated as the circuit common (virtual ground) of the matching network, which makes the dc-feeding and matching parts insensitive to their dimensions. The same inter-stage topology is also applied between the second and third amplification stages. In the final stage, the output signals of the eight power units are combined through current combining and the proposed spirally folded Balun.
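To make the eight-way combining concrete, the short Python sketch below estimates the per-unit output power needed for the reported 22.2-dBm Psat, assuming ideal, equal-amplitude, correctly phased combining. The only loss it counts is the roughly 1.3-dB simulated Balun insertion loss reported for 60 GHz; transmission-line and current-combining losses are ignored, and all function names are hypothetical.

```python
import math


def combining_gain_db(n_units: int) -> float:
    """Ideal gain from combining n equal, correctly phased units."""
    return 10.0 * math.log10(n_units)


def required_unit_power_dbm(target_dbm: float, n_units: int, loss_db: float) -> float:
    """Per-unit output power needed to reach target_dbm after a lossy combiner."""
    return target_dbm - combining_gain_db(n_units) + loss_db


def combined_power_dbm(unit_dbm: float, n_units: int, loss_db: float) -> float:
    """Combiner output power for n equal units and a lumped combiner loss."""
    return unit_dbm + combining_gain_db(n_units) - loss_db


N_UNITS = 8
BALUN_LOSS_DB = 1.3  # assumption: only the simulated Balun loss is counted

unit_dbm = required_unit_power_dbm(22.2, N_UNITS, BALUN_LOSS_DB)
print(f"combining gain: {combining_gain_db(N_UNITS):.2f} dB")
print(f"per-unit power: {unit_dbm:.2f} dBm")
```

Under these assumptions each unit must deliver about 14.5 dBm; any additional combiner loss raises that requirement dB for dB.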
B. INTER-STAGE MATCHING SCHEME
The inter-stage matching design starts by investigating the impedance characteristics of the power units. Since the two inter-stage networks have similar topologies, the matching network between the first and second stages is discussed as an example. In Fig. 3(a), the components with the same signal polarity are extracted to show how the inter-stage matching is realized when the dc injection point is regarded as a virtual ground. On the Smith chart, the inter-stage matching is a sequence of impedance transformations, as shown in Fig. 3(b). Because transistors with large gate widths are employed to deliver high power, the optimal matching impedances of both the drive stage and the succeeding stage have small real and imaginary parts. A further difficulty is that two ways of the succeeding stage are connected in parallel at the middle connection point, which further lowers their overall input impedance. To solve this problem, the shorted stub TL2 is first used to increase the real part. TL3 is then adopted for power division and physical connection; it is made of wide transmission lines to reduce the excess imaginary part. To the left of the parallel connection point, a series capacitor brings the input impedance into the weakly capacitive region, so that the equivalent shorted stub TL4 can be inserted to feed the drain bias of the drive stage. Finally, the series transmission line TL5 slightly adjusts the matching impedance. With this inter-stage matching scheme, the dc bias feed path is electrically only about 9° long, and the multiple ways inside the PA maintain good uniformity.
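Because the dc injection point is an ac virtual ground, the feed line behaves as a shorted stub, whose lossless input impedance is Zin = jZ0 tan θ. The sketch below evaluates the 9° feed path at 60 GHz and compares its physical length with a conventional quarter-wave (90°) feed. It assumes Z0 = 50 Ω and an effective permittivity of 4; both are hypothetical values chosen for illustration, not parameters of the fabricated design.

```python
import math

C0 = 299_792_458.0  # speed of light in vacuum (m/s)


def shorted_stub_reactance(z0_ohm: float, theta_deg: float) -> float:
    """Reactance X of a lossless shorted stub: Zin = jX = j*Z0*tan(theta)."""
    return z0_ohm * math.tan(math.radians(theta_deg))


def equivalent_inductance_ph(reactance_ohm: float, freq_hz: float) -> float:
    """Inductance (pH) giving the same positive reactance at freq_hz."""
    return reactance_ohm / (2 * math.pi * freq_hz) * 1e12


def physical_length_um(theta_deg: float, freq_hz: float, eps_eff: float) -> float:
    """Physical length (um) of a line with electrical length theta_deg."""
    guided_wavelength_m = C0 / (freq_hz * math.sqrt(eps_eff))
    return guided_wavelength_m * theta_deg / 360.0 * 1e6


FREQ_HZ = 60e9
Z0_OHM = 50.0   # hypothetical characteristic impedance
EPS_EFF = 4.0   # hypothetical effective permittivity of the on-chip line

x_stub = shorted_stub_reactance(Z0_OHM, 9.0)
print(f"9-degree stub: +j{x_stub:.2f} ohm, {equivalent_inductance_ph(x_stub, FREQ_HZ):.1f} pH")
print(f"9-degree feed length:  {physical_length_um(9.0, FREQ_HZ, EPS_EFF):.1f} um")
print(f"90-degree feed length: {physical_length_um(90.0, FREQ_HZ, EPS_EFF):.1f} um")
```

With these assumptions the 9° stub presents a small positive (inductive) reactance of about +j7.9 Ω, roughly 21 pH, which is what lets it cancel the weakly capacitive impedance left by the series capacitor. It is also one tenth the length of a quarter-wave feed.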
C. POWER COMBINATION DESIGN WITH THE SPIRALLY FOLDED 1:2 BALUN
The multi-way power-combining network, which consists of the Balun and the power distribution networks, is designed in three steps. In the first step, an ideal circuit is constructed to plan the overall matching scheme, as illustrated in Fig. 4. In this topology, the equivalent shorted stub TLc shifts the input impedance of the power-combining network into the inductive region and, at the same time, injects part of the dc current. With the branches TLb connected in parallel at its inputs, the Balun itself combines power from four ways. From the power-combining point of view, the balanced input signals are combined in series. Because large transistors are adopted to deliver high power, the impedance seen by each active device is 2.1 − j5.1 Ω, so the ratio of its real part to the 50-Ω output is about 1:23.8. This impedance transformation is realized jointly by the transmission lines described above and the spirally folded 1:2 Balun: going from TLa through the spirally folded Balun to TLb, the input resistance is greatly reduced to this low level.
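The 1:23.8 ratio follows directly from 50 Ω / 2.1 Ω. As a rough illustration of why the transformation is shared among TLa, the Balun, and TLb, the sketch below borrows the lumped L-section relation Q = sqrt(R_high/R_low − 1) and compares a single step with three equal geometric steps. Treating the three elements as equal lumped steps is an assumption made only for illustration; the actual network is distributed and EM-optimized.

```python
import math


def l_section_q(ratio: float) -> float:
    """Loaded Q of one lossless L-section matching resistances ratio:1."""
    return math.sqrt(ratio - 1.0)


def q_per_step(ratio: float, n_steps: int) -> float:
    """Q of each step when the ratio is split into n equal geometric steps."""
    return math.sqrt(ratio ** (1.0 / n_steps) - 1.0)


Z_DEVICE = complex(2.1, -5.1)  # impedance seen by each active device (ohm)
R_LOAD = 50.0                  # output termination (ohm)

ratio = R_LOAD / Z_DEVICE.real
print(f"resistance ratio      1:{ratio:.1f}")
print(f"device |X|/R          {abs(Z_DEVICE.imag) / Z_DEVICE.real:.2f}")
print(f"single L-section Q    {l_section_q(ratio):.2f}")
print(f"three-step Q per step {q_per_step(ratio, 3):.2f}")
```

A single step would need Q of about 4.8, whereas three equal steps need only about 1.4 each. Lower per-step Q generally eases bandwidth, which is consistent with spreading the transformation across several elements.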
In the second step, the spirally folded 1:2 Balun is designed according to this matching scheme. Its working principle is illustrated in Fig. 5. In this diagram, the unbalanced port excites a winding trace consisting of four sections. Differential signals are then induced on two connected trace sections leading to the balanced ports. These coupled traces are ideally analyzed as lossless transmission lines. According to wave interference theory, the voltages and currents at the balanced ports are
Then the input impedances seen from the two balanced ports are 
For the unbalanced-balanced conversion to be achieved, the phases θ1 and θ2 should satisfy
When k equals 0, θ is only π/4, which corresponds to a coupled-trace length of λ/8 for the folded 1:2 Balun. The proposed folding scheme therefore halves the Balun length compared with that of classic designs.
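The λ/8 figure follows from the usual relation between electrical and physical length, where λ is the guided wavelength on the coupled traces. The worked step below compares θ = π/4 with the π/2 (quarter-wave) sections of a classic Marchand Balun; with four λ/8 sections, the winding trace totals λ/2.

```latex
\theta = \beta l = \frac{2\pi l}{\lambda}
\;\Rightarrow\; l = \frac{\theta}{2\pi}\,\lambda,
\qquad
\theta = \frac{\pi}{4} \Rightarrow l = \frac{\lambda}{8},
\qquad
\theta = \frac{\pi}{2} \Rightarrow l = \frac{\lambda}{4},
\qquad
4 \times \frac{\lambda}{8} = \frac{\lambda}{2}.
```

The λ/2 total agrees with the approximately λ/2 two-turn octagonal trace used in the on-chip implementation.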
When the proposed folded Balun concept is applied to on-chip PA design, its trace sections are wound spirally along an octagonal shape for better integration compatibility. As shown in Fig. 6(a), most of the wound sections are realized in M7-M8, while M6 is used on the backside of the junctions to enhance conductivity. The two balanced signals are excited on a broad trace that shares one winding turn. At the middle of the broad trace, the dc feeding point is tapped from the lower metal layers M4 and M5. The unbalanced signal is coupled out by a two-turn octagonal trace whose total length is approximately λ/2. As shown in Fig. 6(b) and (c), the simulated spirally folded Balun exhibits less than 5° phase imbalance below 140 GHz and an insertion loss of about 1.3 dB at 60 GHz. Compared with a 1:1 transformer whose fourth port is connected to ground, the proposed spirally folded Balun provides effective balanced-unbalanced conversion over a much wider frequency range. In the 65-nm LP CMOS process, the proposed Balun occupies 160 × 140 µm², which is well suited to high-power PA design. A Marchand Balun in the same process would instead measure 905 × 12 µm².
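Phase-imbalance and insertion-loss figures like those in Fig. 6 are typically extracted from three-port S-parameters. The sketch below assumes port 1 is the unbalanced port and ports 2 and 3 are the balanced outputs (a hypothetical numbering, not necessarily that of Fig. 6), and treats insertion loss as the total power delivered to both balanced ports. The sample S-parameters are made up for illustration and are not the simulated data.

```python
import cmath
import math


def balun_metrics(s21: complex, s31: complex) -> dict:
    """Imbalance and insertion loss of a 3-port balun at one frequency.

    Assumed port numbering: 1 = unbalanced input, 2 and 3 = balanced outputs.
    """
    # An ideal balun has equal output magnitudes and a 180-degree phase difference.
    amplitude_imbalance_db = 20.0 * math.log10(abs(s21) / abs(s31))
    phase_difference_deg = math.degrees(cmath.phase(s21 / s31))  # in [-180, 180]
    phase_imbalance_deg = 180.0 - abs(phase_difference_deg)
    # Insertion loss counts the power reaching both balanced ports together.
    insertion_loss_db = -10.0 * math.log10(abs(s21) ** 2 + abs(s31) ** 2)
    return {
        "amplitude_imbalance_db": amplitude_imbalance_db,
        "phase_imbalance_deg": phase_imbalance_deg,
        "insertion_loss_db": insertion_loss_db,
    }


# Hypothetical sample values at one frequency point (not data from Fig. 6).
s21 = cmath.rect(0.61, math.radians(-30.0))
s31 = cmath.rect(0.60, math.radians(146.0))
for name, value in balun_metrics(s21, s31).items():
    print(f"{name}: {value:.2f}")
```

Applied point by point across a frequency sweep, the same function yields imbalance and loss curves of the kind plotted in Fig. 6(b) and (c).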
In the final step, after the Balun has been designed and preliminarily simulated, it is inserted into the circuit topology of Fig. 4. The other ideal lumped elements are replaced by on-chip transmission lines, and the whole network, including the Balun, is further optimized by full-wave EM simulation. The proposed Balun is also applied in the power division circuit of the first stage, following a design method similar to that of the output distribution network. With spirally folded Baluns at both the input and the output, the internal signals of adjacent ways are differential, so transmission-line-based bias circuits can be incorporated into the matching networks, as in transformer-coupled designs.
III. EXPERIMENT RESULTS
The proposed PA was fabricated in a 65-nm low-power (LP) CMOS process intended for low-cost multimedia wireless and consumer electronics applications. Its fT and fmax are around 210 GHz and 230 GHz, respectively. The PA occupies 1.20 × 0.95 mm², as illustrated in Fig. 7. In the final-stage layout, two additional ground pads provide extra return paths for the drain dc current. The top thick metals and the bottom ground plane are connected to these ground pads for better heat dissipation.
VOLUME 7, 2019
The small-signal performance was characterized with a PNA network analyzer and a semiconductor analyzer, and the instruments were calibrated with the line-reflect-reflect-match (LRRM) method. The measured S-parameters are shown in Fig. 8(a). The fabricated PA provides more than 13.3-dB gain from 55.1 to 65.0 GHz (3-dB bandwidth), with a peak gain of 16.3 dB at 60.1 GHz. S11 is below −18.9 dB from 55 to 65 GHz, while S22 is below −10 dB from 56.9 to 61.2 GHz. From the measured S-parameters, the stability factor k and |Δ| are calculated. As illustrated in Fig. 8(b), k is above 1.35 from dc to 67.5 GHz, while |Δ| is below 0.79, which indicates unconditional stability over this range. In the large-signal test, a setup including a spectrum analyzer, a semiconductor analyzer, an HHPAV-335 amplifier module, and a vector network analyzer is used to deliver sufficient drive power to the device under test (DUT), as illustrated in Fig. 9(a). Psat and PAE are also measured from 53 to 67 GHz, as shown in Fig. 9(c). The PA delivers more than 19.5-dBm saturated output power from 53 to 67 GHz, and its PAE is above 4.9% across this 14-GHz bandwidth. The measured performance of this PA is compared with the state of the art in Table 1.
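The stability check relies on the standard Rollett factor k = (1 − |S11|² − |S22|² + |Δ|²) / (2|S12S21|) with Δ = S11S22 − S12S21; k > 1 together with |Δ| < 1 indicates unconditional stability. The sketch below evaluates both at a single frequency point. The sample S-parameters are hypothetical linear values and are not the measured data of Fig. 8.

```python
def rollett_stability(s11: complex, s12: complex, s21: complex, s22: complex) -> tuple[float, float]:
    """Return (k, |delta|) for a two-port from its S-parameters."""
    delta = s11 * s22 - s12 * s21
    k = (1 - abs(s11) ** 2 - abs(s22) ** 2 + abs(delta) ** 2) / (2 * abs(s12 * s21))
    return k, abs(delta)


def unconditionally_stable(k: float, delta_mag: float) -> bool:
    """k-Delta test: both conditions must hold at the frequency considered."""
    return k > 1.0 and delta_mag < 1.0


# Hypothetical single-frequency S-parameters (linear values, not measured data).
s11, s12, s21, s22 = 0.1 + 0j, 0.01 + 0j, 6.5 + 0j, 0.3 + 0j
k, delta_mag = rollett_stability(s11, s12, s21, s22)
print(f"k = {k:.2f}, |delta| = {delta_mag:.3f}, stable: {unconditionally_stable(k, delta_mag)}")
```

Running the same check at every measured frequency point, from dc to 67.5 GHz, gives curves of the kind shown in Fig. 8(b).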
IV. CONCLUSION
In this paper, a high-power amplifier using a novel spirally folded 1:2 Balun has been implemented in 65-nm LP CMOS. The proposed Balun achieves wideband unbalanced-balanced conversion and is suitable for multi-way power-combining networks. With this configuration, the interior of the PA can be treated as a differential circuit, creating virtual ground points that simplify the matching and bias networks. Accordingly, a three-stage PA using large CMOS transistors has been demonstrated. The fabricated PA provides a peak gain of 16.3 dB with a 3-dB bandwidth from 55.1 to 65.0 GHz, and a high output power of 22.2-dBm Psat with 10.9% peak PAE at 60 GHz. Moreover, the proposed Balun configuration and matching methods are applicable to other silicon-based high-power amplifier designs.
