Abstract-Distributed amplifiers (DAs) feature large bandwidth but relatively low gain and power efficiency. This paper presents a supply-scaling technique to improve the efficiency of a mm-wave DA while maintaining a broadband 50 Ω match. An analysis of interstage load modulation and the effects of shunt dc-feed inductors on distributed operation is provided. A single-ended, eight-stage DA is designed in a 90 nm SiGe BiCMOS process. The fabricated amplifier has a gain of 12 dB over a 3 dB bandwidth from 14-105 GHz. The measured peak output power is 17 dBm with a peak power-added efficiency (PAE) of 12.6% at 50 GHz and 3 dB power bandwidth greater than 70 GHz. The DA occupies an area of 2.65 mm × 0.57 mm, and total dc power consumed from four scaling voltage supplies is 297 mW.
I. INTRODUCTION

Efficient utilization of millimeter-wave (mm-wave) bands will emphasize amplification across several frequency bands in a single amplifier. Ultra-wideband amplifiers have applications in high-speed data links, broadband transceivers, high-frequency instrumentation, and high-resolution imaging. For instance, fine spatial resolution in imaging systems requires narrow pulses to produce a wide range of frequency content [1]. Amplification of a 10 ps Gaussian pulse for sub-millimeter accuracy requires bandwidth on the order of 100 GHz. Conventional tuned amplifiers have difficulties satisfying such large bandwidth requirements due to their inherent gain-bandwidth tradeoff. On the other hand, distributed amplifiers (DAs) provide an effective solution with their large bandwidth, low gain variation, and low sensitivity to mismatch [2].
While silicon technology scaling has improved transistor cutoff frequency $f_T$ to the hundreds of GHz, transistor scaling does not tend to improve the efficiency or the output power of distributed amplifiers. If silicon-based processes are to supplant III-V technologies in mm-wave systems, the higher reliability and yield of silicon must be leveraged against the lower intrinsic gain and breakdown voltage of CMOS/BiCMOS processes. A number of DAs with bandwidths in excess of 80 GHz have been demonstrated in silicon [3]-[8]. However, conventional DAs suffer from poor power efficiency, making these designs unattractive for broadband power amplification. To address the efficiency issues, previous attempts at DA scaling have been realized by impedance tapering of the loaded collector line and scaling of the gain stage device sizes [9], [10]. Unfortunately, this incurs greater resistive line losses and high-frequency reflections due to impedance mismatch, degrading the gain as well as limiting the number of stages that can be implemented. Therefore, the design of distributed amplifiers for high output power and efficiency over wide bandwidth remains an open challenge.
This paper presents a supply-scaled distributed amplifier that offers improved collector efficiency (CE) and power-added efficiency (PAE). The analysis investigates load modulation at each stage within the distributed amplifier. An analysis of supply scaling indicates how this technique performs load pulling analogous to impedance tapering but does not incur the same passive losses or frequency dependency. By feeding separate dc supply voltages through high-pass constant-k filter sections, improved power efficiency is achieved while maintaining a constant 50 Ω line impedance within the amplifier bandwidth. An 8-stage supply-scaled DA is demonstrated in a 90 nm SiGe BiCMOS process. This work expands upon the design presented in [11] to detail the analysis of interstage load modulation due to traveling waves and the design methodology of the band-pass DA. New measurements of the DA with uniform biasing are also presented to verify the supply-scaling theory. Section II presents an overview of the limitations of conventional DA designs and tapered lines. Section III introduces the concept of supply-scaling and discusses its advantages over impedance tapering techniques. The design and analysis of a band-pass DA to enable independent supply biasing is presented in Section IV. Measurements of the fabricated DA and comparison with previous works are included in Section V.
0018-9200 © 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
II. DISTRIBUTED AMPLIFIER EFFICIENCY LIMITATIONS
Fig. 1 shows the schematic of a conventional uniform DA. Distributed amplifiers constructively add the output current from each gain stage in the collector transmission line as the RF input signal travels along the base line. Neglecting losses, the DA exhibits gain that linearly increases with the number of stages while maintaining bandwidth, in contrast to cascaded amplifier stages. In a DA, transistor parasitic capacitances are absorbed into the input and output lines to create lumped-element T-section constant-k filters. The cascade of T-sections forms an artificial transmission line whose cutoff frequency determines the bandwidth of the DA [12]. For transmission line segments of length $l_{seg}/2$, with inductance-per-length $L_{tl}$ and capacitance-per-length $C_{tl}$, to each side of the gain stage, loaded by parasitic capacitance $C_{par}$, the T-section characteristic impedance $Z_{0,l}$ and low-pass cutoff frequency $f_{c,l}$ are given by
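For reference, the textbook constant-k low-pass T-section relations can be written as below. This is a sketch under the assumption that each section is treated as lumped, with total series inductance $L_{tl} l_{seg}$ and total shunt capacitance $C_{tl} l_{seg} + C_{par}$; it is consistent with the per-stage values quoted in Section IV-A (145 pH and 58 fF for 50 Ω and about 110 GHz).

```latex
\begin{align}
Z_{0,l} &= \sqrt{\frac{L_{tl}\, l_{seg}}{C_{tl}\, l_{seg} + C_{par}}}, \\
f_{c,l} &= \frac{1}{\pi \sqrt{L_{tl}\, l_{seg}\,\left(C_{tl}\, l_{seg} + C_{par}\right)}}.
\end{align}
```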
While DAs achieve large gain-bandwidth (GBW) product, conventional topologies exhibit poor power efficiency due to a number of factors. First, since the collector of each transistor sees the same impedance in both directions, half of the collector current from each stage travels towards the reverse termination (these reverse currents generally do not cancel, and the power is lost). Secondly, the wideband nature of the amplifier prohibits harmonic tuning of transistor outputs, preventing waveform engineering for higher-efficiency classes of amplifier operation. Finally, the voltage swing at each stage is not uniform, with later stages having a larger swing due to voltage summing along the collector transmission line. Since the dc collector bias is shared amongst all stages, this results in a large amount of wasted headroom. The inefficiency is evident from Fig. 1, where the impedance seen at the collector of the $n$th stage in an $N$-stage DA with voltage amplitude $v_n$ and current swing $i_{C,n}$ is
where
The forward traveling voltage wave $v_{F,n}$ is due to the current of the preceding stages (zero for the first stage), while $v_{R,n}$ is the reverse wave from subsequent stages (zero for the last stage), and $v_{C,n}$ is the voltage induced by the transistor. The latter is determined from the small-signal impedance seen by collector $n$ looking toward the reverse termination ($R_R$),
and toward the load ($R_L$),
where $Z_{0,n}$ and $\theta_n$ describe the characteristic impedance and electrical length of the transmission line section to the left of stage $n$, and $Z_{R,0}$ and $Z_{F,N+1}$ equal $R_R$ and $R_L$, respectively. In the case of a single-stage amplifier, the impedance seen at the collector is not affected by traveling waves and $Z_{C,n} = v_{C,n}/i_{C,n}$, which would be recognized as the impedance to optimally match the transistor for the transfer of power into a load. For the general case of an $N$-stage DA, however, this is not true.
A. Uniform Distributed Amplifier
For a uniform DA, $Z_{F,n} = Z_{R,n} = Z_0$ and $\theta_{B,n} = \theta_{C,n} = \theta$. Therefore, the impedance seen at the collector simplifies to
When each device contributes the same current, i.e. $i_{C,k} = i_C$, this impedance is further simplified to
In this case, the impedance seen by the collector has a linearly increasing real component, as well as a complex component that leads to frequency-dependent amplitude and phase variation. Fig. 2 shows the variation in $Z_{C,n}$ with respect to electrical length $\theta$, and Fig. 3 shows collector voltages $v_n$. It can be seen that all but the final gain stage in a uniform DA have output voltages and impedances that change periodically with frequency. The preceding theory has been corroborated by electro-optic measurements on signal propagation internal to distributed amplifiers [13]. More generally, when the stages of the DA are not uniform, the $n$th transistor not only sees frequency-dependent load modulation due to $v_{F,n}$ and $v_{R,n}$ from (3), but also frequency-varying $Z_{F,n}$ and $Z_{R,n}$, which impact $v_{C,n}$ as described in (4c). The ability to control the impedance at each collector forms the basis of loadline modulation.
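The periodic load variation described above can be explored numerically. The following is a minimal sketch, not the authors' analysis code: it assumes a lossless, matched, uniform line, equal phase delay $\theta$ per section on the base and collector lines, and equal-amplitude collector currents that split evenly in both directions. Under those assumptions the impedance at stage $n$ is $(Z_0/2)[n + \sum_{m=1}^{N-n} e^{-j2m\theta}]$; the function name `collector_impedance` is hypothetical.

```python
import cmath
import math


def collector_impedance(n, num_stages, theta, z0=50.0):
    """Idealized impedance seen by collector n (1-indexed) in a uniform DA.

    Assumptions (illustrative, not taken from the paper's equations):
    lossless matched lines, equal phase delay `theta` (radians) per section
    on base and collector lines, equal current amplitude at every stage.
    """
    if not 1 <= n <= num_stages:
        raise ValueError("n must be between 1 and num_stages")
    # Forward waves from the n-1 preceding stages plus the stage's own
    # contribution add in phase, giving the linearly increasing real part.
    in_phase = n
    # Reverse waves from later stages arrive with a round-trip delay of
    # 2*m*theta, giving the frequency-dependent part.
    reverse = sum(
        cmath.exp(-2j * m * theta) for m in range(1, num_stages - n + 1)
    )
    return 0.5 * z0 * (in_phase + reverse)


if __name__ == "__main__":
    N = 8
    for theta in (0.0, math.pi / 4, math.pi / 2):
        row = [abs(collector_impedance(n, N, theta)) for n in range(1, N + 1)]
        print(f"theta={theta:.3f} rad:", [f"{z:.1f}" for z in row])
```

In this idealization the last stage has no reverse wave to contend with, so its impedance stays at $N Z_0/2$ for every $\theta$, while earlier stages vary periodically with $\theta$.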
B. Optimal Loadline Matching
In a conventional uniform DA, the collector of each transistor is fixed to $V_{C,n} = V_{CC}$ and the loadline impedance is
where $V_K$ is the knee voltage of the technology and $I_C$ is the dc bias current (constant across stages). For a uniform DA, this loadline impedance is always larger than or equal to the optimum impedance seen in (3), since the reverse traveling voltages do not always add constructively. For class-A operation, the maximum amplitudes for the voltage and current are $v_n = V_{CC} - V_K$ and $i_{C,n} = I_C$. We desire to set the collector impedance according to the loadline impedance:
However, the voltage swing at each transistor differs even though the current through each stage is fixed. This leads to non-optimal loadline matching for uniform DAs. One solution to the DA efficiency problem is impedance tapering along the output transmission line, as proposed in [2]. Using this approach, collector line impedances and lengths are set such that $v_{R,n}$ (or $i_{R,n}$) $= 0$ and all the generated power travels to the RF output, circumventing the need for a reverse termination. The tapered line load pulls each transistor $n$ to see $Z_{opt,n} = (V_{CC} - V_K)/I_{C,n}$ with a constant voltage swing. A number of impedance-tapered DAs have been demonstrated in silicon and III-V processes with efficiency gains [14], [15]. However, due to the frequency dependency of the load pulling mechanism, these designs can only achieve narrowband efficiency enhancement through careful optimization of unequal-length sections. Additionally, the high load impedances at early stages require narrow-width transmission lines that are lossy and difficult to synthesize, even with III-V back-end-of-line (BEOL) processes [9].
Recently, efforts have been made in [10] to design an enhanced-efficiency DA with over 100 GHz bandwidth, utilizing simultaneous scaling of device size with output line impedance. The low degree of tapering, however, dictates the need for an explicit reverse termination ($Z_{R,1}(\omega) = \infty$ as in ideal impedance tapering) and sacrifices the perfect cancellation of reflected waves. The resulting mismatches along the output line, combined with losses from the high-impedance early stages, limit the output power and overall PAE, especially at higher frequencies. It is evident that the inability to synthesize a large range of transmission line impedances with low loss is a major detriment to attempts at efficiency improvement using these techniques.
III. EFFICIENCY ENHANCEMENT THROUGH SUPPLY SCALING
To avoid tapered transmission lines, we propose a supply-scaling technique for enhancing DA efficiency while maintaining a constant 50 Ω characteristic impedance along the synthesized collector line. From Fig. 3, it can be seen that the voltage at successive collectors increases along the output line monotonically, and more accurately, the average voltage increases linearly inside the amplifier pass band. This feature contrasts with a stage-scaled DA (shown in Fig. 4 with transmission line scaling of 1.1 and current scaling of 0.96), which exhibits larger variation in voltage and impedance with respect to frequency. By independently setting the dc collector voltages $V_{C,n}$ to match the maximum $v_n$ in each section of a standard DA, we eliminate the wasted headroom present at every stage but the last. This approach effectively moves the loadline of each transistor to an optimal point for dc power consumption without requiring any change in the passive component parameters from stage to stage:
While supply-scaling performs load modulation analogous to a tapered line (Fig. 5), it offers a number of advantages for wideband operation. Not only are high-impedance transmission lines avoided, but the sensitivity of efficiency-enhanced operation to frequency and process variation is also lower than that of impedance tapering.
Looking at the $N$th collector in Fig. 3, the peak output power for an ideal lossless non-tapered DA operating under class-A bias is constant across all frequencies and given by
The dc power consumed per stage is
In a uniform DA (i.e. $V_{C,n} = V_{C,N}$), the theoretical collector efficiency is therefore $CE = P_{out}/(N P_{DC,n}) = 25\%$, half that of a conventional class-A amplifier.
On the other hand, if the supply voltages are scaled such that $V_{C,n} = n(V_{C,N} - V_K)/N$, the maximum voltage swing at each stage within the pass band, the collector efficiency becomes
As shown in Fig. 6, the theoretical efficiency of a supply-scaled DA approaches 50% as $N$ becomes large. In reality, a number of factors prevent maximum efficiency operation, including collector line losses and nonzero knee voltage. Providing individual dc supplies to each gain stage may prove to be impractical for real systems as well.
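The ideal efficiencies quoted in this section can be reproduced with a short calculation. The sketch below assumes zero knee voltage, lossless lines, class-A bias with equal current in every stage, and an output swing equal to the last-stage supply, so that the output power is $N V_{C,N} I_C/4$ and stage $n$ draws $V_{C,n} I_C$. The helper names `collector_efficiency` and `linear_scaling` are hypothetical.

```python
def collector_efficiency(supplies):
    """Ideal class-A collector efficiency of a DA with per-stage supplies.

    `supplies` lists the dc collector voltage of each stage, first to last.
    Assumes V_K = 0, no line loss, and equal bias current in every stage,
    so the common current factor cancels out of the ratio.
    """
    n = len(supplies)
    p_out = n * supplies[-1] / 4.0   # output power per unit bias current
    p_dc = sum(supplies)             # dc power per unit bias current
    return p_out / p_dc


def linear_scaling(n, v_last=1.0):
    """Supplies scaled linearly as V_n = n * V_N / N."""
    return [v_last * k / n for k in range(1, n + 1)]


if __name__ == "__main__":
    print(collector_efficiency([4.0] * 8))          # uniform supplies
    print(collector_efficiency(linear_scaling(8)))  # one supply per stage
    # Four supplies, each shared by two stages (voltages from Section V).
    print(collector_efficiency([2.7, 2.7, 3.2, 3.2, 3.6, 3.6, 4.0, 4.0]))
```

Under these assumptions linear scaling gives $N/(2(N+1))$, which tends to 50% for large $N$, while uniform supplies give 25% regardless of $N$.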
IV. DESIGN METHODOLOGY OF BAND-PASS DA
Conventional DA designs feature base and collector transmission lines with low-pass characteristics and a shared dc bias across all gain stages. To avoid $I^2R$ loss, efficient DAs must supply the collector bias through an off-chip bias tee or choke, whose low-frequency cutoff prevents true dc performance, rather than through the reverse termination. In some applications, such as odd-derivative Gaussian pulse generation and wideband RF communications, it is not necessary to provide amplification down to dc. Thus, the bias voltage levels can be isolated between DA sections. To realize independent biasing of the supply voltages along the DA and eliminate the need for a bulky bias tee, a band-pass topology is chosen, which introduces dc-blocking capacitors and dc-feed inductors in between transmission line segments as parts of a high-pass T-section filter.
A. Passive Element Design
To achieve $Z_{0,l} = 50$ Ω in (1), the transmission line impedance $Z_{0,T} = \sqrt{L_{tl}/C_{tl}}$ must be greater than 50 Ω since the device parasitic capacitance lowers the final characteristic impedance. In addition, losses in the transmission line, expressed per unit length as $\alpha_{tl}$ in the propagation constant $\gamma_{tl} = \alpha_{tl} + j\beta_{tl}$, limit the marginal gain of each additional stage [16]. We seek to minimize the total attenuation factor $\alpha_{tl} l_{seg}$ while maximizing $Z_0$ to allow for the largest parasitic capacitance loading, and thus, gain per stage. Fig. 7 shows the BEOL stackup for this process. Since the dielectric stack height is sufficiently large, a microstrip line is used as the transmission line element to ease access to the device. Optimizing the shunt capacitance loading budget with respect to line resistance results in a 2 μm-wide line on the M9 layer, with $Z_{0,T}$ of 78.6 Ω and less than 0.7 dB/mm loss up to 110 GHz. Keeping $Z_{0,l} = 50$ Ω and setting our target bandwidth $f_{c,l} = 110$ GHz, the total series inductance and shunt capacitance per stage are 145 pH and 58 fF, respectively. For comparison with Fig. 3, this results in a transmission line $\theta$ of $\pi/2$ at 55 GHz.
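The per-stage element values above can be checked with a few lines of arithmetic. The sketch below assumes the standard constant-k low-pass relations ($Z_0 = \sqrt{L/C}$, $f_c = 1/(\pi\sqrt{LC})$) and, for the loading budget, that all of the series inductance comes from the unloaded 78.6 Ω line. The function names are hypothetical.

```python
import math


def lowpass_section(z0, f_cutoff):
    """Total series L (H) and shunt C (F) per constant-k low-pass section."""
    inductance = z0 / (math.pi * f_cutoff)
    capacitance = 1.0 / (math.pi * f_cutoff * z0)
    return inductance, capacitance


def loading_budget(z0, f_cutoff, z0_line):
    """Shunt capacitance left for device and parasitic loading.

    Assumes all series inductance comes from the unloaded line, so the
    line's own capacitance is L / z0_line**2.
    """
    inductance, capacitance = lowpass_section(z0, f_cutoff)
    return capacitance - inductance / z0_line ** 2


if __name__ == "__main__":
    L, C = lowpass_section(50.0, 110e9)
    print(f"L = {L * 1e12:.1f} pH, C = {C * 1e15:.1f} fF")
    budget = loading_budget(50.0, 110e9, 78.6)
    print(f"loading budget = {budget * 1e15:.1f} fF")
```

The budget figure is only an estimate under the stated assumption; it indicates how much of the 58 fF is available for the transistor, the dc-feed inductor, and the blocking capacitors.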
The band-pass DA also includes a high-pass constant-k section to decouple the dc level of each stage. Fig. 8 shows the embedded high-pass T-section within the standard low-pass filter. For shunt inductance $L_{hp}$ and series capacitors $2C_{hp}$, the characteristic impedance and low-frequency cutoff are
Matching $Z_{0,h} = Z_{0,l} = 50$ Ω and choosing $f_{c,h} = 8$ GHz to cover X-band frequencies, (15) and (16) give $L_{hp} = 500$ pH and $C_{hp} = 200$ fF.
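The quoted element values are consistent with the standard constant-k high-pass T-section relations for a shunt inductance $L_{hp}$ and series capacitors $2C_{hp}$. As a hedged reconstruction (the form below is the textbook result, assumed here rather than copied from the paper), the relations and the resulting element values are:

```latex
\begin{align}
Z_{0,h} &= \sqrt{\frac{L_{hp}}{C_{hp}}}, &
f_{c,h} &= \frac{1}{4\pi\sqrt{L_{hp} C_{hp}}}, \\
L_{hp} &= \frac{Z_{0,h}}{4\pi f_{c,h}}
        = \frac{50\ \Omega}{4\pi \cdot 8\ \text{GHz}} \approx 497\ \text{pH}, &
C_{hp} &= \frac{1}{4\pi f_{c,h} Z_{0,h}} \approx 199\ \text{fF}.
\end{align}
```

These round to the 500 pH and 200 fF used in the design.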
As the additional components of the high-pass T-section contain parasitic elements, the effect on DA performance should be considered. The shunt inductors contribute not only inductance, but also shunt capacitance and conductance due to winding and substrate leakage. Shunt capacitance sets a self-resonant frequency for the inductor in many applications, but in the artificial transmission line of a DA, it can be included as part of the low-pass T-section. Thus, whereas the parasitic capacitance degrades the high-frequency performance in a purely high-pass DA [17], it is absorbed here with $C_{par}$ from the transistor. While this means the capacitance will not modify the response of the high-pass constant-k sections, it does lead to a reduction in the allowable device size, which may result in a reduction in either $f_T$ or $P_{1\mathrm{dB}}$ and $P_{sat}$, depending on how the device biasing is optimized. As such, it is imperative to make the inductor footprint as small as possible to occupy less of the shunt capacitance budget. Generally, reducing the line width and turn diameter to decrease the capacitance to ground results in higher series resistive losses through the inductor. However, due to the inductor's already high-impedance nature at RF frequencies, this parasitic resistance is not critical in comparison to capacitive effects. Assuming a fixed resistance $R_S$ over frequency, and an inductor component quality factor greater than 10 in the band of interest,
From (17), it is clear that with increasing frequency, the impact of the inductor losses becomes minimal, justifying the use of a low-Q, compact, high-turn-count square spiral. Electromagnetic simulations [18] of the inductor show the conductance is less than 2 mS above 20 GHz, significantly less than that of the transistor, indicating negligible impact on DA performance. Fitting the simulated behavior to the lumped-element model shown in Fig. 8, it is found that $R_{ser}$ dominates the resistive performance, and thus, the capacitive elements are absorbed into the constant-k section with minimal effect. The spiral inductor achieves peak Q of 16 after absorption of 10 fF total shunt capacitance into the artificial transmission line (Fig. 9).
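The frequency dependence referred to here follows from a series-to-parallel conversion of the lossy inductor. As a sketch, assuming the inductor is modeled as $L_{hp}$ in series with a fixed $R_S$ (the symbols $Y_L$ and $G_L$ are introduced only for this illustration and are not taken from the paper):

```latex
\begin{align}
Y_{L} = \frac{1}{R_S + j\omega L_{hp}}
      = \frac{R_S - j\omega L_{hp}}{R_S^2 + \omega^2 L_{hp}^2}
\;\;\Longrightarrow\;\;
G_{L} \approx \frac{R_S}{\omega^2 L_{hp}^2}
\quad \text{for } Q = \frac{\omega L_{hp}}{R_S} \gg 1 .
\end{align}
```

In this model the shunt conductance falls as $1/\omega^2$, so the loss contribution of the inductor shrinks toward the upper end of the band.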
In a similar manner as the shunt inductor, the shunt capacitance to ground for the series blocking capacitors of the high-pass T-section can be included in the low-pass constant-k line. High-density metal-insulator-metal (MIM) capacitors in the SiGe technology used exhibit shunt parasitic capacitance of only 2.5 fF, and hence do not significantly impact the device size. The series inductance of the capacitor can be absorbed into the inductance of the low-pass constant-k section, and the resistive losses are directly included in the series loss of the artificial transmission lines. Care must be taken in design of the connections to the series capacitors to minimize this resistance. Capacitors from adjacent high-pass sections can be combined to reduce area and loss.
B. Active Element Design
In the 90 nm SiGe BiCMOS process, HBTs have simulated peak transit frequency $f_T$ upwards of 300 GHz [19], [20]. However, base resistance in bipolar devices creates shunt losses in the synthesized input line, degrading the gain-bandwidth (GBW) product of the DA. Furthermore, as detailed in [10], the input conductance and capacitance of a common-emitter amplifier increase with frequency, incurring extra loss and impedance mismatch. To counteract these effects, resistive emitter degeneration flattens the input characteristic for ultra-wideband operation. For base-emitter capacitance $C_{be}$, base resistance $r_b$, transconductance $g_m$, and degeneration resistance $R_E$, the capacitance and conductance looking into the base are given by [10]:
where $C = C_{be}/(1 + g_m R_E)$. As shown in Fig. 10, increasing emitter resistance $R_E$ has diminishing returns on the decrease of input capacitance; this in turn reduces the transmission line loading sensitivity to process variation. On the other hand, while larger $R_E$ linearly reduces the effective $G_m$ of the transistor, the smaller input capacitance and conductance allow for the use of a larger device and lower the shunt loss in the T-section. With these considerations, a 20 Ω resistor is chosen to maintain high overall gain for the target operation bandwidth. The bandwidth gained comes at the expense of a slight degradation in efficiency, as there is power lost across $R_E$. As shown in Fig. 11, we trade only 1% in PAE to improve the bandwidth by more than 80 GHz. The final DA gain stage employs an HBT cascode to mitigate the Miller effect, increase the input and output impedances of the stage, and improve the isolation between base and collector lines. High-performance HBTs in this process achieve maximum $f_T$ at 2 mA/μm current density. We bias the common-emitter transistor at 1.8 mA/μm to avoid the $f_T$ rolloff associated with the Kirk effect at high current swing. Because of the difference in capacitance seen at the base and collector, emitter lengths of 12 μm and 6 μm are chosen for the common-base and common-emitter devices, respectively, to satisfy $Z_{0,l} = 50$ Ω for both input and output lines. Additionally, a degeneration capacitor of 40 fF is included in parallel with $R_E$ to introduce a high-frequency zero at $\omega_z = 1/(R_E C_E)$ for gain peaking near the low-pass cutoff. To ensure good decoupling of the dc bias network, the base of the cascode device is biased at 2.6 V through a combination of MOS and MIM RC low-pass filters. The MIM capacitor is placed as close to the device as possible to limit the parasitic inductance of the bias path, preventing high-frequency instability.
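Two of the design numbers in this paragraph lend themselves to a quick check. The sketch below computes the degeneration zero frequency from $R_E$ = 20 Ω and $C_E$ = 40 fF, and the bias current implied by the 1.8 mA/μm density. It assumes the 6 μm emitter length applies to the common-emitter device and that two stages share each supply, as in the four-supply arrangement adopted for this design; the function names are hypothetical.

```python
import math


def degeneration_zero_hz(r_e, c_e):
    """Zero frequency in Hz for R_E in parallel with C_E: 1 / (2*pi*R_E*C_E)."""
    return 1.0 / (2.0 * math.pi * r_e * c_e)


def bias_current_ma(density_ma_per_um, emitter_length_um, stages=1):
    """Collector bias current in mA for `stages` identical devices."""
    return density_ma_per_um * emitter_length_um * stages


if __name__ == "__main__":
    f_z = degeneration_zero_hz(20.0, 40e-15)
    print(f"zero frequency = {f_z / 1e9:.1f} GHz")
    print(f"per stage  = {bias_current_ma(1.8, 6.0):.1f} mA")
    print(f"per supply = {bias_current_ma(1.8, 6.0, stages=2):.1f} mA")
```

Under these assumptions the per-supply current agrees with the 21.6 mA nominal bias current reported for each supply in Section V.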
Fig. 13. Simulated optimal PAE versus number of independent scaled supplies for an 8-stage DA.
Though an ideal DA exhibits unbounded GBW product as more stages are added, attenuation of the signals along the base and collector transmission lines limits the achievable gain in practice. Fig. 12 shows the simulated gain versus number of stages. An 8-stage DA is found to offer a good GBW-to-chip area ratio. A sweep of the number and values of collector dc voltages in Fig. 13 then reveals the optimal supply-scaling for peak PAE. Since the improvement in efficiency between four and eight independent supplies is small, we opt to tie the supplies of every two stages together as a tradeoff between efficiency and chip complexity. This results in a simulated 20% improvement in PAE over a uniform DA with negligible impact on gain. Though the peak CE for an ideal supply-scaled DA with four independent supplies is 40% from (14), we are only able to scale from 2.7 to 4.0 V, giving a peak CE of 29.6%; transistor knee voltage, emitter resistance, and transmission line losses further reduce the efficiency. Fig. 14 shows the simulated collector voltage magnitude for each successive gain stage. Compared to the voltage distribution found for an ideal low-pass DA in Fig. 3, the band-pass DA sees a shift in the zero electrical length frequency due to the high-pass T-section. Additionally, the collector voltages exhibit non-idealities as the DA approaches
V. MEASUREMENT RESULTS
The schematic of the fabricated 8-stage supply-scaled DA is shown in Fig. 15, and a chip microphotograph is shown in Fig. 16. The amplifier occupies an area of 2.65 mm × 0.57 mm, including pads. Measurement of the DA is performed with on-wafer probing, and no de-embedding of pad parasitics is done. Forward and reverse terminations are provided on-chip, and the high-pass T-section filters allow dc biases to be supplied without the need for bias tees. Four supply voltages of 2.7, 3.2, 3.6, and 4.0 V draw 21.6 mA of nominal bias current each, resulting in total dc power consumption of 297 mW for small input signal. Fig. 17 shows the measured and simulated S-parameters and stability factor $\mu$ of the DA. The amplifier achieves a peak small-signal gain of 12.0 dB with a 3 dB pass-band bandwidth from 14-105 GHz (91 GHz), corresponding to a GBW product of 362 GHz. Measured gain is 2.5 dB lower than simulated, which is consistent with slow HBT process corners on this fabrication run. Additionally, the S-parameters show a pronounced ripple in gain and impedance match around 30 GHz. This degradation is mainly caused by imperfect modeling of the dc supply distribution network, whose extra inductance becomes manifest in the high-pass T-section filters near the low-frequency cutoff. Except for this ripple, the input and output return losses (S11 and S22) are better than 9 dB from 10-90 GHz. The reverse isolation is greater than 24 dB, and the amplifier is unconditionally stable across the entire measured frequency range.
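The GBW figure quoted above follows directly from the measured gain and bandwidth. A minimal sketch of the arithmetic, assuming GBW is defined as linear voltage gain times the 3 dB bandwidth (the function name is hypothetical):

```python
def gbw_ghz(gain_db, f_low_ghz, f_high_ghz):
    """Gain-bandwidth product in GHz from voltage gain in dB and band edges."""
    linear_gain = 10.0 ** (gain_db / 20.0)
    return linear_gain * (f_high_ghz - f_low_ghz)


if __name__ == "__main__":
    print(f"GBW = {gbw_ghz(12.0, 14.0, 105.0):.0f} GHz")
```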
Large-signal measurements are performed across the entire operating frequency band. Measured and simulated power, gain, and efficiency at the midband frequency of 50 GHz are shown in Fig. 18. The DA has a 1 dB gain-compressed output power $P_{-1\mathrm{dB}}$ of 14.9 dBm and saturated output power $P_{sat}$ of 17.0 dBm. Peak CE and PAE are 15.1% and 12.6%, respectively. The output power characteristics match simulation results reasonably well, though maximum PAE suffers by 1.2% due to the lower gain. Fig. 19 shows measured and simulated $P_{sat}$, $P_{-1\mathrm{dB}}$, and peak CE and PAE at various frequencies across the bandwidth. The 3 dB power bandwidth is greater than 70 GHz (15-87 GHz), and PAE is better than 8.5% up to 80 GHz, except for the aforementioned gain degradation at 30 GHz. As a control experiment, the DA was also measured with all supply voltages set equal to the final stage value of 4 V (Fig. 20). The uniform DA exhibited 17% reduced CE and PAE and consumed the same current compared to supply-scaled operation, while only a 0.2 dB increase in gain and saturated output power on average was observed.
Supply-scaling comes at the disadvantage of requiring multiple supplies. Nonetheless, given approximately 95% efficiency of modern switched dc-dc converters, the supply-scaled DA efficiency would be reduced by 5.3% to a PAE of 12.0% due to converter inefficiency. Even accounting for the converter, the supply-scaled DA maintains an overall 11.1% efficiency advantage over the uniform DA PAE of 10.8%. In addition, the introduction of switched-mode power converters could suggest the possibility to explore dynamic supply modulation for envelope-tracking techniques under back-off conditions. This verifies the supply-scaling theory and design as an effective method of efficiency enhancement without DA performance degradation. Table I summarizes the performance of similar published wideband DAs. The supply-scaled DA achieves the largest reported GBW product of any single-stage silicon-based distributed power amplifier (> 10 dBm) to the authors' knowledge. Among amplifiers shown in silicon, this work also has the largest single-ended output power, with comparable 3 dB power bandwidth. In particular, the supply-scaled DA exhibits nearly 3 dB greater output power and 3% higher efficiency above 70 GHz than the stage-scaled SiGe BiCMOS DA in [10]. While this work implemented a single-ended amplifier as opposed to [10], supply-scaling could also be applied to a differential DA to double the output power. A differential design might also be utilized to simplify the biasing circuits. However, differential DAs in the millimeter-wave bands require an output balun with a bandwidth that matches the amplifier bandwidth, and this poses a significant challenge. Compared to silicon-based W-band tuned PAs in Table II, the DA achieves much greater bandwidth while maintaining similar peak power, efficiency, and gain.
VI. CONCLUSION
This paper has introduced supply-scaling as a technique to achieve wideband power efficiency enhancement in DAs, and its advantages over impedance tapering techniques are discussed from the point of view of interstage load modulation. Design methodology of a band-pass DA to enable the technique is presented, with focus on the effects of high-pass constant-k filter element parasitics. To verify the theory of efficiency enhancement, an 8-stage supply-scaled DA is demonstrated in a 90 nm SiGe BiCMOS process with bandwidth greater than 90 GHz. Peak saturated output power of 17 dBm is measured with a relatively high PAE of 12.6% using four supply voltages from 2.7 V to 4.0 V. Compared to previously reported mm-wave silicon-based DAs, the presented amplifier demonstrates superior power and efficiency performance above 70 GHz.
