Abstract-Silicon technology has progressed over the last several years from a digitally oriented technology to one well suited for microwave and RF applications at a high level of integration. Technology scaling, both at the transistor and back-end metallization level, has driven this progress. CMOS technology is ideally suited for low-noise amplification and receiver applications, but the fundamental breakdown voltage is lower than that of equivalent Si/SiGe HBTs. High-quality passive devices are equally important, and improvements in metallization technology are resulting in higher quality inductors. This paper summarizes the silicon technology issues associated with RF "system-on-a-chip" applications. 
the wireless handset to be shrunk to that of a typical consumer item, within reach of half of the population on the planet. This paper will outline the key developments and trends in silicon semiconductor technology applied to RF/mixed-signal "system-on-a-chip" implementations of these wireless communications devices. The communications medium that a mobile wireless transceiver typically finds itself in is often referred to as "hostile," since the path (or channel) from the transmitter to the receiver is subject to time-varying obstructions and multipath fading as well as Doppler effects. This is in contrast to "point-to-point" communications links (whether wireless, fiber-optic, or free-space optical), where the channel is essentially non-time-varying or "stationary." This hostile channel profoundly affects the design of both the transmitter and the receiver, placing extreme performance constraints on the technology required for their implementation.
One way to view these constraints is through the "football field" metaphor [1]. As an example, the popular European GSM system [2] has a minimum received signal sensitivity level (the smallest level of the desired received signal) at the antenna of $-102$ dBm (roughly $10^{-13}$ W), but the largest (undesired) interferer also received by the antenna has a level of 0 dBm ($10^{-3}$ W). If the desired signal power is normalized to the size of the head of a pin, approximately 1 mm in diameter, then the largest interferer is roughly the size of two (U.S.) football fields, 100 m by 100 m. Following this metaphor further, receiving and demodulating a GSM signal is analogous to finding the head of a pin in a football field! In addition, this task has to be done in less than 100 ms, which is typically the time it takes for the cellular handset to receive a call. Viewed through this lens, the modern cellular handset is truly a technological marvel.
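As a rough check on the numbers in this metaphor, the following Python sketch converts the quoted levels from dBm to watts and scales a pin-head area by the same power ratio. The function names, and the choice to compare areas rather than diameters, are assumptions made for illustration; they are not taken from the GSM specification.

```python
import math


def dbm_to_watts(p_dbm: float) -> float:
    """Convert a power level in dBm to watts (0 dBm = 1 mW)."""
    return 1e-3 * 10 ** (p_dbm / 10.0)


def scaled_square_side_m(pin_diameter_mm: float, ratio_db: float) -> float:
    """Side of a square whose area equals the pin-head area times a power ratio."""
    pin_area_m2 = math.pi * (pin_diameter_mm * 1e-3 / 2.0) ** 2
    return math.sqrt(pin_area_m2 * 10 ** (ratio_db / 10.0))


if __name__ == "__main__":
    sensitivity_dbm = -102.0  # minimum desired signal level quoted in the text
    interferer_dbm = 0.0      # largest interferer level quoted in the text
    ratio_db = interferer_dbm - sensitivity_dbm
    print(f"desired signal: {dbm_to_watts(sensitivity_dbm):.2e} W")
    print(f"interferer:     {dbm_to_watts(interferer_dbm):.2e} W")
    print(f"equivalent square side: {scaled_square_side_m(1.0, ratio_db):.0f} m")
```

With a 1-mm pin head and a 102-dB power ratio, the equivalent square comes out on the order of 100 m on a side, which is consistent with the two-football-field picture.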
The architecture of a general handheld wireless transceiver (transmitter and receiver) is shown in Fig. 1. In the receiver design, the radio signal is sent from the receiving antenna to a low-noise amplifier (LNA), whose purpose is to boost the signal level without reducing the signal-to-noise ratio (SNR) significantly. The signal level at the antenna can range from 1 $\mu$V rms to nearly 100 mV rms, a variation of 100 dB! At the low end of the signal range, the LNA performance is fundamentally limited by thermodynamic and electron transport issues, while at the high end of the signal range, the challenge is to minimize the effects of nonlinearities on receiver performance. These diverse requirements are often referred to as the "LNA Bottleneck" [3]. As a result, the high-frequency LNA must exhibit excellent performance over both small-signal and large-signal conditions. Modern CMOS FETs and SiGe HBTs are particularly well suited to this application in the 1-5-GHz range, because of their outstanding $f_T$ and low gate/base resistance.
Fig. 1. Architecture of a typical wireless handset transceiver. The receiver employs a direct (single-step) downconversion approach, while the transmitter employs a "two-step" heterodyne technique.
0018-9383/03$17.00 © 2003 IEEE
In addition, the LNA is typically "on" all the time (listening for transmitted signals of interest), so it is constantly draining power, and therefore it must dissipate as little dc power as possible. The combination of extremely high-performance and low-power requirements results in the LNA being one of the most significant power drains in the system.
Following the LNA, the signal is typically passed through a mixer, which essentially multiplies the input signal by a local oscillator signal of constant frequency, producing an output signal whose frequency is the difference between the two inputs-the so-called intermediate frequency (IF)-and whose amplitude is proportional to the original input signal. In the case of Fig. 1 , the receiver architecture employs the direct-conversion (or homodyne) approach, and the IF is at dc. Preceding the mixer, an analog filter eliminates the response to an undesired input signal that would "jam" the receiver and compress its gain. This filter is typically implemented with a physically large off-chip surface acoustic wave (SAW) device. In addition to their excessive size, these filters have extremely unforgiving sensitivities to variations in source impedance and ground loops, to name a few.
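The multiplying action of the mixer described above can be illustrated with a short Python sketch. It models an ideal mixer as a simple product of two cosines and compares it with the equivalent sum-and-difference form; the function names and the unit-amplitude local oscillator are assumptions for illustration, and a real mixer adds loss, noise, and nonlinearity that are not modeled here.

```python
import math


def mixer_output(t, f_rf, f_lo, amplitude=1.0):
    """Ideal multiplying mixer: RF input times a unit-amplitude LO."""
    return (amplitude * math.cos(2 * math.pi * f_rf * t)
            * math.cos(2 * math.pi * f_lo * t))


def if_frequency_hz(f_rf, f_lo):
    """Difference (intermediate) frequency; zero for direct conversion."""
    return abs(f_rf - f_lo)


def sum_and_difference(t, f_rf, f_lo, amplitude=1.0):
    """The same output written as difference-frequency and sum-frequency terms."""
    f_if = if_frequency_hz(f_rf, f_lo)
    f_sum = f_rf + f_lo
    return 0.5 * amplitude * (math.cos(2 * math.pi * f_if * t)
                              + math.cos(2 * math.pi * f_sum * t))
```

Setting `f_lo` equal to `f_rf` gives an intermediate frequency of zero, which is the direct-conversion (homodyne) case; the difference term then reduces to a constant proportional to the input amplitude.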
Substantial progress has been made recently in the area of direct-conversion approaches for wireless receivers. The advantage of this architecture, compared to the more traditional two-step heterodyne approach, is that it is uniquely well suited to monolithic integration, due to its low-frequency filtering, lack of serious image responses, and intrinsically simple architecture [4]. The output signals from the I and Q paths are converted into digital values with an A/D converter and then sent to the digital baseband section for synchronization and data recovery.
On the transmitter side of Fig. 1, the goal is to modulate the I and Q data from the digital baseband section onto the high-frequency carrier with near perfect fidelity and vary the output power transmitted over nearly 80 dB of dynamic range; a high output power is transmitted when the handset is far from the base station, and low power when the handset is close to the base station. This is known as the "power-control loop" in a CDMA handset and is necessary in order to solve the "near-far" problem in multi-user CDMA systems [5]. The maximum output power level at the antenna is approximately 1 W at a frequency of roughly 2 GHz.
In this case, the filtered digital data is modulated onto an IF carrier (typically several hundred megahertz), partial gain control is applied at that frequency, and then a second stage of upconversion is applied to the signal to transfer it to the final frequency, where the remaining gain control occurs. This "two-step" approach allows for the gain control to be split over several different frequencies, improving isolation between the bands. In this case, the key requirements are the maintenance of the fidelity, or linearity, of the signal at the final output stage and dc efficiency to maximize battery life.
The reference frequencies required for upconversion and downconversion of the transmitted and received signals are generated by a frequency synthesizer, which uses a precise reference (usually produced by a crystal oscillator) to synthesize the necessary local oscillator frequencies. In this case, the phase noise of the synthesized signal must be as low as possible to accurately modulate and demodulate the signal. Furthermore, the synthesizer itself is a complex RF/analog/digital circuit, which generates copious amounts of digital switching noise and harmonics. Historically, the synthesizer circuit was contained on a separate integrated circuit but, with system-on-chip (SOC) implementations, this noise must be isolated from the sensitive receiver circuits despite the fact that they share a common substrate and package environment. This presents a fundamental challenge to the integration level of these complex circuits.
The digital portion of the communication system performs the key functions of modulation and demodulation (the so-called "modem"), carrier recovery, timing recovery, symbol recovery, equalization, channel coding, power detection, and calibration, among others [5]. Separate digital controllers also perform media access control (MAC) functions as well as a variety of other control functions. The eventual goal is to include all these digital functions on the same integrated circuit substrate as the RF and analog circuits in order to realize a true "single-chip" communications system implementation. This goal has been realized on a number of ultralow-cost digital systems, such as those conforming to the well-known Bluetooth standard [6], but remains elusive for higher performance systems involving cellular telephones or wireless LANs.
The range of wireless communication systems employing RF techniques that are amenable to SOC implementation has grown dramatically and continues to expand with new standards and applications being developed all the time. Consumer demand for the applications available using untethered wireless devices continues to grow, and Table I summarizes some of the data rates and bandwidths being developed on a worldwide basis. The key to the widespread adoption of these systems is developing low-cost highly integrated implementations of the key radio and digital functions.
The goal of this paper is to highlight the key technological tradeoffs in silicon integrated circuit technologies for these RF SOC applications. As a result, it is meant to be a "survey" of existing results in a variety of disciplines, which, when taken together, present a more complete picture of the technological challenges that face the RF-SOC community. Based on the improvements in device performance achieved in recent years, it is clear that CMOS or BiCMOS technology, where high-quality active and passive devices are integrated on a common substrate along with a high level of digital integration, is the preferred medium for RF-SOC implementation.
The paper begins in Section II at the level of substrate improvements and interdevice isolation developments that are required for RF-SOC applications. Section III discusses transistor level performance and explores how transistor scaling enhances the performance of key building blocks for wireless communications systems. This is related to the $f_T$ and $f_{max}$ of the transistor, as well as the achievable breakdown voltage. The characteristics of other important factors, such as passive device performance, are also explored. Next, the performance of several key RF SOC circuits is analyzed in terms of the active and passive device performance at the lower level. LNA performance is discussed in Section V, and voltage-controlled oscillator (VCO) performance is discussed in Section VI. Finally, conclusions are presented concerning the challenges to RF/SOC implementation as silicon technology is scaled in the future.
II. SUBSTRATE AND ISOLATION TECHNOLOGIES FOR RF-SOC APPLICATIONS
The substrate plays an intimate role in determining the performance of an RF-SOC. This is because the desired signal levels are so small, and the frequencies are so high, that undesired spurious signals can leak into the sensitive receiver portions through almost any path, particularly capacitive coupling through the conductive substrate. Returning to the example of the CDMA transceiver of Fig. 1, the received signal strength can be as low as $-104$ dBm, but the transmitted signal strength can be as high as $+23$ dBm, a nearly 130-dB difference! Clearly, the isolation between the receiver and transmitter on an SOC is a significant challenge. Similar considerations apply to the required isolation between the frequency synthesizer and the receiver, where digital switching noise can couple into the receiver through the substrate.
In addition, the conductive silicon substrate increases the eddy losses in monolithic inductors and increases the losses associated with high-frequency transmission-line structures. Fortunately, these problems are well known, and many enhancements have been suggested to improve substrate loss and interdevice isolation for RF-SOC applications. These improvements can be grouped into the categories of substrate resistivity enhancements, implant blocking layers, and layout-dependent improvements.
Digitally oriented bulk CMOS processes typically rely on a low-resistivity substrate in order to minimize latch-up considerations. The resulting conductive losses in monolithic inductors and other high-frequency circuits are usually considered to be excessive for RF applications, and so lightly doped p-type substrates are more typically used in both CMOS and BiCMOS approaches for RF-SOC applications. In this case, the typical bulk resistivity is roughly 10 $\Omega\cdot$cm, corresponding to a doping density of approximately $5 \times 10^{14}$ cm$^{-3}$.
Interdevice isolation can be improved through a variety of approaches at the substrate level. Simply increasing the resistivity of the silicon substrate is the most conservative approach, and can provide for a significant improvement in isolation. The resistivity of production Czochralski (CZ) wafers is currently limited to a maximum of roughly 10-20 $\Omega\cdot$cm, although a new technique known as Magnetic Czochralski (MCZ) has demonstrated resistivities up to 1 k$\Omega\cdot$cm [7]. In addition, Float-Zone (FZ) silicon wafers can achieve resistivities of up to 10 k$\Omega\cdot$cm, although the cost of FZ material is currently substantially higher than that of CZ wafers [8]. These highly resistive substrates have historically exhibited manufacturing problems under the high-temperature stress of subsequent wafer processing, but great progress has been made recently in this area [9].
Other, more exotic bulk approaches to improving substrate resistivity or isolation have also been proposed for RF-SOC applications. These include the use of silicon-on-insulator (SOI) [10] , silicon-on-sapphire (SOS) [11] , silicon-on-anything (SOA) [12] , porous silicon [13] , through substrate vias [14] , and bulk micromachining [15] . None of these techniques have found their way into widespread use, although there have been some notable successes in niche applications. Clearly, the manufacturability and cost-effectiveness of these more exotic technologies in a high-volume consumer-oriented marketplace must be carefully considered.
A more traditional approach to the problem of improving device isolation is the use of grounded "guard rings" that surround the sensitive active devices. This approach is shown in Fig. 2(a). The effectiveness of this technique depends on the width of the guard ring, substrate resistivity, and the inductance between the guard ring and ground. In general, the isolation improves with increasing spacing, guard ring width, and substrate resistivity. An extreme example of the use of guard rings to improve isolation is the Bluetooth chip presented at ISSCC 2002 [16]. In this case, a 300-$\mu$m guard ring completely surrounded the sensitive RF portions of the circuit, separating them from the digital portions. This enabled the chip to exceed the Bluetooth 2.4-GHz receive sensitivity level of $-70$ dBm; in fact, the design achieved a sensitivity of $-82$ dBm on a single die containing all of the RF, analog, and digital functions.
With a fixed substrate resistivity, a further improvement in isolation can be achieved through the use of a deep low-resistivity n-well placed underneath the active device; when biased to a low-impedance and low-noise potential, it acts as an effective shield to signals injected from nearby sources. This approach is shown in the cross section in Fig. 2(b). The addition of a deep well can improve isolation between adjacent devices by roughly 20 dB (from 40 to 60 dB) at 2 GHz [17]. The effectiveness of this technique at high frequencies depends on the common inductance of the signal line and the grounding structure of the n-well, and an inductance of as little as 0.5 nH can significantly degrade the improvement at frequencies above 1 GHz [18]. This "triple-well" technology is now a standard option of many sub-0.18-$\mu$m CMOS processes, using both low- and high-resistivity substrates.
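To give a feel for why a grounding inductance as small as 0.5 nH matters above 1 GHz, the sketch below evaluates the reactance $2\pi f L$ of that inductance. The function name is hypothetical, and the calculation considers only the series reactance of the ground path; it is not a model of the full shielding structure analyzed in [18].

```python
import math


def inductive_reactance_ohms(inductance_h: float, frequency_hz: float) -> float:
    """Magnitude of the reactance of an ideal inductor, X = 2*pi*f*L."""
    return 2.0 * math.pi * frequency_hz * inductance_h


if __name__ == "__main__":
    ground_inductance_h = 0.5e-9  # value quoted in the text
    for freq_hz in (0.1e9, 1.0e9, 2.0e9, 5.0e9):
        x = inductive_reactance_ohms(ground_inductance_h, freq_hz)
        print(f"{freq_hz / 1e9:4.1f} GHz: {x:6.2f} ohm")
```

At 1 GHz the reactance is already a few ohms, so the n-well can no longer be treated as tied to an ideal low-impedance potential.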
Deep trench isolation techniques are a standard feature of many advanced BiCMOS technologies, and the use of deep trenches has proved effective in improving inductor quality factor by reducing eddy losses. It can also be employed to improve isolation between devices, as shown in Fig. 2(c) , although the improvement in device isolation is modest compared to the other two approaches.
Clearly, a combination of lightly doped substrates, deep n-wells, and generous guard rings can provide for improved isolation in an RF-SOC environment. The challenge then becomes integrating these features into a design environment that can predict the isolation prior to fabrication. This is highly challenging, as it requires integrating an accurate physically based equivalent-circuit model of the substrate and the isolation structures with the simulation of the rest of the circuit. Several tools have recently been introduced to accomplish this, although more work remains to be done in this area [19] .
III. TRANSISTOR SCALING FOR RF-SOC APPLICATIONS
The key active device parameters for enhanced circuit performance of noise and linearity for most RF-SOC applications are the short-circuit unity current gain frequency ($f_T$) and the maximum unity power gain frequency ($f_{max}$). These two parameters have made astonishing progress in recent years in both HBTs and MOSFETs, with recently reported values for both devices in excess of 200 GHz [20], [21]. The next most important issue is breakdown voltage, which together with noise considerations sets the dynamic range limitation of most circuits.
If we examine the Si/SiGe HBT first, using the physical cross section and equivalent circuit model of the device shown in Fig. 3, the $f_T$ is given by

$$\frac{1}{2\pi f_T} = \tau_b + \tau_c + \frac{kT}{qI_C}\left(C_{je} + C_{bc}\right) + \left(R_E + R_C\right) C_{bc} \tag{1}$$

where $R_E$ and $R_C$ are the parasitic emitter and collector resistances, $C_{bc}$ is the collector-base junction capacitance, $C_{je}$ is the emitter-base junction capacitance, $\tau_b$ is the base transit time, $\tau_c$ is the collector transit time, $kT/q$ is the thermal voltage, and $I_C$ is the collector current. In most high-frequency applications, the base and collector transit times dominate the $f_T$, and the other parasitic-related terms have a secondary effect. For this same physical structure and equivalent circuit model, the $f_{max}$ of the transistor is given by [22]

$$f_{max} = \sqrt{\frac{f_T}{8\pi \left(R_b C_{bc}\right)_{eff}}} \tag{2}$$

where $(R_b C_{bc})_{eff}$ is approximately $R_b C_{bc}$, but can be more accurately described as a weighted average of the distributed base resistance and base-collector capacitance [23]. Equation (2) is slightly pessimistic in cases of large collector junction width, but represents a good starting point for discussions of device scaling issues.
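The relative weight of the delay terms can be explored numerically. The sketch below assumes the standard delay-sum form $1/(2\pi f_T) = \tau_b + \tau_c + (kT/qI_C)(C_{je} + C_{bc}) + (R_E + R_C)C_{bc}$ and $f_{max} = \sqrt{f_T/(8\pi R_b C_{bc})}$. The function names and all device values in the example are hypothetical, chosen only to be of a plausible order of magnitude, and are not measured data from the paper.

```python
import math

K_B = 1.380649e-23      # Boltzmann constant, J/K
Q_E = 1.602176634e-19   # electron charge, C


def hbt_ft_hz(tau_b, tau_c, c_je, c_bc, r_e, r_c, i_c, temp_k=300.0):
    """f_T from the sum of transit, charging, and parasitic RC delays."""
    v_t = K_B * temp_k / Q_E  # thermal voltage kT/q
    total_delay = (tau_b + tau_c
                   + (v_t / i_c) * (c_je + c_bc)
                   + (r_e + r_c) * c_bc)
    return 1.0 / (2.0 * math.pi * total_delay)


def hbt_fmax_hz(f_t, r_b, c_bc):
    """f_max using the lumped R_b * C_bc approximation."""
    return math.sqrt(f_t / (8.0 * math.pi * r_b * c_bc))


if __name__ == "__main__":
    # Hypothetical device values, for illustration only.
    f_t = hbt_ft_hz(tau_b=0.5e-12, tau_c=0.5e-12, c_je=20e-15, c_bc=10e-15,
                    r_e=5.0, r_c=10.0, i_c=5e-3)
    f_max = hbt_fmax_hz(f_t, r_b=30.0, c_bc=10e-15)
    print(f"f_T   = {f_t / 1e9:.0f} GHz")
    print(f"f_max = {f_max / 1e9:.0f} GHz")
```

Because $f_{max}$ depends on the square root of $R_b C_{bc}$, reducing the base resistance by a factor of four only doubles $f_{max}$ for a fixed $f_T$.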
The base transit time $\tau_b$ is given by [24] (3) where $W_b$ is the base thickness, $v_{exit}$ is the base exit velocity (roughly the electron thermal velocity, which depends on the electron effective mass $m^*$), $D_n$ is the base minority carrier diffusivity, and $\Delta E_g$ is the grading in the base bandgap energy.
The collector transit time $\tau_c$ is the average delay of the electron transit through the collector depletion region and is given by [25]

$$\tau_c = \frac{W_c}{2 v_{sat}} \tag{4}$$

where $W_c$ is the collector thickness, and $v_{sat}$ is the effective saturated electron velocity (approximately $10^7$ cm/s). These expressions highlight the critical role of vertical scaling to improve the $f_T$ and $f_{max}$ for bipolar device performance. At the same time, lateral scaling of the devices is equally critical, to further reduce extrinsic base resistance and collector-base capacitance.
The base resistance $R_b$, which has a large impact on $f_{max}$, is the sum of several components, including the spreading resistance underneath the emitter $R_{spread}$, the base-emitter gap resistance $R_{gap}$, and the contact resistance $R_{cont}$. Given a contact resistivity $\rho_c$ and a base sheet resistance $R_{sh}$, the resulting base resistance is [23] (5a)
where $W_E$ is the emitter width, $L_E$ is the emitter length, and $W_{gap}$ is the gap width between the emitter and base. It is clear that increasing the $f_T$ of the device through reducing the base thickness will improve the $f_{max}$, but the base resistance can rise from the increase in $R_{sh}$, minimizing the overall improvement. Most scaling efforts with HBT structures aim to keep the $f_{max}$ equal to or slightly larger than the $f_T$. The dependence of transistor $f_T$ and $f_{max}$ on base width can be seen clearly from the plots of measured devices in Fig. 4, where the clear dependence of transit time on base width has a significant effect on $f_T$ [26]. The effect of base width on $f_{max}$ is less pronounced, due to the additional necessity to keep base resistance equally low.
Although Si/SiGe HBTs have historically been leading MOS devices in terms of peak reported , super-scaled MOS devices 
where $g_m$ is the transistor transconductance, $C_{gs}$ is the gate-source capacitance, $C_{gd}$ is the gate-drain capacitance, $R_d$ is the parasitic drain resistance, $R_s$ is the parasitic source resistance, and $g_{ds}$ is the drain-source conductance. For this physical structure and equivalent circuit model, the $f_{max}$ is given by

$$f_{max} = \sqrt{\frac{f_T}{8\pi \left(R_g C_{gd}\right)_{eff}}} \tag{7}$$

where $(R_g C_{gd})_{eff}$ is approximately $R_g C_{gd}$ but is more accurately described as a weighted average of the distributed gate resistance and gate-drain capacitance, in a manner analogous to that of the HBT. From a physical perspective, modern MOSFETs operate in a heavily velocity saturated regime, where transit time effects dominate the characteristics. As MOSFETs scale to smaller and smaller dimensions, the gate resistance effect on $f_{max}$ and noise can become increasingly problematic. The dc gate resistance (per finger) is given by

$$R_{g,dc} = \frac{\rho_g W_f}{t_g L_g} \tag{8}$$

where $\rho_g$ is the gate resistivity, $t_g$ is the gate thickness, $W_f$ is the gate finger width, and $L_g$ is the gate length. Due to capacitive shunting effects, the per-finger effective series resistance at high frequencies is given by [27] (9)
Since $t_g$ scales along with $L_g$ as the devices are reduced to smaller and smaller dimensions (in order to maintain a roughly constant aspect ratio), the dc gate resistance can rise nearly as fast as the square of the scaling factor [28]. This effect has been addressed through a variety of proposed improvements, including the use of "T-gate" structures [29] and parallel gate strapping approaches [30].
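The quadratic rise of gate resistance with scaling can be checked with a short Python sketch. It assumes the dc per-finger resistance takes the simple form $R = \rho_g W_f / (t_g L_g)$, with the finger width held fixed while the gate thickness and gate length both shrink by the same factor; the function names and normalized values are hypothetical.

```python
def gate_resistance_dc(rho_g, w_f, t_g, l_g):
    """dc resistance of one gate finger: resistivity * width / (thickness * length)."""
    return rho_g * w_f / (t_g * l_g)


def scaled_resistance_ratio(s):
    """Resistance increase when t_g and L_g both shrink by a factor s (W_f fixed)."""
    base = gate_resistance_dc(1.0, 1.0, 1.0, 1.0)
    scaled = gate_resistance_dc(1.0, 1.0, 1.0 / s, 1.0 / s)
    return scaled / base


if __name__ == "__main__":
    for s in (1.0, 2.0, 4.0):
        print(f"scale factor {s:.0f}: resistance x{scaled_resistance_ratio(s):.0f}")
```

In practice the finger width is often reduced as well, which is one reason the text describes the rise as "nearly" quadratic rather than exactly so.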
As Fig. 6 demonstrates, the speed of modern scaled MOS devices approaches the value given by (10) [31]. Fig. 6 also demonstrates that the ratio $f_{max}/f_T$ for the MOSFET has been falling as the speed of the devices rises, in response to the increased gate resistance of the ultrashort gate length in the sub-0.25-$\mu$m region. In the case of a half-micrometer 20-GHz device, the ratio is nearly two, but it drops to less than one for the 0.1-$\mu$m design.
The other absolutely key issue for RF applications of scaled transistors is the breakdown voltage of the device, which influences the dynamic range of operation. The breakdown voltage of a transistor is mostly an issue for the implementation of power amplifiers in the transmitter section, although other circuit areas can benefit from a high breakdown voltage as well. The breakdown voltage issue is complicated by the physics of the device at high electric fields, the varied physical mechanisms that lead to device failure, and the interaction of the breakdown mechanisms with the external circuit.
The bipolar device is fundamentally limited by avalanche multiplication in the collector-base region [32]. This breakdown effect is traded off against the increasing $f_T$ of the transistor, and the $f_T \times BV$ product is the key consideration for most high-frequency applications and is a material-related constant known as the Johnson limit [33]. In the bipolar device, the collector-base junction typically experiences avalanche breakdown first, and the device can be characterized by the collector-emitter breakdown voltage when the base is shorted to the emitter (BVCBO) or when the base is open-circuited (BVCEO). The former is usually larger than the latter, due to current gain in the emitter-base region, and the two can be related approximately by [34]

$$BV_{CEO} \approx \frac{BV_{CBO}}{\beta^{1/n}} \tag{11}$$

where $\beta$ is the dc current gain of the transistor and $n$ is a constant that varies from 2 to 5, depending on a variety of physical factors. When the devices have very shallow doping (as in the high-$f_T$ case), the transistors exhibit nonlocal avalanche, and the $f_T \times BV_{CEO}$ product of the device can exceed the value seen for lower frequency devices [35]. Fig. 7 plots the BVCEO and BVCBO for modern bipolar devices, and the effects of nonlocal avalanching on breakdown voltage can clearly be seen at the higher $f_T$ values, where the breakdown voltage does not change significantly as the $f_T$ increases. In the operation of a power amplifier circuit, the device can typically operate at peak voltages in excess of BVCEO, but less than BVCBO, due to the time-dependent nature of the carrier multiplication process [36] and the impedances presented at each terminal.
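The sensitivity of the open-base breakdown voltage to current gain can be illustrated with the common approximation $BV_{CEO} \approx BV_{CBO}/\beta^{1/n}$. The sketch below assumes that form; the function name and the example values of $BV_{CBO}$, $\beta$, and $n$ are hypothetical and are not data from Fig. 7.

```python
def bvceo_from_bvcbo(bvcbo: float, beta: float, n: float) -> float:
    """Approximate open-base breakdown voltage from BVCBO, dc gain, and exponent n."""
    return bvcbo / beta ** (1.0 / n)


if __name__ == "__main__":
    bvcbo = 6.0   # hypothetical value, volts
    beta = 100.0  # hypothetical dc current gain
    for n in (2.0, 3.0, 5.0):
        print(f"n = {n:.0f}: BVCEO = {bvceo_from_bvcbo(bvcbo, beta, n):.2f} V")
```

For a fixed $BV_{CBO}$, a higher current gain or a smaller exponent $n$ lowers $BV_{CEO}$, which is the gap that terminal impedances can partly recover in power amplifier operation.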
This last issue of terminal impedances is crucial in the operation bipolar devices for power amplifiers, since the current gain at the emitter-base junction influences the breakdown characteristics. A simplified view of typical power amplifier operation is shown in Fig. 8 , and the collector-base avalanche current can be modeled by (12) where is a technology-dependent avalanche breakdown constant.
The transistor exhibits breakdown when and therefore (13) where is the effective transconductance of the device, including the feedback effects of any extrinsic emitter impedance, and is the input impedance consisting of the parallel combination of the extrinsic source impedance (including the base resistance ) and the input impedance due to the finite .
Breakdown occurs when (14) In the limiting case of a low source impedance, is simply the transistor base resistance . Then (14) reduces to which illustrates the dependence of breakdown voltage on base resistance; as the base resistance increases, the internal feedback shunts more and more of the avalanche current to the emitter, increasing the positive feedback that leads to breakdown.
In the limit of a high source impedance (BV ), increases to approximately and BV BV (17) which illustrates the well-known relationship between BVCBO and BVCEO in the bipolar transistor. The dependence of bipolar breakdown voltage on source impedance can be exploited in power amplifier design to significantly increase the safe operating voltage range. The breakdown voltage mechanisms limiting MOSFET performance are complicated by the diverse breakdown mechanisms, primarily time-dependent dielectric breakdown (TDDB) due to impact ionization in the drain region, gate-oxide rupture, drain avalanche breakdown, parasitic bipolar transistor operation, and punchthrough [37] .
From a reliability perspective, TDDB presents the most significant limitation on dynamic range in scaled MOSFETs. This effect is a result of damage to the silicon-oxide interface due to injection of hot electrons at the drain, which shifts the threshold voltage of the device over an extended period of time [38]. The recommended voltage limitations are typically based on dc or transient reliability tests, but in many RF applications the instantaneous voltage can significantly exceed the dc voltage, with potentially deleterious consequences. This phenomenon has recently been observed to degrade the output power of a 0.18-$\mu$m CMOS power amplifier over a matter of days of operation [39]. A comparison of the HBT BVCEO and BVCBO and the recommended operating voltage for a MOSFET as a function of $f_T$ is shown in Fig. 7. There seems to be a small but significant advantage for the bipolar device in this high-voltage regime, which is attributed to the fact that there is a cumulative degradation mechanism when the MOSFET is operated in the weak avalanche range of operation (due to the long-term shift in the threshold voltage). By comparison, bipolar devices appear to recover without any degradation in performance from weak avalanche breakdown in the collector-base junction. This will have a significant impact on the design of power amplifiers in these technologies, although it should be noted that LDMOS devices exhibit excellent performance in high-power base station amplifier applications [40]. In this case, the device is engineered to exhibit a very high breakdown voltage as well as acceptable gain at microwave frequencies, which is very different from the design considerations that go into typical digital CMOS device scaling.
IV. PASSIVE DEVICE PERFORMANCE TRADEOFFS FOR RF SOCS
The circuits required for the implementation of these "systems-on-a-chip" need a wide variety of elements, over and above the n-channel and p-channel MOSFET and NPN of a typical BiCMOS digital ASIC. These include high-quality inductors, capacitors, varactor diodes, transmission lines, and resistors. This section will discuss the tradeoffs and limitations of inductors and capacitors, which are two of the most challenging components to implement in monolithic form.
The implementation of high-quality monolithic inductors on silicon was considered to be an intractable problem until recently. The fundamental problem for integrated inductors is that they need to store much more energy than they dissipate per cycle, and it is very difficult to store a large amount of energy in the small volume of an integrated circuit die. The maximum stored energy per cycle is $\frac{1}{2} L I^2$ and the average dissipated energy per cycle is $\pi I^2 R_s / \omega$, where $L$ is the inductance, $R_s$ the series resistance, $I$ is the maximum current flowing through the inductor, and $\omega$ is the radian frequency.
This limitation on the energy can be seen by examining the fundamental scaling properties of the classical toroidal inductor shown in Fig. 9(a) and its equivalent circuit of Fig. 9(b) [41]. The scaling behavior of this structure is closely related to that of the planar inductors on an integrated circuit die.
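In this case, the inductance is given by

$$L = \frac{K \mu_0 N^2 \pi d^2}{4 l} \tag{18}$$

where $N$ is the number of turns, $l$ is the length, $d$ is the inductor diameter, $\mu_0$ is the permeability of free space, and $K$ is the "form factor" (which depends on the ratio of $d$ to $l$).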
The finite resistance of the wiring creates an equivalent series resistance of

$$R_s = \frac{\rho\, l_w}{A_w} \tag{19}$$

where $\rho$ is the metal resistivity, $l_w$ is the total length of the wire ($\approx N \pi d$), and $A_w$ is the cross-sectional area of the wire. The resulting quality factor of the inductor ($2\pi$ times the ratio of the stored to dissipated energy per cycle) is

$$Q = \frac{\omega L}{R_s} \tag{20}$$

If we now adjust every dimension of the inductor by a factor of $\alpha$, as would be the case for scaling a large inductor down to the size compatible with an integrated circuit, then the inductance becomes

$$L' = \alpha L \tag{21}$$

and the series resistance becomes

$$R_s' = \frac{R_s}{\alpha} \tag{22}$$

and the resulting quality factor of the scaled inductor becomes

$$Q' = \alpha^2 Q \tag{23}$$

So, decreasing the physical size of the toroidal inductor by a factor of ten will reduce the resulting quality factor by a factor of one hundred. This argument accounts for the historically low $Q$ of monolithic inductors compared with their discrete board level counterparts. For example, the quality factor of discrete surface mount RF inductors is at least a factor of ten higher than that of their integrated circuit counterparts [42]. This scaling argument also applies to inductors fabricated on an integrated circuit die, with some small modifications. For example, on an integrated circuit die, the metal thickness does not scale with the size of the inductor (the metal thickness is determined by the fabrication technology), so the decrease in quality factor due to scaling by an order of magnitude will be closer to a factor of ten than one hundred.
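The scaling argument above can be reproduced with a few lines of Python. The sketch assumes that inductance scales linearly with the dimension factor and that the wire resistance follows $\rho\,\ell/(w\,t)$; the `metal_thickness_scales` flag is a hypothetical switch used to contrast the ideal toroid with the integrated case in which metal thickness is fixed by the process.

```python
def scale_inductor(l_h, r_ohm, alpha, metal_thickness_scales=True):
    """Return (L', R', Q'/Q) after scaling the inductor dimensions by alpha."""
    l_scaled = alpha * l_h  # inductance is proportional to linear dimension
    if metal_thickness_scales:
        # wire length ~ alpha, cross-section ~ alpha**2, so R ~ 1/alpha
        r_scaled = r_ohm / alpha
    else:
        # wire length ~ alpha, width ~ alpha, thickness fixed, so R is unchanged
        r_scaled = r_ohm
    q_ratio = (l_scaled / r_scaled) / (l_h / r_ohm)
    return l_scaled, r_scaled, q_ratio


if __name__ == "__main__":
    for flag in (True, False):
        _, _, q_ratio = scale_inductor(1e-6, 1.0, 0.1, metal_thickness_scales=flag)
        print(f"thickness scales: {flag!s:5}  Q'/Q = {q_ratio:.3f}")
```

For a tenfold reduction in size, the ratio is 0.01 when every dimension scales and 0.1 when the metal thickness is held fixed, matching the two cases described in the text.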
By comparison, the scaling properties of monolithic capacitors are much more advantageous. In this case, as the physical cross section and equivalent circuit of Fig. 9(a) and (b) show, we have

$$C = \frac{\epsilon A}{t_d} \tag{24}$$

where $A$ is the capacitor plate area, $\epsilon$ is the dielectric constant of the interlayer dielectric, and $t_d$ is the dielectric thickness. The equivalent series resistance $R_s$ of the capacitor, which is dominated by the metal resistance in a monolithic circuit, is given by (25), where $\rho$ is the metal resistivity, and $K$ is a factor that accounts for the contact resistance to the metal. The resulting $Q$ for the series equivalent circuit of the capacitor, both before and after scaling, is therefore

$$Q = \frac{1}{\omega R_s C} \tag{26}$$

which is independent of scaling, since the resistance rises as the capacitance falls. As a result, the size of a monolithic capacitor can be dramatically reduced without affecting the resulting $Q$, and this is borne out in the measured data.
[Fig. 10 caption, partially recovered: ... Al-based and Cu-based metallization. Note that the area increases and the self-resonant frequency decreases as the inductance grows [44].]
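The scale invariance of the capacitor quality factor can be checked numerically. The sketch below assumes $Q = 1/(\omega R_s C)$ for the series equivalent circuit and assumes that scaling every dimension by a factor `alpha` multiplies the capacitance by `alpha` and divides the series resistance by `alpha`; the function names and component values are hypothetical.

```python
import math


def capacitor_q(c_f: float, r_s_ohm: float, freq_hz: float) -> float:
    """Quality factor of a series R-C equivalent circuit, Q = 1/(omega * R_s * C)."""
    return 1.0 / (2.0 * math.pi * freq_hz * r_s_ohm * c_f)


def scale_capacitor(c_f: float, r_s_ohm: float, alpha: float):
    """Scale all dimensions by alpha: C falls as alpha, R_s rises as 1/alpha."""
    return alpha * c_f, r_s_ohm / alpha


if __name__ == "__main__":
    c0, r0, f0 = 1e-12, 1.0, 2e9  # hypothetical starting values
    c1, r1 = scale_capacitor(c0, r0, 0.1)
    print(f"Q before scaling: {capacitor_q(c0, r0, f0):.1f}")
    print(f"Q after scaling:  {capacitor_q(c1, r1, f0):.1f}")
```

The two printed values agree because the $R_s C$ product is unchanged, in contrast with the inductor case, where the $L/R_s$ ratio falls with size.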
A. Monolithic Inductors
Inductors are a crucial part of any RF-SOC implementation, and they are especially important for high-performance frequency synthesizers and LNAs. The major improvements in inductor performance have occurred through the application of more lightly doped silicon substrates, thicker dielectric layers, thicker metallization, as well as a move to copper metallization [43]. The improvement in inductor $Q$ as a result of the migration from aluminum-based metallization to copper-based metallization is illustrated in Fig. 10 [44]. In the case of Cu metallization, the ohmic losses due to metal winding resistance are greatly reduced compared to Al-based structures. However, in most cases, the inductor performance then becomes dominated by losses in the silicon substrate. Further improvements in inductor $Q$ will then require even larger dielectric stacks (to further separate the metallization from the lossy substrate) or some sort of transferred substrate approach to completely separate the inductor from the silicon [45].
There have been some incremental improvements in the design of monolithic spiral inductors using more exotic techniques as well, including the use of patterned ground shields to reduce eddy current losses [46], "hairpin" designs [47], micromachining techniques [48], and "solenoid"-based designs [49].
B. Monolithic Capacitors
Monolithic capacitors represent a relatively straightforward application of modern MOS technology to the problem of a high-performance passive component and do not suffer from the quality factor limitations of monolithic inductor structures. As an example, $Q$ values of 80/f(GHz)/C(pF) for MIM caps with 0.7 fF/μm² and 20/f(GHz)/C(pF) for MOS caps with 1.4 fF/μm² were reported in 1997 [50]. More recently, MIM capacitance densities of 2.7 fF/μm² with $Q$'s of 150 were reported using a sputtered plasma-enhanced chemical vapor deposition (PECVD) nitride dielectric [51]. In the case of MIM capacitors, the challenge is to reduce the area of the capacitor, in order to reduce the area of the overall die, through the use of thinner dielectrics and higher dielectric constant materials.
Recently, in an attempt to provide a high capacitance per unit area in a standard digital CMOS process, several groups have reported "fractal" capacitors, where fringing fields are used to provide the capacitance [52], [53]. Although the capacitance values per unit area are rather modest compared to the above MIM structures (0.2-0.5 fF/μm²), they provide an alternative approach for the realization of high-quality capacitors without the need for an extra process step, as in more traditional MIM capacitance structures.
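To make the reported capacitor figures concrete, the sketch below evaluates the empirical quality factor expressions and the plate area implied by each capacitance density. It assumes that "80/f(GHz)/C(pF)" is read left to right, i.e., as 80 divided by the product of frequency and capacitance; the function names and the 2-GHz, 1-pF operating point are hypothetical.

```python
def empirical_q(freq_ghz, cap_pf, coefficient):
    """Q = coefficient / (f[GHz] * C[pF]); coefficient is 80 (MIM) or 20 (MOS)."""
    return coefficient / (freq_ghz * cap_pf)

def plate_area_um2(cap_pf, density_ff_per_um2):
    """Plate area in square micrometers for a target capacitance."""
    return cap_pf * 1000.0 / density_ff_per_um2

# Hypothetical design point: a 1-pF capacitor used at 2 GHz.
FREQ_GHZ, CAP_PF = 2.0, 1.0

for name, coefficient in (("MIM", 80.0), ("MOS", 20.0)):
    q = empirical_q(FREQ_GHZ, CAP_PF, coefficient)
    print(f"{name} capacitor: Q = {q:.1f}")

# Densities quoted in the text, in fF per square micrometer.
for name, density in (("MIM (1997)", 0.7), ("MOS (1997)", 1.4),
                      ("MIM, PECVD nitride", 2.7), ("fractal, upper bound", 0.5)):
    area = plate_area_um2(CAP_PF, density)
    print(f"{name}: {area:.0f} um^2 per pF")
```

The area comparison shows why higher-density dielectrics matter for die size: at a fixed capacitance, the required plate area is inversely proportional to the density.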
V. COMPARATIVE LNA PERFORMANCE OF MOS AND BIPOLAR TRANSISTOR CIRCUITS FOR RF-SOC APPLICATIONS
The front-end LNA of Fig. 1 is one of the key determiners of SOC performance, since the overall SNR of the final received signal is set by the noise performance of this particular amplifier. Typical wireless application frequencies today are in the 1-5-GHz range. Both bipolar and MOS transistors have been utilized recently for front-end applications in wireless systems. Fortunately, the microwave noise performance of both bipolar and MOS transistors has improved dramatically in recent years, thanks to aggressive technology scaling that was largely designed to improve digital circuit performance.
The input-referred noise performance of a radio receiver determines the minimum signal level that can be reliably demodulated. As a result, it is a key factor in determining the range and power dissipation of the entire communications system. The noise factor ($F$), defined as the degradation of the SNR of an input signal as it passes through the amplifier, is the standard metric for determining the noise performance of an RF receiver and is given by (27), and the noise figure ($NF$) is defined as $F$ in decibels [i.e., $NF = 10\log_{10} F$]. An equivalent circuit diagram of modern MOS and bipolar transistors, showing the major contributors to microwave noise performance, is shown in Fig. 11. In the case of the deep submicrometer MOSFET, the main contributors to the noise factor are the drain current noise and the thermal noise contributed by the extrinsic gate resistance. These two noise contributors are given by [54] $\overline{i_d^2} = 4kT\gamma g_{d0}\Delta f$ and $\overline{v_g^2} = 4kTR_g\Delta f$ (28), where $k$ is Boltzmann's constant, $T$ is temperature, $\Delta f$ is the measurement bandwidth, $R_g$ is the extrinsic gate resistance, $g_{d0}$ is a conductance term that is equal to the drain-source conductance at $V_{DS} = 0$, and $\gamma$ is the "channel noise factor" [55].
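The definitions in this paragraph can be sketched in a few lines. The example below is illustrative only: it assumes the commonly used thermal-noise forms for the two MOSFET sources (a drain current noise density of 4kT times the channel noise factor times the zero-bias drain conductance, and a gate resistance voltage noise density of 4kT times the gate resistance). The function names and the default temperature of 290 K are assumptions.

```python
import math

BOLTZMANN_J_PER_K = 1.380649e-23

def noise_factor(snr_in, snr_out):
    """Noise factor F: linear SNR at the input divided by SNR at the output."""
    return snr_in / snr_out

def noise_figure_db(f):
    """Noise figure NF: the noise factor expressed in decibels."""
    return 10.0 * math.log10(f)

def drain_noise_current_psd(gamma, g_d0_siemens, temp_k=290.0):
    """Assumed drain current noise density 4*k*T*gamma*g_d0, in A^2/Hz."""
    return 4.0 * BOLTZMANN_J_PER_K * temp_k * gamma * g_d0_siemens

def gate_resistance_noise_voltage_psd(r_gate_ohm, temp_k=290.0):
    """Thermal noise density of the gate resistance 4*k*T*R_g, in V^2/Hz."""
    return 4.0 * BOLTZMANN_J_PER_K * temp_k * r_gate_ohm

# An amplifier that halves the SNR has F = 2.
nf = noise_figure_db(noise_factor(snr_in=100.0, snr_out=50.0))
print(f"NF = {nf:.2f} dB")

# Long-channel (gamma = 2/3) versus short-channel (gamma near 2) drain noise.
long_channel = drain_noise_current_psd(2.0 / 3.0, 0.02)
short_channel = drain_noise_current_psd(2.0, 0.02)
print(f"short-channel / long-channel drain noise: {short_channel / long_channel:.1f}x")
```

The last comparison uses the two limiting values of the channel noise factor quoted in the text and a hypothetical conductance of 20 mS.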
The channel noise factor ($\gamma$) is a complicated function of device design and bias conditions and can be approximated by (29) [56], in terms of the saturated electron drift velocity, the carrier relaxation time, the effective gate length, and the ratio between bulk transconductance and gate transconductance (approximately unity). The quantity $\gamma$ is approximately 2/3 for long-channel devices, but rises to nearly two for short-channel devices, is a strong function of applied gate-to-source voltage, and is also a weak function of drain-to-source voltage [57].
Given this noise model, the minimum noise factor (when the device is presented with an optimized source reactance) as a function of the source resistance is given by (30) [58], the noise figure is minimized at the source resistance given by (31), and the minimum noise factor at that source impedance is approximately given by (32). Therefore, the noise figure of the MOSFET is primarily determined by the gate resistance and the $f_T$ of the transistor. As technology scales to shorter and shorter gate lengths, the gate resistance may become an increasingly dominant factor, even as the noise figure itself improves. In a practical circuit, the noise figure is limited by technological factors, as well as by the fact that the optimum source impedance is rising as the gate length shrinks, and this impedance may not be achievable using practical circuit components. This simplified model leaves out the induced gate noise due to the drain current, which has been analyzed by several authors, but the overall trend in the results remains the same [59].
$F = \dfrac{\text{Total noise power delivered to the load impedance}}{\text{Total noise power delivered to the load impedance due only to the source}}$ (27)

The equivalent circuit model for a bipolar transistor results in a similar set of design tradeoffs for device design and noise optimization. In this case, there are three dominant broad-band noise sources given by (33) [60]: the thermal noise of the base resistance $r_b$ and the shot noise associated with the dc collector and base currents $I_C$ and $I_B$, respectively. The collector and base shot-noise currents are normally considered to be uncorrelated, but at high frequencies, the correlation between the two is given by (34) [60]. Given this noise model of the bipolar transistor, the transistor will exhibit the minimum noise factor given by (35) [58] as a function of source impedance when presented with the optimum source reactance. Note that, in the limit of high current gain, this result is quite similar to the minimum noise factor in the MOS case, with the base resistance taking the place of the gate resistance. So, the keys to lowering the noise figure of the bipolar device are a reduction in $r_b$ and/or an increase in the $f_T$. The main difference between the MOS and bipolar case is the final term in (35), which is dependent on the current gain of the device, and results in the well-understood role of base current in limiting the noise performance of bipolar amplifiers. In most cases, this final term is small relative to the first two, especially for modern HBT devices with high dc current gains at high frequencies (larger than the frequency where the current gain begins to decline from its dc value) [61].
A comparison of the reported noise figure characteristics of SiGe HBT and MOS devices confirms the analysis given above, as shown in Fig. 12. The key determinant of noise figure performance is the $f_T$ of the device, with MOS devices demonstrating roughly a 0.5-dB improvement for a given $f_T$ compared to an equivalent HBT device. The difference is mostly attributed to the relatively higher base resistance of the HBT compared to the gate resistance of the MOSFET. However, as Voinigescu et al. pointed out, this advantage in intrinsic noise performance of the MOSFET is difficult to realize in practice, because the optimum source impedance for the MOS device is much higher than that of the HBT, making the noise figure of a MOSFET LNA very sensitive to source impedance mismatch [62]. One solution to this dilemma is to increase the size of the MOSFET, at the expense of higher power dissipation. In this case, the bipolar LNA would exhibit a slightly lower power dissipation than the MOSFET implementation for a given noise figure.
Direct comparisons between the noise figure performance of MOS and bipolar LNAs are difficult to perform, due to inevitable circuit interaction effects, but a recent result [63] where a 0.25-μm MOSFET amplifier (whose peak $f_T$ was roughly 50 GHz) was compared with a 50-GHz HBT amplifier showed essentially equivalent noise figures at 2.4 GHz (2.9 dB), with power dissipation roughly 20% higher in the MOS case. Both MOS and bipolar transistors exhibit relatively broad versus current behavior, and so device level power comparisons between the two technologies are complicated by the simultaneous requirement for low noise and high linearity, as the next section will demonstrate.
A. Comparative Linearity Performance of MOS and Bipolar Transistor Circuits for RF-SOC Applications
Circuit linearity affects the performance of both the transmitter and receiver sections of the RF SOC, and the requirements of the two sections differ significantly. An RF receiver is typically operated well below its 1-dB compression point, and therefore small-signal linearity is the key performance metric. In the GSM receiver case, the circuit must be able to amplify a signal of roughly $10^{-13}$ W while simultaneously receiving an undesired signal many orders of magnitude larger. The key figures-of-merit (FOMs) here are the input intercept point and cross-modulation sensitivity. Transmitters are typically operated at high levels of output power, and so their large-signal linearity is the key consideration.
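The size of the dynamic range involved can be made explicit by converting the GSM levels quoted in the introduction (a sensitivity of -102 dBm and a largest interferer of 0 dBm) into watts. This is a small illustrative sketch; the function names are hypothetical.

```python
import math

def dbm_to_watts(power_dbm):
    """Convert a power level in dBm (decibels relative to 1 mW) to watts."""
    return 1e-3 * 10.0 ** (power_dbm / 10.0)

def watts_to_dbm(power_w):
    """Convert a power level in watts to dBm."""
    return 10.0 * math.log10(power_w / 1e-3)

SENSITIVITY_DBM = -102.0   # minimum desired signal at the antenna
INTERFERER_DBM = 0.0       # largest undesired signal at the antenna

desired_w = dbm_to_watts(SENSITIVITY_DBM)
interferer_w = dbm_to_watts(INTERFERER_DBM)

print(f"desired signal:   {desired_w:.2e} W")
print(f"interferer:       {interferer_w:.2e} W")
print(f"power ratio:      {interferer_w / desired_w:.2e}")
print(f"dynamic range:    {INTERFERER_DBM - SENSITIVITY_DBM:.0f} dB")
```

The desired signal comes out slightly below $10^{-13}$ W, which is consistent with the order-of-magnitude figure used in the text, and the interferer is more than ten orders of magnitude stronger.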
From the perspective of receiver design, which encompasses the low-noise amplification stages as well as the downconversion mixer, circuit nonlinearity arises from weak nonlinearities both in the dependent sources (principally the transconductance) and charge storage elements (capacitors) within the transistor; at low frequencies, the former consideration dominates. In the low-frequency case, the small-signal output of a weakly nonlinear circuit can be described by a power series of the form $i_o = a_1 v_i + a_2 v_i^2 + a_3 v_i^3 + \cdots$ (36), where $i_o$ is the output current, $v_i$ is the input voltage, and $a_n$ are the power-series coefficients of the amplifier response [64]. Intuitively, the linearity of the circuit will be improved if the higher order power series coefficients are reduced compared to $a_1$. Unlike in the case of low-noise performance, the linearity behavior of scaled bipolar and MOS transistors in the low-frequency regime is very different.
The low-frequency collector current of bipolar devices continues to be determined by the well-known exponential relationship to base-emitter voltage ($V_{BE}$), even as the device is scaled into the regime where $f_T$ exceeds 200 GHz [20]. In this case, with the typical common-emitter amplifier circuit driving a load impedance, operated with an ideal voltage source input, the power-series coefficients are given by (37), which shows that the ratios of the various power series coefficients are independent of the dc operating current. This point will become more significant when we introduce linearity FOMs.
The low-frequency linearity behavior of MOS devices is not as simply described as that of the bipolar transistor and, unlike the bipolar device, exhibits significant changes as a result of technology scaling. A simple expression for MOSFET drain current in strong inversion and saturation, which will help to illustrate this effect, is given by [65] (38) where is the low-field electron mobility, is a dimensionless body-effect parameter that is close to unity, is gate oxide capacitance per unit area, and are the gate width and length, is the threshold voltage, and is a mobility reduction factor due to the normal gate field [66] .
In this case, the power series coefficients for the nonlinear MOSFET amplifier response are given by [67] (39a) (39b) (39c)
In order to put these results into context, we need to explore how these nonlinearity coefficients (both bipolar and MOS) affect the linearity of the receiver circuit. The standard small-signal linearity FOM for a receiver amplifier is the third-order input-referred intercept point (IIP3). This is defined as the input power level of two input signals (at frequencies $f_1$ and $f_2$) where the extrapolated undesired third-order output nonlinear response intersects the desired first-order linear response [68]. The third-order responses are particularly insidious in a narrow-band communication system, especially because one of them appears at $2f_1 - f_2$ and another at $2f_2 - f_1$; both frequencies are close to the original frequencies $f_1$ and $f_2$. Although this figure of merit has many limitations in practical situations, its ease of measurement and calculation make it a perennial favorite among microwave engineers.
The second-order input-referred intercept point (IIP2), the input power level where the extrapolated second-order response intersects the desired first-order response, is also sometimes specified, although it is usually less important than the IIP3. This is due to the fact that the frequency of the second-order distortion product is well away from the desired signal (at $f_1 + f_2$ or $f_1 - f_2$), whereas the third-order response frequency is nearly the same as that of the two original input tones. The input intercept points can be referred to the output by simply multiplying by the gain of the circuit.
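The reason third-order products are more troublesome than second-order ones follows directly from where they land in frequency. The sketch below lists the second-order and close-in third-order products for two tones; the tone frequencies (two hypothetical carriers 200 kHz apart near 900 MHz, expressed as integers in kilohertz) and the function name are assumptions made for illustration.

```python
def two_tone_products(f1, f2):
    """Return the second-order and close-in third-order product frequencies.

    f1 and f2 are the two input tone frequencies in any consistent unit.
    Harmonics and the third-order sum products are intentionally left out.
    """
    return {
        "im2": sorted({abs(f1 - f2), f1 + f2}),
        "im3": sorted({2 * f1 - f2, 2 * f2 - f1}),
    }

# Hypothetical tones, in kHz: 900.0 MHz and 900.2 MHz.
F1_KHZ, F2_KHZ = 900_000, 900_200

products = two_tone_products(F1_KHZ, F2_KHZ)
print("second-order products (kHz):", products["im2"])
print("third-order products (kHz): ", products["im3"])
```

The second-order products fall near dc and near twice the carrier, where they are easily filtered, whereas the third-order products sit one tone spacing away from the original tones and land inside the receive band.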
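With a power series model of the amplifier, the IIP3 voltage is given by $V_{IIP3} = \sqrt{\frac{4}{3}\left|\frac{a_1}{a_3}\right|}$ (40), where $a_1$ and $a_3$ are the first- and third-order power-series coefficients of the amplifier response.
The IIP3 of the bipolar transistor at low frequencies and without feedback is then simply given by $2\sqrt{2}\,kT/q$, roughly 75 mV at room temperature.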
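The bipolar result can be verified numerically from the exponential collector current law. The sketch below is a worked check under stated assumptions: it uses the standard two-tone expression for the IIP3 voltage, the square root of (4/3) times the magnitude of the ratio of the first- to third-order coefficients, and takes the coefficients from the Taylor expansion of the exponential; the function names and the 300-K temperature are hypothetical choices.

```python
import math

BOLTZMANN_J_PER_K = 1.380649e-23
ELECTRON_CHARGE_C = 1.602176634e-19

def thermal_voltage(temp_k=300.0):
    """Thermal voltage kT/q in volts."""
    return BOLTZMANN_J_PER_K * temp_k / ELECTRON_CHARGE_C

def iip3_voltage(a1, a3):
    """IIP3 amplitude from the first- and third-order power-series terms."""
    return math.sqrt((4.0 / 3.0) * abs(a1 / a3))

def bipolar_coefficients(i_c_amp, temp_k=300.0):
    """Taylor coefficients of I_C * exp(v / V_t) about the bias point."""
    v_t = thermal_voltage(temp_k)
    return i_c_amp / v_t, i_c_amp / (2.0 * v_t ** 2), i_c_amp / (6.0 * v_t ** 3)

for i_c in (1e-3, 5e-3):
    a1, _, a3 = bipolar_coefficients(i_c)
    print(f"I_C = {i_c * 1e3:.0f} mA -> IIP3 = {iip3_voltage(a1, a3) * 1e3:.1f} mV")
```

Because the bias current cancels in the ratio of the coefficients, the computed intercept voltage is the same at both bias points and equals $2\sqrt{2}$ times the thermal voltage, a little over 70 mV at 300 K.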
By contrast, the IIP3 of the MOS device is "theoretically" infinite in the long gate length regime (where and are both relatively large). Even in the short gate length regime the linearity of the MOSFET is excellent, and the low-frequency IIP3 voltage of the MOSFET amplifier reduces to (41) which shows that the intrinsic linearity performance of the short-channel MOSFET exhibits a moderate increase with bias voltage, unlike the bipolar transistor. At relatively large values of and short gate lengths, the IIP3 can be further simplified to (42) Some typical values for these parameters are m /(V s), V , and m/s, and m which yields an IIP3 of approximately 1.5 V with a V. This is substantially higher than a bipolar transistor operated under conditions of equal current, which accounts for the improved low-frequency linearity of MOS devices compared to their bipolar counterparts. This improvement was confirmed by the experimental results in [63] .
This analysis provides a starting point for a comparison of the two device technologies, illustrating the intrinsic differences between the MOSFET and bipolar transistor, but the linearity performance will change at higher frequencies, due to the nonlinear behavior of the stored charge, and circuit impedances will provide feedback that further alters the linearity.
If we examine the case of the higher frequency performance, the situation is complicated by the nonlinear stored charge effects and the impedances at each terminal of the transistor. These nonlinearities introduce a frequency dependence to the nonlinearity, which considerably complicates the analysis. The situation can be simplified if we consider resistive terminations only at each terminal of the transistor. In this case, the work of Vaidyanathan et al. [69] employing a Volterra series analysis clarifies the relationship between the high-frequency linearity of the bipolar transistor and its physical design, particularly the relationship between the high-frequency linearity and the behavior of its "loaded" unity current-gain frequency , where the loaded unity current-gain frequency is defined as the frequency where the current-gain drops to unity with the appropriate terminating impedances.
As an example, at sufficiently high frequencies, and without avalanche breakdown occurring, the OIP2 of a bipolar transistor is given by the relatively simple relationship (43) [69], which involves the derivative of the loaded $f_T$ with respect to collector current (in the case of the bipolar transistor). To minimize the second-order intermodulation distortion, the transistor should be designed to have as constant an $f_T$ as possible, and the device will have the highest OIP2 near the peak of the $f_T$ versus $I_C$ curve. The important OIP3 behavior is more complicated than in the OIP2 case, but some important generalizations can be derived from the analysis of device operation. At sufficiently high frequencies, and when the device is operated at the peak of the $f_T$ curve, the OIP3 of the bipolar transistor is given by (44) [69], which involves the second derivative of the $f_T$ with respect to collector current. This result implies that, when the device is operated at the peak of its $f_T$ versus collector current curve, the best distortion performance is obtained when the device has a high $f_T$ and when the curve is as "flat" as possible. Both the OIP2 and OIP3 results cited above demonstrate that the "ideal" bipolar transistor, defined as one with very low junction capacitances and hence nearly constant $f_T$, will have outstanding high-frequency linearity and that this intrinsic linearity can improve with future device scaling. As the devices scale to higher $f_T$, avalanche breakdown in the collector region becomes a significant factor and can also have a deleterious effect on linearity, and this has been examined in several recent papers [70], [71].
The distortion results presented so far are device-oriented in the sense that they do not include frequency-dependent circuit impedance termination effects. These can improve (or degrade) the performance of the actual amplifier, depending on a variety of factors [72]. Although the circuit interaction effects for linearity of microwave amplifiers are very complicated, far more so than for noise factor determination, there are some general results that can be used [73]. In particular, the IIP3 of a low-noise bipolar amplifier was improved by over 10 dB through the use of optimized termination impedances at the sum and difference frequencies of the two-tone input signals [74]. This nonlinear cancellation effect is especially useful in bipolar amplifiers because, unlike MOSFETs, the $a_2$ and $a_3$ coefficients in (37) have a well-defined relationship, and so the cancellation of third-order nonlinearities can be nearly perfect when proper terminations are chosen [74].
VI. COMPARATIVE PERFORMANCE OF MOS AND BIPOLAR VCOS FOR RF SOC APPLICATIONS
The VCO provides the frequency reference for the upconversion of the transmitted signal or downconversion of the received signal, as shown in Fig. 1 . The VCO frequency is usually not accurate enough by itself to provide the correct downconversion or upconversion frequency and so is usually phase-locked to a more precise reference frequency. The key performance issues with this circuit are phase noise, power dissipation, and frequency tuning range. Unlike many other circuits, the performance of the passive devices can have a significant impact on the performance of this circuit.
The phase noise of the oscillator is the ratio of the power in the desired output (the carrier) to the output power in a 1-Hz bandwidth at a given frequency offset from the carrier, when the amplitude variation on the carrier has been removed through a limiting process. So, the phase noise is expressed in units of dBc/Hz at a specified offset frequency. Ideally, the spectrum of the VCO output is a delta-function in the frequency domain, so the ideal VCO phase noise would be infinite dBc/Hz at all offset frequencies. Phase noise contributes to a variety of deleterious effects in radio systems, including a rise in the receiver noise floor and reciprocal mixing [68].
A simplified schematic of a bipolar transistor monolithic differential LC-tuned VCO, along with its most significant noise sources is seen in Fig. 13 . The cross-coupled differential transistor pair presents a negative impedance to the resonator, cancelling the resistive losses in the resonator and enabling sustained oscillation. Frequency variation is achieved with a reverse-biased pn-junction diode or accumulation-mode MOS varactor, which changes the resonant frequency of the circuit.
The close-in phase noise behavior at an offset $\Delta f$ from the carrier frequency $f_0$ in the differential LC-tuned VCO is determined from the well-known Leeson's model to be given by (45) [75], whose parameters are Boltzmann's constant $k$, the absolute temperature $T$, the amplitude of oscillation, the resonator loaded quality factor, the corner frequency above which the $1/f$ noise is no longer significant, and the excess noise factor $F$.
Leeson's model shows that phase noise is reduced as the amplitude of oscillation is increased. However, once the amplitude of oscillation drives the transistors in the cross-coupled differential pair into saturation the loaded quality factor of the resonator is lowered and phase noise degrades significantly. It also illustrates the tradeoff between the power dissipation and phase noise, since a large amplitude will lead to both lowered phase noise and higher power dissipation.
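The tradeoffs described by Leeson's model can be explored with a short numerical sketch. The expression used below is the commonly quoted textbook form of Leeson's equation written in terms of signal power, which is an assumption and not necessarily the exact form of (45), where the amplitude of oscillation appears instead. Phase noise is reported here in the usual convention of noise power relative to the carrier, so lower (more negative) values are better. All parameter values and names are hypothetical.

```python
import math

BOLTZMANN_J_PER_K = 1.380649e-23

def leeson_phase_noise_dbc_hz(f0_hz, offset_hz, q_loaded, p_sig_w,
                              excess_noise_factor, flicker_corner_hz,
                              temp_k=290.0):
    """Single-sideband phase noise from a textbook form of Leeson's model."""
    thermal = 2.0 * excess_noise_factor * BOLTZMANN_J_PER_K * temp_k / p_sig_w
    resonator = 1.0 + (f0_hz / (2.0 * q_loaded * offset_hz)) ** 2
    flicker = 1.0 + flicker_corner_hz / abs(offset_hz)
    return 10.0 * math.log10(thermal * resonator * flicker)

# Hypothetical 2-GHz VCO evaluated at a 1-MHz offset.
base = dict(f0_hz=2e9, offset_hz=1e6, q_loaded=10.0, p_sig_w=1e-3,
            excess_noise_factor=2.0, flicker_corner_hz=1e4)

reference = leeson_phase_noise_dbc_hz(**base)
double_q = leeson_phase_noise_dbc_hz(**{**base, "q_loaded": 20.0})
double_p = leeson_phase_noise_dbc_hz(**{**base, "p_sig_w": 2e-3})

print(f"reference:        {reference:.1f} dBc/Hz")
print(f"doubling Q:       {double_q - reference:+.2f} dB")
print(f"doubling power:   {double_p - reference:+.2f} dB")
```

In this form, doubling the loaded quality factor improves the close-in phase noise by about 6 dB, while doubling the signal power buys only about 3 dB and costs dc power, which is why resonator quality dominates the design.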
A benefit of a bipolar transistor VCO design is that the corner frequency will be very low due to the excellent linearity of transistors and the low level of $1/f$ noise in the devices, rendering the frequency upconversion process from device $1/f$ noise relatively insignificant [76]. The contribution of $1/f$ noise from MOS-based VCOs is expected to be much larger, due to their intrinsically higher level of $1/f$ noise [77]. However, the upconversion of $1/f$ noise in CMOS VCOs can be dramatically reduced through symmetric circuit operation, as illustrated in [78].
Leeson's equation clearly shows the importance of maximizing the $Q$ factor of the resonating circuit, through the techniques described in Section IV. The excess noise factor is determined by the wideband noise from the cross-coupled differential transistor pair and the dc current noise source, taking the nonlinear operation of the oscillator into account. In the case of a bipolar VCO, the excess noise factor is given by (46) [79], which depends on the signal level required to make the cross-coupled differential transistor pair switch completely to one side, the parallel equivalent impedance of the resonator, the dc current, and the mean square current noise power spectral density.
The odd harmonics around of the base resistance thermal noise are modulated into the resonator passband with approximately equal weights. The contribution to the excess noise factor from this effect becomes . This can be estimated by assuming that the wide-band noise spectrum has been sampled with a periodic impulse train at twice the oscillation frequency [80]. This illustrates the importance of minimizing base resistance for low-phase-noise operation as well as the slight penalty incurred through the use of a high device. The shot noise contribution of the individual transistors ( and ) takes place in a short time during the zero crossings of the output waveform. The shot noise contributes a factor of to the excess noise factor. Low-frequency noise from the dc current source results in amplitude modulation of the carrier and therefore little phase noise contribution from this source. However, dc current source noise at frequencies near the even harmonics of the oscillator creates both amplitude and phase noise. If the dc current source noise has a spectral density , it contributes to the excess noise factor [80]. The contributions to $F$ from this last noise source have been reduced through the use of noise filtering techniques, which essentially reduce the noise transfer function to the output of the second harmonic contributions [81].

Fig. 14 (caption fragment): ... [79], [81]-[91]. Note that there has been a significant improvement in recent years due to improved back-end metallization and circuit design approaches.
The performance of monolithic VCOs is affected by so many diverse factors that it is difficult to draw meaningful comparisons between various technology and circuit approaches. A VCO FOM can be defined that provides some insight into this issue [80]; it is given by (47), in which $P_{dc}$ is the dc power dissipated by the VCO, normalized to 1 mW. Fig. 14 is a plot of the measured FOM for a variety of reported monolithic VCOs (in both bipolar and CMOS technology) as a function of frequency. There is no clear trend favoring bipolar or CMOS technology; their performance is comparable. However, the plot shows that the performance of VCOs in both technologies has improved significantly in recent years, by roughly 8 dB in the last five years. This is attributed to improvements in inductor quality factor, as was discussed in Section IV, as well as to improved circuit design techniques that filter away much of the noise created by the dc biasing circuits [81].
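A figure of merit of this kind can be computed as follows. The sketch assumes the widely used definition that normalizes the measured phase noise to the carrier-to-offset frequency ratio and to the dc power in milliwatts; the exact form and sign convention of (47) are not recoverable from the text, so the expression, the function name, and the example numbers are all assumptions. With this convention, a more negative FOM indicates a better oscillator.

```python
import math

def vco_fom_db(phase_noise_dbc_hz, f0_hz, offset_hz, p_dc_w):
    """Assumed FOM: L(df) - 20*log10(f0/df) + 10*log10(P_dc / 1 mW)."""
    return (phase_noise_dbc_hz
            - 20.0 * math.log10(f0_hz / offset_hz)
            + 10.0 * math.log10(p_dc_w / 1e-3))

# Hypothetical VCO: -120 dBc/Hz at a 1-MHz offset from 2 GHz, burning 10 mW.
fom = vco_fom_db(phase_noise_dbc_hz=-120.0, f0_hz=2e9, offset_hz=1e6,
                 p_dc_w=10e-3)
print(f"FOM = {fom:.1f} dB")

# The same phase noise at one tenth of the power improves the FOM by 10 dB.
low_power = vco_fom_db(-120.0, 2e9, 1e6, 1e-3)
print(f"FOM at 1 mW = {low_power:.1f} dB")
```

The normalization lets oscillators at different frequencies, offsets, and power levels be placed on one plot, which is what makes a comparison such as the one in Fig. 14 possible.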
VII. CONCLUSION
The performance of RF-oriented "systems-on-a-chip" has been historically limited by the performance of the active and passive devices available from a typical CMOS or BiCMOS integrated circuit technology. In recent years, advances in process technology-mostly intended to improve the performance of digital integrated circuits-have improved the performance of these higher frequency RF circuits as well.
The fundamental requirements of these circuits are those of low noise and (simultaneously) high linearity. This paper has outlined the effect that semiconductor scaling will have on these two performance issues in the coming years. The improvement in device speed (through reduction in lithographic dimensions) will continue to enhance RF circuit performance for many years, although limitations of gate and/or base resistance are becoming increasingly dominant in the sub-0.1-μm regime. At the same time, the dynamic range of the circuits will become increasingly challenged, more so with MOSFET than with HBT technology, because voltage limits are being reduced along with gate dimensions. HBTs appear to have some advantages in this regard compared to MOSFETs, since they can accommodate weak avalanche effects without long-term degradation.
The performance of other important RF circuits, such as the VCO, is primarily limited by the performance of the passive device technology, particularly the monolithic inductors, although it also benefits from improved circuit design techniques.
