Introduction
A decade ago, RF CMOS, even at low gigahertz frequencies, was considered an oxymoron by all but the most ambitious and optimistic. Today, it is a dominant force in most commercial wireless applications (e.g., cellular, WLAN, GPS, and Bluetooth) and has spread into areas such as watt-level power amplifiers (PAs) [1], which had long been the undisputed realm of compound semiconductors.
This seemingly ubiquitous embrace of silicon, and particularly CMOS, is no accident. It stems from the reliability of silicon process technologies, which make it possible to integrate hundreds of millions of transistors on a single chip without a single device failure, as is evident in today's microprocessors. Applied to microwave and millimeter-wave applications, silicon opens the door to a plethora of new topologies, architectures, and applications. This rapid adoption of silicon is further facilitated by the ability to integrate a great deal of in situ digital signal processing and calibration [2].
Integration of high-frequency phased-array systems in silicon (e.g., CMOS) promises a future of low-cost radar and gigabit-per-second wireless communication networks. In communication applications, a phased array improves the signal-to-noise ratio by forming a beam and generates less interference for other users. The practically unlimited number of active and passive devices available on a silicon chip, together with their tight control and excellent repeatability, enables new architectures (e.g., [3]) that are not practical in compound-semiconductor module-based approaches.
The feasibility of such approaches is illustrated by an integrated 24GHz 4-element phased-array transmitter in 0.18µm CMOS [2], which is capable of beam forming and rapid beam steering for radar applications. On-chip power amplifiers (PAs) with integrated 50Ω output matching make it a fully integrated transmitter. This CMOS transmitter and the 8-element phased-array SiGe receiver in [5] demonstrate the feasibility of 24GHz phased-array systems in silicon-based processes.
In a phased-array transmitter, a beam is formed in a desired direction by adjusting the relative delay of each radiating element to compensate for the difference in free-space propagation time of the wavefronts generated by the different elements. Electronic variation of the delay enables rapid beam steering with no change in the mechanical configuration of the antennas. Phased-array transmitters add the signals from all elements coherently in the direction of interest. This coherent addition increases the power radiated in the desired direction, while incoherent addition in other directions lowers the interference power at receivers that are not targeted. In an n-element transmitter, if each element radiates P watts omnidirectionally, the effective isotropic radiated power (EIRP) in the main-beam direction is n²P watts. For example, in a 4-element array where each element generates +15dBm, the EIRP in the beam direction is increased by 12dB (20log₁₀4) to +27dBm. This increase in signal power at the receiver is particularly useful at high frequencies, where power-amplifier efficiency is low, path loss is high, and receiver sensitivity is low.
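The n²P scaling can be checked numerically: one factor of n comes from the n-fold larger total transmitted power and the other from the array gain of coherent combining. The sketch below assumes identical omnidirectional elements and perfectly coherent addition in the beam direction (no phase or amplitude errors); `eirp_dbm` is a hypothetical helper name.

```python
import math

def eirp_dbm(n_elements: int, element_power_dbm: float) -> float:
    """Main-beam EIRP of n coherently combined, omnidirectional elements.

    EIRP = n^2 * P, i.e. n times the total radiated power times an array
    gain of n. In dB this is P + 20*log10(n).
    """
    return element_power_dbm + 20.0 * math.log10(n_elements)

for n in (1, 2, 4, 8):
    print(f"{n} elements at +15 dBm -> EIRP {eirp_dbm(n, 15.0):+.1f} dBm")
```

Doubling the number of elements adds about 6dB of EIRP, which is why even a 4-element array gives a substantial link-budget improvement.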
Phase Shifting
Ideally, broadband phased-array operation requires a true-time delay in each element. In classical phased-array architectures, the time delay is introduced in the RF path of each element. However, implementing a broadband, low-loss, linear true-time delay element at RF with a wide tunable delay range would require high power consumption, a large die area, and a high-voltage device technology, to name a few challenges. The true-time delay in the RF path can equivalently be replaced by a delay in the IF path plus a phase shift in the LO or IF path, as shown in Figure 2, or by implementing the delay in the digital domain. An analog delay element at IF faces challenges similar to those of a delay at RF, while practical limits on A/D conversion speed and DSP performance constrain digital array architectures.
If the bandwidth of interest is sufficiently narrow, the time delay (a linear phase shift in the frequency domain) can be approximated by a constant phase shift at the center frequency. The phase-shift architecture can then be implemented either as an approximation of the delay-based architecture in Figure 2b, which leads to a phase shift in the RF signal path, or as an approximation of the architecture in Figure 2b, which leads to a phase shift in the LO path (Figure 2c).
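To get a feel for when the narrowband approximation holds, the sketch below compares a true time delay with a constant phase shift that is exact only at the center frequency. The phase error of element k grows as 360˚·Δf·k·τ, where τ is the inter-element delay. It assumes a uniform linear array with half-wavelength spacing at 24GHz, a beam steered to 30˚, and a hypothetical 250MHz offset from the center frequency; the helper names are illustrative.

```python
import math

C = 299_792_458.0  # speed of light in m/s

def inter_element_delay(d_m: float, theta_deg: float) -> float:
    """Free-space delay between adjacent elements for a beam steered to theta."""
    return d_m * math.sin(math.radians(theta_deg)) / C

def phase_error_deg(f_hz: float, f0_hz: float, tau_s: float, k: int) -> float:
    """Phase error of element k when the true delay k*tau is replaced by the
    constant phase shift 2*pi*f0*k*tau, which is exact only at f0."""
    return 360.0 * (f_hz - f0_hz) * k * tau_s

f0 = 24e9
d = C / f0 / 2                      # assumed half-wavelength element spacing
tau = inter_element_delay(d, 30.0)  # equals 1/(4*f0) for these assumptions
# Outermost element (k = 3) of a 4-element array, 250 MHz from the center
err = phase_error_deg(f0 + 250e6, f0, tau, 3)
print(f"tau = {tau * 1e12:.2f} ps, phase error = {err:.2f} deg")
```

Under these assumptions the error stays at a few degrees, well below the 22.5˚ phase step used later, which is consistent with treating the system as narrowband.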
RF Mixer and Amplifier
In RF-path phase shifters (e.g., [6], [7]), if the loss is not uniform across all phase settings, variable-gain amplifiers are required in each element to equalize the phase-shifter losses and avoid degrading the array pattern. Variable gain would also be necessary to form nulls in the radiation pattern.
Phase shifters in the LO path circumvent this problem. The circuits in the LO path, such as the VCO and the LO buffers, operate in saturation by design, because the up-conversion mixers perform better with larger LO voltage swings. Furthermore, with large signal swings at the LO ports of the mixers, the mixer gain is insensitive to the LO amplitude. As a result, with phase shifters in the LO path, the signal amplitude varies minimally across phase-shift settings. The LO-path phase-shifting architecture was therefore adopted for this phased-array transmitter as the most suitable alternative for a fully integrated CMOS system.
Transmitter Architecture
The 4-element fully integrated transmitter includes four on-chip power amplifiers [4] as well as an integrated frequency synthesizer. Because of concerns about frequency pulling caused by electromagnetic and substrate coupling, direct up-conversion was considered unsuitable. A two-step up-conversion architecture was instead chosen, with LO frequencies of 4.8GHz and 19.2GHz. Both LO frequencies are generated by a single synthesizer loop using a divide-by-four. Quadrature up-conversion is implemented in both stages. The image attenuation of the first up-conversion step depends on the matching and quadrature accuracy of its I and Q paths. The image of the second up-conversion step falls at 14.4GHz (19.2GHz - 4.8GHz) and is therefore attenuated not only by the quadrature architecture but also by the tuned stages at RF. Figure 1 shows the architecture and floorplan of the 4-element transmitter. In the signal path, the baseband I and Q signals are up-converted to 4.8GHz by a pair of quadrature up-conversion mixers. The 4.8GHz I and Q signals are buffered and provided to the 4.8GHz-to-24GHz up-conversion mixers in each element. The mixer outputs are amplified, and differential-to-single-ended conversion is performed by an on-chip passive balun that drives the on-chip single-ended CMOS power amplifier.
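The frequency plan follows from simple arithmetic, and the dependence of image attenuation on I/Q matching can be estimated with the standard sideband-suppression expression for a quadrature up-converter with gain mismatch ε and phase error θ. In the sketch, the division ratio of 256 is inferred from 19.2GHz/75MHz rather than stated in the text, and the 1% gain and 1˚ phase mismatch values are hypothetical.

```python
import math

F_REF = 75e6             # synthesizer reference (from the text)
N_DIV = 256              # inferred: 19.2 GHz / 75 MHz

f_lo2 = F_REF * N_DIV    # 19.2 GHz VCO, second LO
f_lo1 = f_lo2 / 4        # 4.8 GHz first LO from the divide-by-four
f_rf = f_lo2 + f_lo1     # wanted upper sideband: 24 GHz
f_image = f_lo2 - f_lo1  # second-step image: 14.4 GHz

def image_rejection_db(gain_error: float, phase_error_deg: float) -> float:
    """Wanted-to-image power ratio of a quadrature up-converter.

    gain_error: fractional I/Q amplitude mismatch (0.01 means 1 %).
    phase_error_deg: deviation from 90 degrees between the I and Q paths.
    Both must not be zero at once (ideal matching gives infinite rejection).
    """
    a = 1.0 + gain_error
    c = math.cos(math.radians(phase_error_deg))
    return 10.0 * math.log10((1 + 2 * a * c + a * a) / (1 - 2 * a * c + a * a))

print(f"RF {f_rf / 1e9:.1f} GHz, image {f_image / 1e9:.1f} GHz")
print(f"IRR with 1% gain and 1 deg phase error: {image_rejection_db(0.01, 1.0):.1f} dB")
```

For small errors this reduces to roughly 4/(ε² + θ²), so quadrature accuracy alone yields on the order of 40dB; at the second step, the tuned RF stages add attenuation on top of that.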
The four on-chip PAs are matched to 50Ω at the output [4]. The matching networks in both stages of the PA use low-loss shielded-substrate coplanar waveguides whose effective wavelength is nearly half that in silicon dioxide, which reduces the size of the output matching network and hence its loss.
In the LO path, the outputs of the 16-phase 19.2GHz VCO (Figure 3) are provided to the phase selectors in each element. These phase selectors pick the LO phase that each element needs for the desired beam direction. The phase-selection circuitry is controlled by shift registers that can be programmed through a serial digital interface, enabling electronic beam steering. The VCO is part of an on-chip frequency synthesizer that generates the 19.2GHz LO signals from a 75MHz reference. A divide-by-four in the synthesizer loop generates the 4.8GHz LO I and Q signals for the first up-conversion step.
The phase selectors in each transmitter path have independent access to all phases of the VCO. Low-skew distribution of the LO phases is guaranteed by a symmetric floorplan based on an H-tree structure. This is essential because any asymmetry in the LO signals increases the power in the transmit side lobes. The digital frequency-calibration data and the beam-steering information are loaded onto the chip through a digital serial interface.
Circuit Building Blocks
The baseband-to-4.8GHz quadrature mixers common to all elements are CMOS current-commutating (i.e., Gilbert) multipliers. Their outputs are buffered and distributed to the quadrature up-conversion mixers in each element through a symmetric H-tree structure to ensure good array performance.
The cascade of tuned stages in the RF path of each element exacerbates any detuning of the passive loads. To avoid gain loss due to detuning, switchable capacitors controlled by programmable shift registers were implemented at the outputs of some of the high-frequency stages (Figure 4). In the pre-driver stage, for example, these capacitors allow the center frequency to be tuned from 23.5GHz to 27GHz, which is sufficient to account for process variations.
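Because f₀ = 1/(2π√(LC)), covering 23.5GHz to 27GHz requires the total load capacitance to change by (27/23.5)² ≈ 1.32, independent of the inductance. The sketch below assumes a hypothetical 200pH load inductor and a hypothetical 3-bit binary-weighted capacitor bank; the actual bank size and component values are not given in the text.

```python
import math

def lc_resonance_hz(l_h: float, c_f: float) -> float:
    """Resonant frequency of an LC tank."""
    return 1.0 / (2 * math.pi * math.sqrt(l_h * c_f))

def capacitance_for(f_hz: float, l_h: float) -> float:
    """Total capacitance that resonates with l_h at f_hz."""
    return 1.0 / ((2 * math.pi * f_hz) ** 2 * l_h)

L_TANK = 200e-12                 # hypothetical load inductance: 200 pH
F_HIGH, F_LOW = 27e9, 23.5e9     # tuning range quoted in the text

c_min = capacitance_for(F_HIGH, L_TANK)   # all switched capacitors off
c_max = capacitance_for(F_LOW, L_TANK)    # all switched capacitors on
BITS = 3                                  # hypothetical binary-weighted bank
c_unit = (c_max - c_min) / (2 ** BITS - 1)

def tuned_frequency(code: int) -> float:
    """Center frequency for a shift-register code in 0 .. 2**BITS - 1."""
    return lc_resonance_hz(L_TANK, c_min + code * c_unit)

print(f"required C ratio: {c_max / c_min:.3f}")
for code in range(2 ** BITS):
    print(code, f"{tuned_frequency(code) / 1e9:.2f} GHz")
```

In a real design, c_min also has to absorb the fixed device and wiring parasitics, which is what makes the achievable tuning ratio limited in practice.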
All circuits up to and including the 24GHz PA driver are differential, while the PA was designed to be single-ended. To avoid power and efficiency loss at the PA output, an on-chip balanced-to-unbalanced converter (balun) was placed before the PA, eliminating the need for an off-chip balun or a differential antenna. The balun was realized with a single-turn transformer to minimize substrate loss through capacitive coupling. The transmitter contains four on-chip power amplifiers matched to 50Ω, as shown in Figure 4 [4]. While the input of the stand-alone amplifier is matched to 50Ω, in the PA integrated in the transmitter the parasitic inductance of the balun is used to tune out the input capacitance of the first stage. The PA has two gain stages, each consisting of a cascode transistor pair to ensure stability and increase the breakdown voltage. The PA is designed to operate in class AB. Because the fmax of a minimum-sized 0.18µm NMOS transistor is only around 65GHz, the harmonic content at the transistor drain for a 24GHz input signal is low. Harmonic-matching classes such as class E and class F therefore did not increase the efficiency significantly in simulation and were not used.

The output and inter-stage matching networks in the PA are realized with the substrate-shielded coplanar waveguide structure shown in Figure 6 to reduce power loss and area. In this structure, the ground shield beneath the coplanar signal line increases the capacitance per unit length, C_u. However, because the return current cannot flow through the patterned ground shield, the inductance per unit length, L_u, remains the same. The simultaneously high C_u and L_u result in a lower wave velocity, reducing the wavelength at 24GHz by more than a factor of two compared to a standard coplanar waveguide structure in silicon dioxide and yielding a lower loss factor.
Additionally, because the structure is well shielded, the isolation between the power amplifier and other circuits in the transmitter is improved. The low loss per unit length (0.9dB/mm), improved isolation, and short wavelength make this structure particularly suitable for integrating multiple power amplifiers on the same die.
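The wavelength reduction follows from v = 1/√(L_u·C_u): if the shield multiplies C_u by a factor k while L_u stays fixed, the wavelength shrinks by √k, so a more-than-twofold reduction implies k > 4. The sketch assumes a reference line fully embedded in SiO₂ (ε_r ≈ 3.9), a hypothetical L_u of 400nH/m, and a hypothetical k = 4.5; only the 0.9dB/mm loss figure comes from the text.

```python
import math

C0 = 299_792_458.0        # speed of light in vacuum, m/s
F = 24e9                  # operating frequency
EPS_R_SIO2 = 3.9          # assumed relative permittivity of SiO2
L_U = 400e-9              # hypothetical inductance per unit length, H/m
LOSS_DB_PER_MM = 0.9      # shielded-line loss quoted in the text

def guided_wavelength(l_u: float, c_u: float, f_hz: float) -> float:
    """lambda = v / f with v = 1 / sqrt(L_u * C_u)."""
    return 1.0 / (math.sqrt(l_u * c_u) * f_hz)

# Reference line: pick C_u so that v = c0 / sqrt(eps_r), i.e. a line in SiO2.
c_u_ref = EPS_R_SIO2 / (C0 ** 2 * L_U)
k = 4.5                   # hypothetical increase in C_u from the ground shield

lam_ref = guided_wavelength(L_U, c_u_ref, F)
lam_shielded = guided_wavelength(L_U, k * c_u_ref, F)   # L_u unchanged

quarter_mm = lam_shielded / 4 * 1e3
print(f"reference lambda: {lam_ref * 1e3:.2f} mm, shielded: {lam_shielded * 1e3:.2f} mm")
print(f"quarter wave: {quarter_mm:.2f} mm, about {quarter_mm * LOSS_DB_PER_MM:.2f} dB of loss")
```

A shorter wavelength means a given electrical length needs less physical line, so the loss per wavelength drops even at the same loss per millimeter.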
The RC network at the input of each stage improves amplifier stability by guaranteeing low-frequency stability. Further details on the design of the power amplifier and the waveguide structure can be found in [4].
The multiphase VCO shown in Figure 3 is at the heart of the LO phase-shifting architecture adopted in this work. The 19.2GHz CMOS VCO consists of eight differential amplifiers connected in a ring. Because the ring is closed by swapping the inputs of the last amplifier, the VCO generates 16 equally spaced LO phases with a step size of 22.5 degrees. As previously discussed, this step size is sufficient for a 4-element phased-array system.
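The phase arithmetic of the ring and its use for beam steering can be sketched as follows: with the inversion supplying 180˚, each of the eight stages contributes 180˚/8 = 22.5˚, and the complementary outputs supply the other eight phases. The sketch assumes a uniform linear array with half-wavelength spacing and a hypothetical 20˚ steering angle, and simply rounds each element's ideal phase to the nearest VCO phase; the actual beam-steering lookup used on chip is not described in the text.

```python
import cmath
import math

N_STAGES = 8                      # differential stages in the ring (from the text)
STEP_DEG = 180.0 / N_STAGES       # 22.5 deg between adjacent taps

# Taps are 180/N apart; each tap's complementary output adds another 180 deg,
# giving 2*N = 16 distinct phases.
vco_phases = sorted({(k * STEP_DEG) % 360.0 for k in range(2 * N_STAGES)})

def ideal_phases(theta_deg: float, n: int = 4, d_over_lambda: float = 0.5) -> list[float]:
    """Progressive phase per element to steer a uniform linear array to theta."""
    dphi = 360.0 * d_over_lambda * math.sin(math.radians(theta_deg))
    return [(k * dphi) % 360.0 for k in range(n)]

def quantize(phase_deg: float) -> float:
    """Snap a phase to the nearest available VCO phase."""
    return (round(phase_deg / STEP_DEG) * STEP_DEG) % 360.0

def array_gain_db(theta_deg: float, phases_deg: list[float], d_over_lambda: float = 0.5) -> float:
    """Normalized array factor |sum_k exp(j(k*psi - phi_k))| / n, in dB."""
    psi = 360.0 * d_over_lambda * math.sin(math.radians(theta_deg))
    s = sum(cmath.exp(1j * math.radians(k * psi - p)) for k, p in enumerate(phases_deg))
    return 20.0 * math.log10(abs(s) / len(phases_deg))

target = 20.0
q = [quantize(p) for p in ideal_phases(target)]
print("quantized phases:", q)
print(f"beam-direction loss from quantization: {array_gain_db(target, q):.3f} dB")
```

Under these assumptions the 22.5˚ quantization costs only a few hundredths of a dB at the beam peak for a 4-element array.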
The VCO outputs must be distributed symmetrically to the phase selectors in each element, because any asymmetry in this distribution introduces an error in the phase shift and degrades the array pattern. A symmetric H-tree structure using the top two thick metal layers is therefore used to distribute the multiple VCO outputs.
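Why an H-tree gives low skew can be seen from its geometry: every leaf is reached through the same sequence of horizontal and vertical segments, so all path lengths are identical by construction. The sketch below computes leaf positions and path lengths for a recursive H-tree and converts a routing mismatch into LO phase error. The tree dimensions, the 50µm mismatch, and the SiO₂-based guided wavelength at 19.2GHz are hypothetical values, not layout data from this design.

```python
import math

def h_tree_leaves(x, y, w, h, levels, length=0.0):
    """Return (x, y, path_length) for each leaf of an H-tree rooted at (x, y).

    At each level the route runs w horizontally and h vertically to each of
    four branch points, then the tree recurses with half the dimensions.
    """
    if levels == 0:
        return [(x, y, length)]
    leaves = []
    for sx in (-1, 1):
        for sy in (-1, 1):
            leaves += h_tree_leaves(x + sx * w, y + sy * h, w / 2, h / 2,
                                    levels - 1, length + w + h)
    return leaves

def skew_to_phase_deg(delta_len_m: float, guided_wavelength_m: float) -> float:
    """LO phase error produced by a routing-length mismatch."""
    return 360.0 * delta_len_m / guided_wavelength_m

leaves = h_tree_leaves(0.0, 0.0, 1.5e-3, 0.5e-3, levels=1)   # four elements
print("distinct path lengths:", {round(l, 12) for _, _, l in leaves})

lam_lo = 299_792_458.0 / (math.sqrt(3.9) * 19.2e9)   # assumed guided wavelength
print(f"50 um mismatch -> {skew_to_phase_deg(50e-6, lam_lo):.2f} deg")
```

Even a modest routing imbalance translates into a phase error that is a noticeable fraction of the 22.5˚ step, which is why symmetric distribution matters.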
The phase selectors in each element work in two stages. In the first stage, the eight differential outputs of the VCO are provided to three sets of eight differential pairs, and the tail current of each differential pair is controlled by a shift register. The first two sets of differential pairs select the in-phase (I) and quadrature (Q) LO signals, while the third set is a dummy that ensures the VCO buffers see a constant load. By turning on the appropriate differential pair, the VCO outputs corresponding to the desired I and Q LO phases can be selected independently of each other. While this provides flexibility to compensate for phase-distribution errors and device mismatches, in practice the LO distribution and matching were good enough that independent selection was unnecessary. The tail current sources of the dummy set of differential pairs are controlled by bits complementary to the I and Q phase-selection bits to keep the loading of the VCO buffers constant.
The second stage of the phase selectors consists of two differential pairs whose tail current sources are also controlled by the shift registers. The inputs to the two differential pairs are in antiphase with respect to each other, so the output can be either in phase or in antiphase with the input.
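The two-stage selection can be modeled at the bit level: stage 1 enables one differential pair per set (a one-hot code over the eight VCO outputs at 0˚ to 157.5˚), and stage 2 chooses the in-phase or antiphase pair, adding 0˚ or 180˚. The sketch below is a hypothetical model, not the actual register map. In particular, it assumes that the "complementary" dummy bits enable a dummy pair on every VCO output that neither the I nor the Q set uses, which keeps each output loaded by exactly one active pair, and it places Q 90˚ after I (sign convention assumed).

```python
STEP_DEG = 22.5
N_OUT = 8  # differential VCO outputs at 0, 22.5, ..., 157.5 degrees

def one_hot(index: int) -> list[int]:
    """Tail-current enables for one set of eight differential pairs."""
    return [1 if k == index else 0 for k in range(N_OUT)]

def split_phase(phase_deg: float) -> tuple[int, int]:
    """Return (stage-1 output index, stage-2 invert bit) for a multiple of 22.5 deg."""
    step = round(phase_deg / STEP_DEG) % (2 * N_OUT)
    return step % N_OUT, step // N_OUT

def program_element(i_phase_deg: float) -> dict:
    """Hypothetical register contents for one element; Q is offset from I by 90 deg."""
    i_idx, i_inv = split_phase(i_phase_deg)
    q_idx, q_inv = split_phase(i_phase_deg + 90.0)
    i_bits, q_bits = one_hot(i_idx), one_hot(q_idx)
    # Assumption: a dummy pair is enabled on every output not used by I or Q.
    dummy = [1 - (a | b) for a, b in zip(i_bits, q_bits)]
    return {"I": i_bits, "Q": q_bits, "dummy": dummy, "I_inv": i_inv, "Q_inv": q_inv}

def realized_phase(bits: list[int], invert: int) -> float:
    """Phase produced by a one-hot stage-1 code and the stage-2 polarity bit."""
    return (bits.index(1) * STEP_DEG + 180.0 * invert) % 360.0

cfg = program_element(202.5)
print(cfg)
print(realized_phase(cfg["I"], cfg["I_inv"]), realized_phase(cfg["Q"], cfg["Q_inv"]))
```

Splitting the 16 phases into "which of 8 outputs" plus "invert or not" is what lets eight pairs per set cover the full 360˚.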
CSIC 2005 Digest 47 0-7803-9250-7/05/$20.00 ©2005 IEEE

Interpolation between the 16 raw VCO phases is possible by enabling more than one differential pair in the first stage of the phase selector, although the two-stage selection procedure limits the phase shifts that can be generated this way. For example, selecting the 0˚ and 22.5˚ phase outputs yields an output phase of 11.25˚. Enabling more than two differential pairs provides even finer resolution.
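The 11.25˚ example follows from adding the output currents of the enabled differential pairs as phasors. The sketch below assumes equal-amplitude, linearly summed currents, a simplification of the real transistor-level behavior; `interpolate` is a hypothetical helper name. It also shows that the combined amplitude drops slightly, to cos(11.25˚) of a single pair's current.

```python
import cmath
import math

def interpolate(phases_deg: list[float]) -> tuple[float, float]:
    """Phase (deg) and relative amplitude of the sum of equal-amplitude
    phasors, one per enabled differential pair."""
    s = sum(cmath.exp(1j * math.radians(p)) for p in phases_deg)
    return math.degrees(cmath.phase(s)) % 360.0, abs(s) / len(phases_deg)

phase, amp = interpolate([0.0, 22.5])      # midway between the two taps
print(f"{phase:.2f} deg, relative amplitude {amp:.3f}")

phase2, _ = interpolate([22.5, 45.0])      # next interpolated point
print(f"{phase2:.2f} deg")
```

Because the LO path runs in saturation, the small amplitude reduction from interpolation has little effect on the mixer output, consistent with the earlier argument for LO-path phase shifting.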
Measurement Results
The transmitter was implemented using 0.18µm MOS transistors and occupies an area of 6.8mm x 2.1mm, as shown in Figure 1 .
The image signal of the second up-conversion step, which falls at 14.4GHz, is attenuated by 43dB owing to the additional attenuation provided by the tuned stages at RF. The worst-case peak-to-null ratio with all four elements active is 23dB, as seen in Figure 11. The transmitter supports data rates in excess of 500Mbps per BPSK channel, as shown in Figures 9 and 10.
The on-chip power amplifiers generate up to +14.5dBm of output power at 24GHz with a bandwidth of 3.1GHz. Their output-referred 1dB compression point is +11dBm, as can be seen in Figures 7 and 8. With a driver output of 0dBm, the PA output power is +7dBm. Coupling between multiple power amplifiers on the same die is a concern in an integrated phased-array system. In this work, the physical distance (~1mm) between the power amplifiers and the use of shielded transmission lines in the matching networks improve the isolation. To measure the isolation between elements, three of the elements were deactivated by switching off all the LO phases in their phase selectors. The isolation was determined by comparing the output power of the active element with the output power of the three inactive elements. The worst-case isolation (i.e., between two adjacent elements) is measured to be 28dB, and the isolation between the other elements is better than 35dB.
