Abstract-A 94-GHz phased-array transceiver IC for frequency modulated continuous wave (FMCW) radar with four transmitters, four receivers, and integrated LO generation has been designed and fabricated in a 130-nm SiGe BiCMOS technology, and integrated into an antenna-in-package module. The transceiver, targeting gesture recognition applications for mobile devices, has been designed using phased-array techniques to reduce the total DC power while still maintaining the required link budget for FMCW operation. The complete array achieves state-of-the-art W-band per-element power consumption of 106 mW per TX element and 91 mW per RX element, and measurements indicate a per-element output power of 6.4 dBm and single-sideband noise figure of 12.5 dB at 94 GHz. The array achieves a beam steering range of ±20° while keeping the main lobe at least 3 dB above the side lobes. The complete chip-antenna module has been tested to characterize basic FMCW radar functionality. Initial radar experiments suggest a sub-5-cm range resolution is possible with 3.68 GHz RF sweep bandwidth, which is in line with theoretical predictions.
I. INTRODUCTION
HIGHLY integrated millimeter-wave (mm-wave) transceivers, enabled by advances in CMOS and SiGe BiCMOS process technology over the last decade, have found what is seemingly a perfect niche in automotive radar. With many gigahertz of absolute bandwidth available, and a compact antenna size due to the small wavelength at mm-wave, the W-band matches up well with the requirements for adaptive cruise control and similar technologies [1].
The development of mature low-cost SiGe and CMOS technologies with f_T and f_max of 150 GHz and beyond has brought down the cost of such driver-assist technology, enabling its widespread adoption.
Manuscript received August 18, 2016; revised January 9, 2017; accepted February 13, 2017.
More recently, mm-wave radar has also received increasing attention for short-range applications such as gesture recognition, occupancy detection, and remote heart-rate monitoring [2]-[5]. However, existing mm-wave radar solutions intended for automotive use are power hungry and often bulky. These drawbacks pose a problem for mobile power-constrained applications. To address them, this paper proposes and demonstrates a compact antenna-in-package frequency modulated continuous wave (FMCW) radar phased-array solution at 94 GHz with record-low per-element power consumption.
Although some promising progress has been made on gesture recognition radar at 60 GHz [5], operating at 94 GHz allows larger sweep bandwidths (which improve the depth resolution of the radar) and smaller antenna sizes. However, the higher frequency also makes the circuit design more challenging, which has a negative impact on efficiency and achievable SNR.
A. Phased-Array Techniques
A key means to improve energy efficiency is to leverage phased-array techniques to reduce the total transceiver DC power [6]. For an N-element phased array, transmitter EIRP is increased by a factor of N², since electric and magnetic fields, not power, are summed, and power density is proportional to E × H. Due to reciprocity, for the receiver array, there will also be a benefit of N² in conversion gain. Receiver SNR will increase as well, but only in proportion to N: since the noise in each receiver element is uncorrelated, the total noise power at the output increases in proportion to N, resulting in an SNR increase of N²/N = N. Because these system-level performance metrics are improved in a phased array compared with the single-element case, it is possible to reduce per-element performance (and correspondingly, DC power) while still meeting system requirements derived from the link budget.
Consider an RF power amplifier output stage, designed to operate close to saturation, and optimized to drive a load impedance of Z_0. If the device sizes in the power amplifier are reduced by half, the power amplifier should be able to achieve the same efficiency at saturation while driving a load impedance of 2Z_0, and delivering half of the power to that load. This scaled-down power amplifier can be used along with a matching network with an impedance transformation ratio of 2 (or, if a matching network is already present, with a modified impedance transformation ratio) to drive the original load impedance of Z_0, while delivering half of the power at the same efficiency. With a real matching network, there will be some additional losses, so the efficiency and output power will in practice be degraded somewhat. In a phased-array system, this strategy can be used to reduce DC power and per-element performance without sacrificing efficiency.
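The array-gain argument at the start of this subsection can be checked numerically. The following is a minimal sketch, assuming ideal coherent combining, identical elements, and fully uncorrelated receiver noise; the function name `array_gains_db` is hypothetical and not taken from the paper.

```python
import math


def array_gains_db(n_elements: int) -> dict:
    """Ideal improvement of an N-element array over one element, in dB.

    Assumes perfect coherent combining and uncorrelated per-element noise.
    """
    # Fields add coherently, so EIRP and RX conversion gain scale as N^2.
    signal_gain_db = 10 * math.log10(n_elements ** 2)
    # Uncorrelated noise powers add, so output noise scales as N.
    noise_gain_db = 10 * math.log10(n_elements)
    return {
        "eirp_db": signal_gain_db,
        "rx_gain_db": signal_gain_db,
        "snr_db": signal_gain_db - noise_gain_db,
    }


# Four elements, as in the 4TX-4RX array described in this paper.
gains = array_gains_db(4)
print(gains)
```

For N = 4 this gives roughly 12 dB of EIRP or conversion-gain improvement and 6 dB of SNR improvement, which is the margin that can be traded for lower per-element DC power.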
A similar scaling approach can be used on the receiver side. Consider an LNA designed for power and noise matching to an impedance Z_0. Because the current density for minimum noise figure is largely independent of emitter length [7] (or similarly, transistor width in CMOS technologies), the LNA device sizes can be reduced by half, resulting in an LNA with the same NF_min matched to an impedance of 2Z_0. As in the transmitter case, a matching network can be used to match the LNA back to the original Z_0 input impedance. The new LNA has half of the DC power consumption, and a slightly higher noise figure due to the added matching network losses. Of course, due to matching network complexity, added losses, and the bandwidth narrowing effect of high-Q matching networks, it is not possible to continue this scaling arbitrarily: architecture or circuit topology changes must then be used to reduce power consumption further.
B. FMCW and Millimeter Wave
Linear FMCW radar is an attractive radar modulation scheme for energy-efficient mm-wave applications for a few main reasons.
1) Constant Envelope Modulation: First, because FMCW is a constant-envelope modulation scheme, transmitter linearity is not a concern. This allows for use of linear power amplifiers close to saturation, or even nonlinear switching power amplifiers, either of which will improve the overall transmitter efficiency. Also, because the modulated LO signal is constant envelope, it can be generated at a low frequency and scaled up to a higher frequency using a nonlinear frequency multiplier, without negative impacts from the nonlinearity of the multiplier. So, in a phased-array system, the modulated LO signal can be generated centrally, scaled in frequency, and routed out to all elements.
2) Simple Modulation: Second, because the modulation is simple and shared across all elements, the frequency modulation can be incorporated into the phase-locked loop (PLL) that is likely present in the system anyway. In most types of radar systems, the distance resolution is inversely proportional to the bandwidth of the radar signal: a signal with larger bandwidth can better resolve two close-together targets [9] . For a pulsed radar system, a high-bandwidth modulation requires fast ON/OFF times to achieve a short pulsewidth [10] , which means the circuit creating the modulation needs to be carefully designed to support that bandwidth. In an FMCW radar system, although large overall bandwidth is needed in the RF transmit and receive chains (as in the pulsed radar case), a large instantaneous bandwidth is not necessarily needed, since it is the overall bandwidth of the sweep itself that determines the resolution. So, a slowly modulated signal can be used, as long as the frequency of the signal varies across the full bandwidth over time.
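The dependence of range resolution on sweep bandwidth can be made concrete with the standard relation ΔR = c / (2B). The sketch below applies it to the 3.68 GHz sweep bandwidth quoted in the abstract; the helper name `range_resolution_m` is an illustrative assumption.

```python
SPEED_OF_LIGHT_M_S = 299_792_458.0


def range_resolution_m(sweep_bandwidth_hz: float) -> float:
    """Theoretical range resolution, c / (2 * B), for a swept bandwidth B."""
    return SPEED_OF_LIGHT_M_S / (2.0 * sweep_bandwidth_hz)


# RF sweep bandwidth quoted in the abstract.
resolution_m = range_resolution_m(3.68e9)
print(f"{resolution_m * 100:.2f} cm")
```

The result is about 4.1 cm, consistent with the sub-5-cm resolution reported in the abstract. Note that only the total swept bandwidth enters the formula, not the sweep rate.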
Several works have demonstrated state-of-the-art synthesizers with integrated frequency modulation using a digital-PLL-based architecture [11]-[14]. The focus of this paper is on energy-efficient array implementation and FMCW radar demonstration, so an external synthesizer is used to generate the frequency-modulated LO waveform. The chip includes a 47-GHz VCO and 32× frequency divider, and the PLL feedback is completed externally using a discrete off-the-shelf IC with a phase-frequency detector and charge pump, along with an on-board active loop filter. Most of the power consumption of the PLL is likely to come from the high-speed dividers, so if the PLL were fully integrated, the added power consumption would be fairly small and have little impact on the per-element power.
C. Proposed System Architecture
As it is critical to minimize TX-RX leakage for an FMCW radar, an architecture with separate TX and RX antennas was selected. Although it is possible to use an integrated isolating coupler to achieve some degree of isolation [15], [16], even an ideal coupler will have 3 dB insertion loss due to the power-splitting nature of the coupler. The block diagram of the full 4TX-4RX phased-array transceiver is shown in Fig. 1. To simplify routing in the antenna-in-package module, a small four-element array size was selected, for both the transmit and receive arrays. LO generation circuitry is shared between the transmit and receive elements, and consists of a VCO, frequency multiplier, and integrated frequency dividers. A PLL was implemented off-chip for LO tuning and to enable FMCW ramp generation. A single combined differential receiver output is fed off-chip.
For phase shifting, LO path phase shifters are used [17] . LO path phase shifting is attractive here because it removes phase shifter degradations such as nonlinearity and noise from the signal path. This increases efficiency because amplifiers on the LO path can be designed to operate close to compression, as the LO signal is constant envelope. Baseband phase shifting is also attractive from a power consumption perspective, but requires two mixers for complex downconversion. This is not necessary for a linear FMCW system; since the TX and RX frequencies are always slightly offset, the mixer strictly speaking is not truly operating as a direct conversion mixer, and therefore power can be saved by only using a single mixer.
II. BLOCK-LEVEL DESIGNS

A. Power Amplifier
To meet link budget requirements, a power amplifier was designed to provide approximately +9 dBm of output power to a single-ended 50-Ω antenna port. A single-ended antenna interface was selected to minimize mm-wave IO count, which keeps the die area small and relaxes routing constraints within the antenna module.
The main PA gain stage is based on a cascode amplifier. Because of the high output impedance of the cascode, it is hard to achieve a good power-added efficiency (PAE) using a cascode output stage and the load-line impedance is significantly different from the small-signal impedance. However, the per-stage gain is still quite high relative to a simple common emitter amplifier, which has high PAE, but low gain when driven close to saturation. The amplifier core uses a differential topology to reduce sensitivity to modeling errors associated with the impedance seen at the cascode node. The bases of the cascode devices in a differential pair can be shorted directly together using local routing only, and therefore present a virtual short circuit in differential mode. In common mode, gain is not a concern, so low-Q bypass capacitors are used to prevent any common-mode stability problems associated with the impedance at the base of the cascode device.
To get both high gain and moderate efficiency, two cascode driver stages are used to drive a common-emitter output stage (see Fig. 2). A minimum supply voltage of 1.8 V is needed to get good cascode performance, but this is slightly above the open-base V_CE breakdown voltage of a single device. For the noncascoded output stage, a moderate impedance is provided to the base via the bias network to extend the V_CE breakdown range beyond the open-base limit of BV_CEO and allow operation from a single 1.8-V PA supply [18]. For additional robustness to V_CE breakdown with the 1.8-V supply voltage, a small series emitter degeneration resistor is added at the tail. This improves reliability and has no impact on gain since it appears only in common mode.
It is difficult to power match at the output of the cascode due to the high real part of the output impedance, on the order of 1 kΩ. Additionally, the real part of the input impedance of the cascode amplifiers is fairly small (tens of ohms), leading to a large required transformation ratio. The available area for matching networks is constrained due to the phased-array element pitch and the internal power-supply flip-chip bumps in between the phased-array lanes, making it impossible to fit a transmission-line based matching network into the small area available for the PA. So, for a moderate impedance transformation ratio given the area constraints, coupling between PA gain stages is best achieved using 2:1 transformers with moderate to low coupling factors.
Because of the low coupling factor, the effective turns ratio is slightly less than 2:1 in practice, reducing the impedance transformation ratio. However, the leakage inductance of the transformer primary can be used to increase the impedance transformation ratio of the transformer, by treating it as an additional inductance in series with the transformer, which acts to increase the impedance seen at the ports of the primary transformer. The output stage does not use neutralization because of the extra capacitive load it would present to the output balun, which is also a 2:1 transformer. The output balun also provides ESD protection to the signal pad, as at low frequencies, it provides a low impedance path to ground for the signal pads through the center tap of the secondary.
A 3-D HFSS model of the full PA, including the output ground-signal-ground (GSG) pads and the three stages of transformers, is shown in Fig. 3. The individual transformers were first designed separately using HFSS. As a final verification, all three transformers were simulated together along with the output pads. When incorporated back into circuit-level simulations, this PA-scale EM model predicted nearly identical performance when compared with the simulations using separately modeled transformers.
At the intended carrier frequency of 94 GHz, simulations show a small-signal power gain of 31 dB and a peak PAE of 15% at an output power of 9.6 dBm (Fig. 4). If the DC power of the phase shifter driving the PA is included in the efficiency calculation, the PAE of the full chain drops to 12%. Because of the high small-signal gain of the cascode amplifiers, the gain starts to compress well before the output stage is fully saturated. As a result, the peak PAE is reached well beyond the P_-1dB of the amplifier.
Large-signal simulations show that the saturated output power and efficiency are relatively flat across frequency (Fig. 5) with P_sat above 10 dBm from 85 to 98 GHz, and the peak PAE of the PA is nearly a constant 15% from 85 to 95 GHz. The peak small-signal gain is 30.8 dB at 93 GHz, and the small-signal 3 dB bandwidth is 12 GHz (from 86 to 98 GHz).
B. Receiver
The number of LNA stages in the receiver was limited to keep DC power consumption low. Because of this, no inductive degeneration was used in the receiver input device (Q1 in Fig. 6), as this would reduce the gain of the LNA and increase the noise contribution of the mixer to the overall receiver noise figure. Effectively, this means the input of the LNA is designed for a power match, rather than a noise match. This adds a small amount (0.3 dB) to the overall noise figure but limits the mixer noise figure contribution [see Fig. 10(b)].
Since the current density for minimum noise figure is typically six to ten times smaller than the current density for peak f_T [7], biasing for NF_min does have a gain penalty. As a compromise between gain and noise figure, the LNA input stage is biased at about half of the current density for peak f_T. This leads to an increase of 0.5 dB above the estimated minimum noise figure of 3.2 dB, but also allows the device to operate at nearly peak f_T.
The input matching network uses a series inductor, a DC blocking capacitor, and a quarter-wavelength transmission line shunted to ground. Nearly all of the impedance matching is provided via the series base inductor. Because of the series inductor's capacitance to ground, it acts more like a transmission line than a simple series inductor. So, on a Smith chart, this looks like a rotation, rather than moving on a line of constant resistance [Fig. 7(a)]. The DC-block capacitor is a small series impedance at RF and is only needed to separate the bias points at the signal pad and LNA input [Fig. 7(b)]. The shunt transmission line provides a low-impedance path from the pad to ground at low frequencies for ESD robustness, and contributes a small amount of shunt inductance at RF [Fig. 7(c), Fig. 8].
The first LNA stage has an inductor load designed to resonate with its output capacitance, and is AC-coupled to the input of the second stage (Q2). The AC coupling capacitor between the first and second stages is implemented using the standard MIM capacitor offered in the process. Its value is large so that it contributes a negligible series reactance, reducing design sensitivity to the modeling accuracy of the capacitor. The second LNA stage connects to the differential mixer input using a transformer, which provides single-ended to differential conversion.
The mixer itself is a double-balanced switching core (Q3-Q6), with RF signals coupled in at the emitters, and LO signals at the bases. Because headroom is limited, the mixer is implemented as a pseudodifferential rather than differential pair, and there is no RF transconductor device in the mixer stack. The limited headroom also makes it difficult to use a high-impedance active load that is sufficiently saturated. A resistive load is used instead, largely to set the bias point, and a transimpedance amplifier (TIA) is used to provide a low impedance at the mixer baseband output. This topology improves the voltage gain of the mixer despite the constrained headroom. Additionally, the resistive load has better noise performance than the active load, and suffers from fewer capacitive parasitics: the PMOS f_T is low relative to that of the NPN devices, and no high-speed PNP is available in this technology. Gain control is achieved by varying the feedback resistance in the baseband TIA via switched resistor segments. The TIA itself consists of a high-speed op-amp using SiGe NPN devices.
Simulations show a receiver conversion gain of 25-38 dB and a noise figure ranging from 11.1 to 11.3 dB (single sideband) at 94 GHz, depending on gain control settings (Fig. 9). Since the gain control is implemented at the IF amplification stage, it has little impact on noise figure because of the front-end gain in the preceding stages. As can be seen in Fig. 10(a), approximately 54% of the total noise at the receiver output is due to the LNA, 30% from the mixer, 6% from the baseband amplification, and 12% from the reference noise of the input port.
C. LO Generation and Distribution
The LO generation circuitry includes an integrated VCO, frequency multiplier, frequency dividers, and LO buffers. The integrated VCO is designed to operate at half of the RF frequency, and is buffered and sent to a frequency doubler to generate the RF carrier waveform. The frequency-doubled LO waveform is buffered and distributed to the phased-array elements.
There is a routing penalty in distributing the LO signal after the frequency multiplier, rather than before, since the absolute losses per millimeter are worse at higher frequency. However, it can be more efficient overall to accept those losses, since the alternative is to have equally many power-hungry frequency multipliers as there are phased-array elements. Since frequency multipliers typically have poor (if any) conversion gain, low output power, and low efficiency, it makes sense to have as few as possible, as long as a moderately efficient LO buffer can be placed afterward to overcome the LO routing and distribution losses. This approach does not work as the multiplier output frequency approaches the f max of the technology, since it becomes impossible to get any more power gain, but this is not the case at 94 GHz.
The integrated frequency divider chain has five cascaded divide-by-two stages for a 32× total division, resulting in a nominal output frequency of 1.46875 GHz if the VCO is operated at 47 GHz. A discrete fractional-N PLL chip is used along with an active loop filter on the test PCB to complete the LO chain externally.
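The frequency plan described here is simple arithmetic, sketched below as a sanity check. The variable names are illustrative only.

```python
# Nominal LO frequency plan: VCO at half the RF frequency, followed by a
# doubler toward the array and five divide-by-two stages toward the PLL.
vco_hz = 47e9
n_divide_by_two_stages = 5

division_ratio = 2 ** n_divide_by_two_stages   # 32x total division
divider_out_hz = vco_hz / division_ratio       # signal sent to the off-chip PLL
rf_carrier_hz = 2 * vco_hz                     # after the frequency doubler

print(division_ratio, divider_out_hz / 1e9, rf_carrier_hz / 1e9)
```

With a 47 GHz VCO this reproduces the 1.46875 GHz divider output and the 94 GHz carrier.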
1) 47 GHz VCO:
The VCO core is a capacitively cross-coupled NPN pair. Rather than using a tail current source to bias the VCO, a series tail resistor is used. This greatly improves the simulated 1/f³ phase noise corner.
2) Frequency Dividers: The first few stages are bipolar-based static CML dividers for robust high-speed performance. No inductive peaking was used for the high-speed bipolar dividers. The last two stages are CML dividers that use 130 nm CMOS devices and consume much less power. A final NPN buffer amplifier is used to drive the LO signal off chip.
3) Frequency Doubler: The frequency doubler uses a push-push topology with an inductive load [19]. A common-mode tail resistor is used instead of a current source, as simulations showed it provided slight enhancement of the second harmonic output current. Simulations also indicated common-mode stability problems when using a tail current source in the frequency doubler, related to the high-Q capacitance that it presents at the tail node, which resonates with the common-mode inductance of L2. The common-mode stability issues are ameliorated by using the tail resistor (R2 in Fig. 11) since it has significantly less capacitance. At typical operating conditions, simulations show a conversion loss of 6 dB with an input power of −9.5 dBm at 47 GHz and DC power consumption of 7.7 mW, resulting in a drain efficiency of 0.36%.
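The quoted doubler efficiency follows directly from the simulated input power, conversion loss, and DC power. The short sketch below reproduces that arithmetic; the helper `dbm_to_mw` is a hypothetical convenience function.

```python
def dbm_to_mw(power_dbm: float) -> float:
    """Convert a power level in dBm to milliwatts."""
    return 10 ** (power_dbm / 10.0)


input_power_dbm = -9.5      # at 47 GHz
conversion_loss_db = 6.0
dc_power_mw = 7.7

# Second-harmonic output power at 94 GHz.
output_power_dbm = input_power_dbm - conversion_loss_db
output_power_mw = dbm_to_mw(output_power_dbm)

# Efficiency as RF output power over DC power.
doubler_efficiency = output_power_mw / dc_power_mw
print(f"{output_power_dbm:.1f} dBm, {doubler_efficiency * 100:.2f} %")
```

The output is about −15.5 dBm (roughly 28 μW), which gives an efficiency of approximately 0.37%, in line with the figure quoted above and illustrating why the number of multipliers should be kept small.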
4) LO Distribution Amplifiers:
After the frequency doubler, the 94 GHz LO waveform must be distributed to the TX and RX phased-array elements. Separate distribution networks are used for the TX and RX elements, so that the power levels can be separately controlled. Even a lossless LO power-splitting network will inherently represent a reduction in power level, since the input power is divided equally to all output paths. If LO buffers are used after the LO splitting network, the efficiency of any LO buffers will be quite poor, since the signal level will be very small due to the splitting loss. To avoid suffering that efficiency penalty, moderate-power LO buffers are used to drive the input of the power splitting network. This results in the same power levels at the output, but a higher efficiency and reduced overall DC power.
The LO distribution amplifiers were designed by reusing the first PA gain stage (for both amplifier stages of the LO buffers) and redesigning the interstage matching networks. The output matching network is a transformer balun, to drive the single-ended LO distribution network.
5) Lumped-Element Power Divider Network:
To simplify the design of the divider network, two nested 1:2 power splitters are used [Fig. 12]. The 1:2 splitter uses quarter-wavelength Z_0 = 70.7 Ω lines to enable use in a cascade; when terminated with 50 Ω loads, the input impedance of the splitter is also 50 Ω. Typically, a Wilkinson power splitter is used as a 1:2 power splitter at millimeter wave [20]. The Wilkinson splitter is an isolating power divider, which will prevent potential crosstalk between elements. A differential-mode termination resistor is needed to provide this isolation, but requires that the outputs of the Wilkinson are physically close. Because the inputs of the phased-array channels are spaced apart by the array element pitch (300 μm), additional routing is required to distribute the Wilkinson outputs to the phased-array elements [Fig. 13(a)], which requires additional area and increases losses.
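The choice of 70.7 Ω lines can be verified with the quarter-wave transformer relation Z_in = Z_line² / Z_load. The sketch below assumes ideal lossless lines at the design frequency; the function names are illustrative.

```python
import math


def quarter_wave_input_impedance(z_line: float, z_load: float) -> float:
    """Input impedance of an ideal quarter-wavelength line terminated in z_load."""
    return z_line ** 2 / z_load


def splitter_input_impedance(z_line: float, z_load: float, n_branches: int = 2) -> float:
    """Input impedance of n identical quarter-wave branches connected in parallel."""
    return quarter_wave_input_impedance(z_line, z_load) / n_branches


# Line impedance needed so that two 50-ohm loads present 50 ohms at the input.
z_line_ohm = math.sqrt(2) * 50.0
z_in_ohm = splitter_input_impedance(z_line_ohm, 50.0)
print(f"{z_line_ohm:.1f} ohm line -> {z_in_ohm:.1f} ohm input")
```

Each branch transforms its 50 Ω load to 100 Ω, and two such branches in parallel present 50 Ω, so the 1:2 splitter can be cascaded without additional matching.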
If a nonisolating network is used, it can also provide the required routing to the input of each array element [Fig. 13(b)]. Since the LO distribution network is terminated in the passive quadrature hybrid load whose input impedance is constant versus phase code, there is no opportunity for crosstalk between elements even though a nonisolating splitter is used.
However, the splitter must fit within the array element pitch of 300 μm, while a single quarter-wavelength line on-chip is approximately 400 μm long. A straight transmission line therefore cannot fit at this array element pitch; it would need to be meandered. Instead of using a meandered transmission line, the splitter uses a lumped-element artificial transmission line to reduce the area of the network [Fig. 13(c)].
The insertion loss for an ideal 1:4 splitter would be 6 dB. For this design, EM simulations show insertion losses of 7.1-7.4 dB (1.1-1.4 dB higher than an ideal power splitter) from 80 to 100 GHz. The simulated amplitude mismatch is less than 0.04 dB, with the outer ports receiving slightly more power than the inner ports, and the phase mismatch is less than 0.4° up to 100 GHz.
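The split between ideal division loss and excess loss can be reproduced as follows. This is a small sketch using only the simulated insertion-loss range quoted in the text; the function name is an assumption.

```python
import math


def ideal_split_loss_db(n_outputs: int) -> float:
    """Power-division loss of a lossless, equal-split 1:N divider."""
    return 10 * math.log10(n_outputs)


ideal_loss_db = ideal_split_loss_db(4)

# Simulated total insertion loss range from 80 to 100 GHz (from the text).
simulated_loss_db = (7.1, 7.4)
excess_loss_db = tuple(loss - ideal_loss_db for loss in simulated_loss_db)
print(f"ideal {ideal_loss_db:.2f} dB, excess {excess_loss_db[0]:.1f} to {excess_loss_db[1]:.1f} dB")
```

The ideal 1:4 division loss is 6.02 dB, leaving about 1.1 to 1.4 dB attributable to the lumped-element network itself.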
D. Phase Shifter
Passive reflection type phase shifters are an attractive option for low-power design, as they do not consume any DC power and multiple stages can be cascaded to achieve the desired phase-steering range [21] , [22] . However, this also implies the insertion loss trades off with phase shift range. To overcome this loss, an additional amplifier stage is needed, which consumes DC power.
Instead, in this paper, a Cartesian architecture is used to ensure a full 360° of phase shift. Weighted combinations of the I and Q LO waveforms are current-combined at the phase shifter output, meaning that the phase should be the same regardless of process variations (provided the input quadrature matching is sufficiently accurate). The VGA functionality is achieved by current steering using the cascode devices (Q5-Q8) rather than the g_m devices of the Gilbert cell. This ensures a more constant input impedance versus code. As the hybrid is single ended, not differential, a balun stage is needed to drive the phase shifter LO inputs. A single-ended cascode amplifier with transformer load provides single-ended to differential conversion, and further isolates the hybrid from any variations in phase shifter input impedance.
1) Quadrature Coupler:
The quadrature coupler is designed based on the approach in [23] to provide good phase accuracy over a broad bandwidth. It consists of three high-impedance transmission lines connected in parallel at each end with MIM capacitors. The complete structure, including MIM capacitors, was EM simulated to verify the final design [Fig. 16]. Simulations show a quadrature phase accuracy within 5° of 90° from 87 to 105 GHz [Fig. 17]. The amplitude imbalance is within ±1 dB from 85 to 102 GHz.
2) Phase Interpolator Simulated Results:
The phase shifter and transmitter power amplifier were simulated together to characterize the effects of gain compression on the phase shifter resolution. The phase shifter control voltages are driven by differential voltages on the I and Q input terminals. Each I and Q voltage is controlled by a 4-b DAC, so there are 256 possible codes that can be used in the phase shifter. In practice, only the largest amplitude codes will be used. The constellation of available gain and phase combinations at the PA output is shown in Fig. 14(a). Gain and phase errors are then computed for each possible desired phase angle. The worst-case phase error is about 4.5°, and the worst-case amplitude error is 0.55 dB, which both occur when both the differential I and Q voltages are at their largest (45° phase shift). This is because at the highest/lowest DAC codes, the differential pairs (Q5-Q8 in Fig. 15) are no longer in the linear range.
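The code-selection procedure can be illustrated with an idealized model. The sketch below is not the circuit simulation described above: it assumes each 4-b DAC code maps linearly to an I or Q weight between −1 and +1, ignores PA compression and differential-pair nonlinearity, and treats "largest amplitude codes" as those on the outer edge of the constellation. All names are hypothetical.

```python
import cmath
import math

# Assumed linear mapping from a 4-bit code (0..15) to a weight in [-1, +1].
LEVELS = [(2 * code - 15) / 15 for code in range(16)]

# Every (I code, Q code) pair: 16 x 16 = 256 constellation points.
ALL_CODES = [(i, q) for i in range(16) for q in range(16)]

# Outer ring: at least one of the two weights is at full scale.
OUTER_RING = [
    (i, q) for (i, q) in ALL_CODES
    if max(abs(LEVELS[i]), abs(LEVELS[q])) == 1.0
]


def code_phase_deg(i_code: int, q_code: int) -> float:
    """Ideal output phase for a code pair, in degrees."""
    return math.degrees(cmath.phase(complex(LEVELS[i_code], LEVELS[q_code])))


def phase_error_deg(target_deg: float, i_code: int, q_code: int) -> float:
    """Absolute phase error, wrapped into the range 0..180 degrees."""
    diff = code_phase_deg(i_code, q_code) - target_deg
    return abs((diff + 180.0) % 360.0 - 180.0)


def best_code(target_deg: float) -> tuple:
    """Outer-ring code pair with the smallest phase error for a target angle."""
    return min(OUTER_RING, key=lambda c: phase_error_deg(target_deg, *c))


# Sweep the target phase in 0.1 degree steps and record the worst error.
worst_error_deg = max(
    phase_error_deg(step / 10.0, *best_code(step / 10.0)) for step in range(3600)
)
print(len(ALL_CODES), len(OUTER_RING), round(worst_error_deg, 2))
```

In this ideal model the worst-case quantization error falls near the 0°, 90°, 180°, and 270° axes. The simulated worst case reported above occurs at 45° instead, which is consistent with the stated cause being nonlinearity at the extreme DAC codes rather than quantization alone.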
III. ANTENNA MODULE
The die was flip-chip packaged using stud bumps onto an antenna module fabricated using an organic HDI substrate [24]. The antenna module is based on a standard BGA footprint and low-frequency connections fan out to BGA balls for integration onto a larger test PCB (Fig. 18). The substrate, 1.2 cm × 1.2 cm, contains two linear arrays of four patches each: one array for the transmitter elements and one array for the receiver elements [Fig. 20]. For high TX to RX isolation, the TX and RX antenna arrays are located on opposite sides of the module. The antennas themselves are aperture-coupled patch antennas with linear polarization. To reduce element-to-element coupling, the antenna spacing within each array is set to 0.8λ at 94 GHz. The spacing between antennas was chosen as a compromise between best isolation, cavity dimensions (to avoid substrate modes), and the desired peak gain of 12 dBi. Isolation between elements was constrained to be at least 25 dB; the isolation increases with the antenna pitch. Conversely, the usable beam steering range decreases with the antenna pitch, due to the presence of grating lobes.
Because the antenna spacing is greater than half-wavelength, grating lobes will appear for large beam steering angles. HFSS simulations of the antenna array predict that a beam steering range of ±27° is possible for a grating lobe level of 3 dB below the main lobe (Fig. 19). At a simulated beam steering angle of ±34°, the grating lobe level is the same as that of the main lobe. The simulated 3 dB beam width of the main beam is about 16° in the E-plane (XY plane in Fig. 21) and 90° in the H-plane (XZ plane in Fig. 21). HFSS simulations of the package show a TX-RX isolation of 60 dB from 90 to 98 GHz [Fig. 19(c)], which is sufficient given the input linearity simulations of the receiver.
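The appearance of grating lobes at 0.8λ spacing can be illustrated with the ideal array-factor condition sin(θ_g) = sin(θ_0) + m·λ/d. The sketch below assumes a uniform linear array of isotropic elements, so it predicts only where grating lobes fall, not their level; the ±27° and ±34° figures above come from HFSS simulations that include the patch element pattern. Function names are illustrative.

```python
import math

SPEED_OF_LIGHT_M_S = 299_792_458.0


def grating_lobe_angles_deg(steer_deg: float, spacing_wavelengths: float) -> list:
    """Angles of visible grating lobes for an ideal uniform linear array."""
    angles = []
    for order in (-2, -1, 1, 2):
        sine = math.sin(math.radians(steer_deg)) + order / spacing_wavelengths
        if abs(sine) <= 1.0:  # lobe lies in visible space
            angles.append(math.degrees(math.asin(sine)))
    return angles


def grating_lobe_onset_deg(spacing_wavelengths: float) -> float:
    """Smallest steering angle at which a grating lobe enters visible space."""
    return math.degrees(math.asin(1.0 / spacing_wavelengths - 1.0))


# Physical element pitch for 0.8 wavelengths at 94 GHz.
pitch_mm = 0.8 * SPEED_OF_LIGHT_M_S / 94e9 * 1e3

onset_deg = grating_lobe_onset_deg(0.8)
lobes_at_27_deg = grating_lobe_angles_deg(27.0, 0.8)
print(f"pitch {pitch_mm:.2f} mm, onset {onset_deg:.1f} deg, lobes {lobes_at_27_deg}")
```

Under these assumptions the pitch is about 2.55 mm, a grating lobe first enters visible space at roughly 14.5° of steering, and at 27° of steering it sits near −53°, where the element pattern is what keeps it below the main lobe.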
IV. MEASURED RESULTS
The 3.7 mm × 2.2 mm chip was fabricated in a 130 nm SiGe BiCMOS process (Fig. 22). The TX and RX arrays are on opposite sides of the chip. At the TX and RX arrays, the GSG mm-wave IO pads are placed vertically running up the sides of the chip. Adjacent ground pads are shared between elements to reduce die size. Shared LO generation circuitry is at the center of the chip and feeds into the power divider networks, which in turn feed directly into the phase shifters for the transmitters and receivers. The chip was tested both using mm-wave probes in a chip-on-board configuration, and using the packaged antenna module for wireless measurements.
A. Probe Station Measurements
1) LO:
The measured VCO tuning range is 11%, from 44 to 49 GHz [Fig. 23(a)]. The VCO center frequency is about 5% lower than simulated, but due to the large designed tuning range, it still covers the desired center frequency of 47 GHz. The PLL locking range is from 89 to 95 GHz, slightly reduced from the VCO tuning range. It is limited by k_VCO at low tuning voltages, and by the output swing of the active loop filter at high tuning voltages. At 94 GHz, the closed-loop phase noise is −76 dBc/Hz at 1 MHz offset, measured at the PA output [Fig. 23(b)].
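The quoted tuning range can be cross-checked from the measured band edges. The sketch below uses the common definition of fractional tuning range relative to the band center, which is an assumption about how the 11% figure was computed.

```python
vco_min_ghz = 44.0
vco_max_ghz = 49.0

# Fractional tuning range relative to the center of the measured band.
vco_center_ghz = (vco_min_ghz + vco_max_ghz) / 2.0
tuning_range_pct = 100.0 * (vco_max_ghz - vco_min_ghz) / vco_center_ghz

# Corresponding RF range after the on-chip frequency doubler.
rf_range_ghz = (2 * vco_min_ghz, 2 * vco_max_ghz)
print(f"{tuning_range_pct:.1f} %, RF {rf_range_ghz[0]:.0f} to {rf_range_ghz[1]:.0f} GHz")
```

This gives about 10.8%, matching the rounded 11% figure, and a doubled range of 88 to 98 GHz, of which the PLL locks over 89 to 95 GHz.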
2) Transmitter: Probe station measurements were used to characterize the output power of individual PAs, along with a W-band power meter. At 94 GHz, the output power varies over a range of about 0.5 dB across elements, from 6.3 to 6.8 dBm (Fig. 24) . The outer TX elements (1 and 4) have the highest output power, and the inner elements have lower output power. The output power was lower than the simulated P sat = +10.5 dBm of the PA, as well as the simulated +9 dBm output power with expected LO signal levels. The decrease in output power for inner TX elements suggests IR drop in the supply network may be a partial cause. Additionally, the output power decreases beyond 90 GHz, whereas in simulations, it decreased after about 94 GHz. Unfortunately, due to the integrated LO chain and limited VCO tuning range, it is not possible to measure outside of the 89-95 GHz frequency range, so it cannot be verified if the output power increases further at lower frequencies. The decreased output power, and trend of power versus PA element, was consistently observed across multiple chips. The measured PA output impedance matches closely with simulation [ Fig. 25(b) ]. The average output power of 6.5 dBm results in a transmitter efficiency of 6.3% (P TX,RF /P TX,DC )
3) Receiver: The receiver noise figure was measured with the hot/cold method, using an Agilent N8974A noise figure meter and a W-band WR-10 noise source connected to the receiver input via a W-band GSG probe (Fig. 26). For the RX, element-to-element mismatch in noise figure is quite small. There is no consistent trend of increased noise figure in the interior elements, which is expected because noise figure should not be as sensitive to IR drop. The RX noise figure also improves with increasing frequency, unlike the TX, where the best performance was at lower frequency. The measured 12.5 dB (SSB) noise figure at 94 GHz is higher than the simulated 11.1 dB (SSB). Below 94 GHz, the noise figure increases sharply. This is potentially due to insufficient LO power at the RX mixer, as the LO distribution network is tuned slightly high due to EM modeling error and its output power drops off below 93 GHz. The measured LNA input impedance is slightly detuned but otherwise reasonably close to simulation [Fig. 25(a)]. The measured P_−1dB of the receiver is −19 dBm at 94 GHz.
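For reference, the hot/cold (Y-factor) calculation can be sketched as follows. This is the textbook form, assuming the cold source sits at the 290 K reference temperature and omitting second-stage (instrument) noise correction; it is not necessarily the exact procedure applied by the noise figure meter. The DSB-to-SSB helper assumes equal conversion gain in both sidebands, and whether that conversion was used to obtain the quoted SSB values is not stated in this section. The ENR and power readings in the usage lines are hypothetical.

```python
import math

def noise_figure_db(enr_db, p_hot_dbm, p_cold_dbm):
    """Y-factor noise figure in dB.

    Assumes the cold (source off) temperature equals the 290 K reference,
    so that F = ENR / (Y - 1), and ignores second-stage correction.
    """
    y = 10 ** ((p_hot_dbm - p_cold_dbm) / 10)  # hot/cold power ratio (linear)
    if y <= 1.0:
        raise ValueError("hot power must exceed cold power")
    enr = 10 ** (enr_db / 10)                  # excess noise ratio (linear)
    return 10 * math.log10(enr / (y - 1))

def dsb_to_ssb_db(nf_dsb_db):
    """SSB noise figure from DSB, assuming equal gain in both sidebands."""
    return nf_dsb_db + 10 * math.log10(2)

# Hypothetical readings, for illustration only (not measured values).
example_nf = noise_figure_db(enr_db=12.0, p_hot_dbm=-57.0, p_cold_dbm=-60.0)
print(f"Example NF: {example_nf:.2f} dB")
```

A useful sanity check is that a hot/cold ratio of exactly 2 (about 3.01 dB) returns a noise figure equal to the source ENR.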
B. Packaged Measurements
1) Array Characterization:
To verify phased-array operation, the TX radiation pattern was measured at various beam steering angles. First, the TX radiation pattern was characterized in the lab at BWRC using a W-band power meter head (connected to a horn antenna) manually placed at different angles relative to the broadside of the array. Because the measurement was done manually, the measurement points are in 5° increments over a range of −60° to 60° from the broadside of the array. The RX conversion gain pattern was characterized with a similar approach, using a 94 GHz source and horn antenna placed at various angles in place of the power meter. However, this improvised setup was very slow, required manual intervention to move the power meter or source from angle to angle, and could only measure along a single cut plane.
An indirect phase shifter characterization was completed based on beamforming and power-combining measurements at different phase shifter codes (Fig. 27). Because the PA is not fully driven into saturation, amplitude error is not suppressed. The measured peak phase and amplitude error are 9° and 1.4 dB, respectively.
The TX radiation pattern was also characterized at the University of Nice Sophia-Antipolis, using the measurement setup first described in [25] and extended to 90-140 GHz in [26]. The digitally controllable arm can move along both the φ and θ angles, so it is possible to capture nearly a full hemisphere of the radiation pattern with single-degree-level steps. An F-band subharmonic mixer is placed on the end of the movable arm, and its IF output is connected to a spectrum analyzer. The power level is measured by observing the peak level of the downconverted signal on the spectrum analyzer. Due to the large size of the F-band signal source, it was not possible to measure the RX beam steering pattern. However, the results should be similar, as the antenna arrays and phase shifters are identical for the TX and RX.
The measured 3-D radiation pattern for various beam steering angles is plotted in Fig. 28. As in the HFSS simulations of the array, there are significant side lobes at the 30° beam steering angle. The beamforming measurements show a steering range of about ±20° while maintaining at least 3 dB between the main lobe and grating lobe levels. The beam dropoff is about 2-3 dB at +20° and 4 dB at −20° (Fig. 29). This is slightly worse than predicted by HFSS simulations of the antenna array, which showed only 1-1.5 dB of beam dropoff. Measurements along the E-plane of the array show little variation in radiation pattern versus beam steering angle (Fig. 30).
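The growth of the unwanted lobe with steering angle can be illustrated with the array factor of an ideal four-element uniform linear array. The following is a minimal sketch, not a model of the fabricated antenna: the element spacing of 0.6 wavelengths is a hypothetical value (the actual spacing is not given in this section), the elements are treated as isotropic, and phase and amplitude errors are ignored, so the absolute levels will differ from the measured and HFSS results.

```python
import cmath
import math

N_ELEMENTS = 4      # elements per TX or RX array
SPACING_WL = 0.6    # hypothetical element spacing in wavelengths

def _psi(theta_deg, steer_deg, spacing_wl):
    """Inter-element phase difference (rad) seen at theta when steered to steer_deg."""
    return 2 * math.pi * spacing_wl * (
        math.sin(math.radians(theta_deg)) - math.sin(math.radians(steer_deg)))

def phase_step_deg(steer_deg, spacing_wl=SPACING_WL):
    """Progressive phase shift between adjacent elements for a steering angle."""
    return -360.0 * spacing_wl * math.sin(math.radians(steer_deg))

def array_factor_db(theta_deg, steer_deg, n=N_ELEMENTS, spacing_wl=SPACING_WL):
    """Normalized array factor in dB (0 dB at the steered direction)."""
    psi = _psi(theta_deg, steer_deg, spacing_wl)
    af = sum(cmath.exp(1j * k * psi) for k in range(n)) / n
    return 20 * math.log10(max(abs(af), 1e-12))

def worst_lobe_db(steer_deg, n=N_ELEMENTS, spacing_wl=SPACING_WL, step_deg=0.5):
    """Highest array-factor level outside the main lobe over -90..90 degrees."""
    first_null = 2 * math.pi / n  # main lobe spans |psi| < 2*pi/n
    worst = -math.inf
    steps = int(round(180 / step_deg))
    for i in range(steps + 1):
        theta = -90.0 + i * step_deg
        if abs(_psi(theta, steer_deg, spacing_wl)) >= first_null:
            worst = max(worst, array_factor_db(theta, steer_deg, n, spacing_wl))
    return worst

for steer in (0.0, 10.0, 20.0, 30.0):
    print(f"steer {steer:4.0f} deg: phase step {phase_step_deg(steer):7.1f} deg, "
          f"worst lobe {worst_lobe_db(steer):6.1f} dB")
```

With any spacing above half a wavelength, the worst lobe level rises as the beam is steered further from broadside, which is the qualitative behavior behind the limited steering range.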
2) Radar Measurements and Characterization: As a basic demonstration of radar capability, a simple experiment was performed using a metal reflector placed at different distances. A triangular frequency modulation was applied to the reference signal of the PLL, with both 4% and 2% sweep bandwidths. At 4% sweep bandwidth (3.68 GHz RF bandwidth), the fast Fourier transform of the IF waveform shows distinct peaks in different range bins when the object is moved by 5 cm [Fig. 31(a)]. At 2% sweep bandwidth (1.86 GHz RF bandwidth; the center frequency was retuned slightly higher for this sweep), the peaks are still present, but in some cases the reflected signal occupies multiple range bins [Fig. 31(b)]. According to the range resolution equation, the range resolution should be c/(2B) = 4.1 cm in the 4% sweep case and 8.1 cm in the 2% sweep case. This is consistent with the measured results of the radar experiment.
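The quoted resolution values follow directly from c/(2B). The short sketch below reproduces them and also expresses the 5 cm reflector displacement in units of range bins; the function names are illustrative, and the bin calculation assumes one FFT bin per resolution cell (no zero padding or windowing effects).

```python
C_M_PER_S = 299_792_458.0  # speed of light

def range_resolution_m(bandwidth_hz):
    """Theoretical FMCW range resolution c / (2B)."""
    return C_M_PER_S / (2.0 * bandwidth_hz)

def bins_moved(displacement_m, bandwidth_hz):
    """Target displacement in range bins, assuming one bin per resolution cell."""
    return displacement_m / range_resolution_m(bandwidth_hz)

for label, bw in (("4% sweep", 3.68e9), ("2% sweep", 1.86e9)):
    res_cm = 100 * range_resolution_m(bw)
    print(f"{label}: resolution {res_cm:.1f} cm, "
          f"5 cm step = {bins_moved(0.05, bw):.2f} bins")
```

A 5 cm step exceeds one resolution cell for the 3.68 GHz sweep but is smaller than one cell for the 1.86 GHz sweep, which agrees with the observation that the narrower sweep does not always separate the reflector positions cleanly.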
V. CONCLUSION
A comparison of this work with the state of the art shows that this transceiver achieves the lowest per-element DC power, including LO overhead power, while maintaining comparable per-element performance and demonstrating a high level of integration (Table I). A further reduction in DC power could be achieved by using injection-locked frequency dividers instead of static BJT dividers (Fig. 32).
In summary, a compact highly integrated FMCW radar phased-array transceiver module has been demonstrated. The integrated antenna-in-package allows for a small form factor of 1.2 cm × 1.2 cm for the complete module, including all TX and RX antennas. Beam steering has been demonstrated across a ±20°range, and FMCW experiments show ranging functionality in line with theoretical resolution limits.
Andrew Townley (S'11) received
During his Ph.D. work, he was with the Electronique pour Objets Communicants (EpOC) laboratory and at STMicroelectronics, Crolles, France. He is currently a Post-Doctoral Researcher at the EpOC Laboratory. He has authored or co-authored nine publications in journals and 19 publications in international conferences. His current research interests include millimeter-wave communications, especially in the field of the design and measurement of antenna in package, lens, and reflector antennas for the 60-, 80-, and 120-GHz frequency bands.
From 2010 to 2015, he was with STMicroelectronics, Crolles, France, where he was involved in the development of integrated antennas, high-performance passive components in advanced bulk and SOI RF CMOS technologies, and millimeter-wave antenna design and packaging technology development. He is currently with e2v semiconductors, Grenoble, France, where he is an Application Engineer, focusing on broadband data converters (ADCs and DACs).
