planar metal wires. Therefore, technologies that can provide efficient single-hop links for on-chip global communications are crucial for the design and development of future processors comprising hundreds of cores.
Various revolutionary approaches that can provide single-hop global links have been explored. On-chip photonic interconnects [4] have the advantage of high throughput and low transmission loss. However, they must overcome technological and manufacturing challenges to become viable for mass production. RF-interconnects (RF-I) [5], [6] can be implemented using existing CMOS technology, but they require long on-chip transmission lines that lead to routing difficulty and large area overhead. 3-D integrated circuits (ICs) [7] rely on close proximity and precise alignment, which create issues with yield and heat dissipation.
The wireless network-on-chip (WiNoC) architecture [8], an example of which is illustrated in Fig. 1, is a feasible option not only because of its compatibility with CMOS technology, but also due to its relaxation of implementation restrictions as compared to RF-I or 3-D ICs. On top of the conventional wireline NoC structure, it uses modulated RF carriers to establish single-hop wireless links between distant cores within a processor chip; hence, the typical communication range is less than 20 mm [8]. Over such a short range, millimeter-wave (mm-wave) transceivers with on-off keying (OOK) modulation are usually preferable [9]-[15], since they can reach data rates of tens of gigabits per second at low power consumption, meeting the requirements of on-chip interconnections. A 60-GHz transceiver system in 90-nm CMOS with a 2.5-Gb/s data rate over a 40-mm distance was demonstrated in [9], with an energy efficiency of 114 pJ/bit. The authors of [13] improved the efficiency to 6.4 pJ/bit, achieving 11 Gb/s over a 14-mm range. In [14], using double-carrier transmission, the data rate was increased to 20 Gb/s, but only within a distance of 5 mm. The transceiver in [15] extended the distance to 100 mm and improved the efficiency to 6.26 pJ/bit, with a data rate of 10.7 Gb/s. However, it requires an on-board Yagi-Uda antenna on a nonsilicon substrate, which presents integration challenges. Therefore, an energy-efficient mm-wave transceiver suitable for the WiNoC application is yet to be demonstrated.
In this work, we introduce a highly efficient 60-GHz OOK transmitter (TX) tailored for the requirements of the WiNoC. Specifically, major design efforts are made to reduce the power consumption as well as the area overhead of the TX circuits while maintaining a high data rate. A transformer neutralization technique implemented in the drive amplifier (DA) not only yields a peak efficiency of more than 20%, but also decreases its footprint by combining inductors into transformers. For the OOK modulator, the use of a bulk-driven technique [16] avoids stacking of transistors and hence reduces the power consumption. In addition, high-speed OOK modulation is made possible by the proposed dual feedthrough cancelling technique, which significantly improves the on-off ratio with a small power consumption penalty. By using direct transformer coupling from the voltage-controlled oscillator (VCO) to the modulator, dc-blocking capacitors and local oscillator (LO) buffers are no longer required, resulting in savings in both area and power.
This paper is organized as follows. The system-level architecture of the WiNoC and design specifications of the proposed mm-wave TX are described in Section II. Section III provides detailed circuit design methodologies for the TX. Measurement configuration and experimental results are presented in Section IV. Finally, Section V provides concluding remarks.
II. SYSTEM ARCHITECTURE AND DESIGN SPECIFICATIONS

A. WiNoC Architecture
It is possible to design high-performance, robust, and energy-efficient multi-core chips by adopting novel architectures inspired by complex network theory in conjunction with on-chip wireless links. One approach to improving NoC performance is to design a hierarchical architecture [8], as shown in Fig. 1. The whole system can be partitioned into multiple small clusters of neighboring cores called subnets. Instead of a single NoC spanning the entire system, as is traditional, there will be subnets with varying NoC architectures for different parts of the chip [17]. Each subnet has a centrally located hub connected to all the cores. For inter-subnet data exchange, the hubs from all subnets are connected in a second-level network, forming a hierarchical structure. To reduce overheads, only a few widely separated and/or more frequently communicating hubs are equipped with the proposed mm-wave OOK wireless transceiver. Typically, a simulated-annealing (SA)-based methodology is used for placing wireless transceivers into the hubs [8], [18]. Therefore, on top of the wireline network formed by these hubs, several high data-rate wireless links are established to provide shortcuts for long-distance global communications. These long-range wireless shortcuts in the upper level of the hierarchy give rise to the small-world effect [8].
B. TX Architecture
Since the central hub aggregates data from a subnet of cores, the wireless transceiver needs to accommodate very high data rates. Accordingly, we set the maximum target data rate to 16 Gb/s. Fig. 2 shows the block diagram of the OOK transceiver required for the WiNoC. The main advantage of the OOK modulation scheme is that it does not require frequency/phase synchronization at the receiver (RX). Thus, a square-law-based high-speed envelope detector can be used in the RX instead of a power-consuming phase-locked loop (PLL) and mixer. The TX front-end presented in this work comprises a 60-GHz VCO, an OOK modulator, and a DA. The 60-GHz carrier generated by the VCO is modulated by the baseband (BB) data at the OOK modulator. The modulated wideband RF signal is then amplified by the DA and transmitted through the on-chip antenna. Compared with the earlier work in [19], which employed a switching-amplifier type of modulator, the proposed standalone modulator introduces much lower input capacitance at the BB input, thus eliminating the need for a power-hungry BB buffer at the TX. Note that the BB pulse-shaping filters shown in Fig. 2 are optional. The use of root-raised cosine (RRC) pulse shaping at BB can decrease the signal bandwidth, which leads to a reduction of TX power by 5 dB [20]. However, the filter itself has to consume low enough power to be worthwhile.
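The noncoherent OOK link described in this subsection can be illustrated with a minimal behavioral sketch. It gates a carrier with the bit stream and recovers the bits with a square-law detector, without any carrier phase information at the receiver. The sampling resolution, the four carrier cycles per bit (the 60-GHz, 16-Gb/s case would have 3.75), the decision threshold, and the function names are all assumptions chosen for illustration and are not taken from the design.

```python
import math

SAMPLES_PER_CYCLE = 16  # assumed sampling resolution of the carrier
CYCLES_PER_BIT = 4      # assumed; 60 GHz / 16 Gb/s would give 3.75


def ook_modulate(bits, off_leakage=0.0):
    """Gate a unit-amplitude carrier with the bit stream.

    off_leakage is the residual carrier amplitude in the "off" state.
    """
    samples = []
    for bit in bits:
        amplitude = 1.0 if bit else off_leakage
        for i in range(SAMPLES_PER_CYCLE * CYCLES_PER_BIT):
            phase = 2.0 * math.pi * i / SAMPLES_PER_CYCLE
            samples.append(amplitude * math.sin(phase))
    return samples


def square_law_detect(samples, threshold=0.25):
    """Square each sample, average over one bit period, then slice."""
    per_bit = SAMPLES_PER_CYCLE * CYCLES_PER_BIT
    decisions = []
    for start in range(0, len(samples), per_bit):
        window = samples[start:start + per_bit]
        mean_square = sum(s * s for s in window) / len(window)
        decisions.append(1 if mean_square > threshold else 0)
    return decisions


bits = [0, 1, 0, 1, 1, 0]
recovered = square_law_detect(ook_modulate(bits, off_leakage=0.05))
print(recovered)
```

A full "on" bit has a mean-square value of 0.5 in this sketch, so a threshold of 0.25 separates it from the leakage-only "off" bit without any LO at the receiver.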
C. TX Design Specifications
Based on the proposed OOK transceiver architecture and a target transmission distance of 20 mm, the link budget is analyzed as follows. For noncoherent OOK demodulation, the bit error rate (BER) can be expressed as

$$\mathrm{BER} = \frac{1}{2}\exp\left(-\frac{E_b}{2N_0}\right) \tag{1}$$

in which $E_b$ is the average bit energy and $N_0$ is the noise power spectral density under additive white Gaussian noise [21]. For OOK modulation, the ratio of bit energy to the noise power is directly translated to the signal-to-noise ratio (SNR) since each bit represents the symbol itself. Hence, for a target BER of $10^{-15}$, which is the BER of traditional wireline links [22], the calculated SNR is approximately 18 dB. Accordingly, the required TX output power is

$$P_{TX} = \mathrm{SNR} + PL + \text{Noise Floor} \tag{2}$$

in which $PL$ is the path loss in dB, including the channel propagation loss at 20 mm of distance as well as the antenna loss [8]. In addition, the RX noise floor can be calculated as

$$\text{Noise Floor} = 10\log_{10}(kTB) + NF \tag{3}$$

where $k$ is the Boltzmann constant, $T$ is the absolute temperature, $B$ is the bandwidth, which is set to 32 GHz considering the main-lobe bandwidth of OOK modulation at a 16-Gb/s data rate, and $NF$ is the noise figure (NF) of the RX. Assuming an RX NF of 10 dB, the required TX output power is calculated to be −14 dBm. System-level simulations were also carried out in [20] and [8]. It was demonstrated that the limited bandwidth of the antenna and transceiver circuits gives rise to inter-symbol interference (ISI), which increases the required TX power to −0.5 dBm for achieving a BER of $10^{-15}$ [8].
The on-off ratio of the OOK modulator also affects the system BER. This ratio is defined by

$$\text{On-off ratio} = 20\log_{10}\left(\frac{V_{on}}{V_{off}}\right) \tag{4}$$

where $V_{on}$ and $V_{off}$ are the TX output voltage amplitudes at the "on" and "off" states, respectively. Ideally, the ratio should be infinite when $V_{off} = 0$. However, any leakage in the modulator circuit can result in a nonzero amplitude at the "off" state, and the TX output power must be increased in order to maintain the same BER.
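The link-budget arithmetic above can be retraced with a short sketch. It inverts the noncoherent OOK expression BER = 0.5·exp(−E_b/2N_0) to obtain the required SNR and evaluates the kTB noise floor for a 32-GHz bandwidth and a 10-dB NF. The 290-K temperature is an assumption, as is reading the quoted required TX power as −14 dBm. The path loss printed at the end is only the value implied by those numbers, not a figure taken from the text.

```python
import math

K_BOLTZMANN = 1.380649e-23  # J/K


def required_snr_db(ber):
    """Invert BER = 0.5 * exp(-Eb / (2 * N0)) for Eb/N0, in dB."""
    return 10.0 * math.log10(2.0 * math.log(0.5 / ber))


def noise_floor_dbm(bandwidth_hz, nf_db, temperature_k=290.0):
    """Receiver noise floor 10*log10(kTB) + NF; 290 K is an assumed temperature."""
    ktb_watts = K_BOLTZMANN * temperature_k * bandwidth_hz
    return 10.0 * math.log10(ktb_watts / 1e-3) + nf_db


snr_db = required_snr_db(1e-15)
floor_dbm = noise_floor_dbm(32e9, 10.0)

# Back-calculated from P_TX = SNR + PL + noise floor with P_TX = -14 dBm (assumed sign).
implied_path_loss_db = -14.0 - snr_db - floor_dbm

print(f"required SNR      : {snr_db:.1f} dB")
print(f"RX noise floor    : {floor_dbm:.1f} dBm")
print(f"implied path loss : {implied_path_loss_db:.1f} dB")
```

Under these assumptions the required SNR comes out close to the 18 dB quoted in the text.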
Similar to the relation between the extinction ratio and power penalty in optical communications [23], the required increase in average output voltage amplitude, $\Delta V$, can be calculated as

$$\frac{\Delta V}{V_{avg}} = \frac{2V_{off}}{V_{on} - V_{off}} \tag{5}$$

in which $V_{avg}$ is the average output voltage amplitude. Converting the voltage ratio to a power ratio, we can define the TX output power penalty as

$$\text{Power penalty} = 20\log_{10}\left(1 + \frac{\Delta V}{V_{avg}}\right) = 20\log_{10}\left(\frac{V_{on} + V_{off}}{V_{on} - V_{off}}\right) \tag{6}$$

Fig. 3 illustrates the TX output power penalty as a function of the on-off ratio on a log scale. It can be seen that for less than 1 dB of TX power penalty, the on-off ratio needs to be maintained above 25 dB. The resulting design specifications for the proposed TX are summarized in Table I. Note that the 3-dB bandwidth of the DA is set to 16 GHz as a reasonable tradeoff between the power consumption and the ISI created [24]. This has also been verified in the previous mm-wave transceiver design of [13].
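The dependence of the power penalty on the on-off ratio can be sketched numerically. The expression used below is an assumption modeled on the extinction-ratio penalty of optical links, written in terms of voltage amplitudes. It reproduces the roughly 1-dB penalty at a 25-dB on-off ratio stated above, but it should be read as an illustrative model rather than as the exact expression behind Fig. 3.

```python
import math


def power_penalty_db(on_off_ratio_db):
    """Assumed penalty model: 20*log10((r + 1) / (r - 1)), with r = V_on / V_off."""
    r = 10.0 ** (on_off_ratio_db / 20.0)
    return 20.0 * math.log10((r + 1.0) / (r - 1.0))


for ratio_db in (15, 20, 22, 25, 30):
    print(f"on-off ratio {ratio_db:2d} dB -> penalty {power_penalty_db(ratio_db):.2f} dB")
```

With this model a 22-dB on-off ratio gives a penalty of about 1.4 dB, and the penalty grows quickly once the ratio falls below about 20 dB.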
III. TX CIRCUIT DESIGN
A. Drain-to-Gate Neutralized DA
Since the DA is one of the most power-hungry blocks, its efficiency directly impacts the overall efficiency of the OOK TX. Hence, a common-source (CS) topology is usually preferred since it requires a lower supply voltage for the same voltage headroom. However, in a CS amplifier, the drain-to-gate parasitic capacitor $C_{gd}$ introduces a direct signal path that connects the output to the input. This signal path is considered detrimental to the performance of the amplifier [25], as it not only reduces the gain by introducing negative feedback, but also degrades the reverse isolation, which can potentially lead to instability. In addition, the presence of $C_{gd}$ means that any change in the impedance at the output (drain) also affects the impedance at the input (gate). This significantly complicates the design of the input and output matching networks.
Unilateralization techniques such as the cascode topology are effective ways to decrease the reverse signal flow through $C_{gd}$ [25]. However, the stacking of transistors in cascode circuits reduces the voltage headroom. Hence, such an approach is not suitable for low-voltage low-power design for WiNoCs. Neutralization is an alternative method that creates an additional signal path to neutralize the signal flow through $C_{gd}$. Previously published neutralization techniques include the differential cross-coupled capacitor method [25] and the drain-to-source transformer-feedback neutralization topology [26]. The former induces positive feedback in the circuit, and thus could result in instability, for example, due to process variations. The latter introduces a source-degeneration inductor, which degrades the gain of the amplifier due to negative feedback.
The two-stage DA proposed in this work uses a drain-to-gate transformer-feedback neutralization technique [27] , as depicted in Fig. 4(a) . The gate and drain inductors are implemented on top and second-top metal layers, respectively, and are combined to form a transformer, with an adjustable proximity to change the coupling factor. To allow for more precise neutralization, in this work, we developed a mathematical design methodology to determine the required transformer parameters.
The two-port parameter model for a transformer is shown in Fig. 5(a). Its parameters are expressed in (7) and (8) in terms of the primary and secondary inductances, the turn ratio $n$, and the coupling coefficient $k$ of the transformer [28]. Using this model, a small-signal equivalent circuit of a single-stage CS amplifier with drain-to-gate neutralization is illustrated in Fig. 5(b). The circuit includes the output impedance of the preceding stage, the impedances of the parasitic capacitances $C_{gs}$ and $C_{gd}$, and the load impedance, which includes the drain impedance of the transistor itself as well as the input impedance of the following stage.
To investigate the reverse signal flow from the output to the input, we consider the output voltage as an independent voltage source [26]. Hence, the two current sources and two impedances connecting the output and ground have no effect on the reverse signal flow and can thus be eliminated in the analysis. Effectively, there are two reverse signal paths in the circuit. One flows through the drain-gate impedance, and its transfer function is given in (9). The other path is through the coupling of the transformer, and its transfer function is given in (10). For perfect neutralization, these two signal paths need to fully cancel each other. Hence, by setting the sum of the two transfer functions to zero, the condition of neutralization is derived as (11). The value of $n$ depends solely on the primary and secondary inductances, which are usually determined by impedance matching requirements at the desired operating frequency. Moreover, the ratio of the parasitic capacitances varies with the size and bias point of the nMOS, and can be obtained from the transistor model. Accordingly, the optimum coupling coefficient $k$ can be calculated from (11) for neutralization. Notice that this condition is frequency independent, indicating that the proposed neutralization technique is effective over a wide bandwidth, limited by the bandwidth of the transformer.
To verify the neutralization condition derived above, $S_{12}$, which represents the reverse transmission of an amplifier, is also calculated from the small-signal model in Fig. 5(b). It can be written as (12). Setting $S_{12} = 0$ results in the same neutralization condition as in (11). Fig. 6 plots $S_{12}$ of each stage of the DA calculated from (12) at 60 GHz, along with the simulated results from Cadence Spectre. Ideal transformers with variable coupling coefficients are used in these simulations. It can be observed from Fig. 6 that the lowest $S_{12}$ for stages 1 and 2 is achieved at coupling coefficients that are both in good agreement with calculations. Note that more than 30 dB of improvement in $S_{12}$ is achieved by using the proposed neutralization technique.
With their parameters determined, the transformers are implemented and simulated using Agilent Advanced Design System (ADS) Momentum. In the initial design stage, ASITIC [29] is also used to estimate the dimensions of the inductors. The simulated parameters of the transformer for the first stage of the DA are shown in Fig. 4(b). Moreover, to meet the requirement of −0.5-dBm output power, the transistor widths are selected to be 20 and 30 µm for the first and second stages, respectively, as shown in Fig. 4(a). For maximum $f_{max}$, both transistors are biased at a current density of 0.2 mA/µm [30]. In addition, the matching capacitors were tuned for proper impedance matching. Simulation suggests that the DA has a peak gain of 9.2 dB at 60 GHz and a 3-dB bandwidth from 52 to 68 GHz. At an output 1-dB compression point ($P_{1dB}$) of 1 dBm at 60 GHz, the DA achieves a power-added efficiency (PAE) of 10%, while consuming 10.4 mW from a 1-V supply. In order to evaluate the ISI effect from the bandwidth of the DA, 16-Gb/s data modulated on a 60-GHz carrier is fed into the input, and both the input and output waveforms are plotted in Fig. 7. It shows that, because of the limited DA bandwidth, the shortest "1" bit has a 1-dB droop compared to the amplitude during a continuous run of "1" bits. This droop can be mitigated by extending the DA bandwidth, but at the cost of higher power consumption. In addition, as mentioned in Section II, this effect has been taken into account in the system-level simulation.
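The PAE figure quoted above can be cross-checked from the other simulated numbers using the usual definition PAE = (P_out − P_in)/P_dc. The sketch below assumes that the gain at the compression point is the 9.2-dB peak gain reduced by 1 dB, which is an interpretation of the text rather than a stated value. The function names are illustrative.

```python
def dbm_to_mw(power_dbm):
    """Convert a power level in dBm to milliwatts."""
    return 10.0 ** (power_dbm / 10.0)


def pae_percent(pout_dbm, gain_db, pdc_mw):
    """Power-added efficiency (P_out - P_in) / P_dc, in percent."""
    pout_mw = dbm_to_mw(pout_dbm)
    pin_mw = dbm_to_mw(pout_dbm - gain_db)
    return 100.0 * (pout_mw - pin_mw) / pdc_mw


# Simulated operating point: P1dB = 1 dBm, 10.4 mW dc power,
# gain assumed to be 9.2 dB - 1 dB = 8.2 dB at compression.
simulated_pae = pae_percent(pout_dbm=1.0, gain_db=8.2, pdc_mw=10.4)
print(f"PAE at P1dB: {simulated_pae:.1f} %")
```

Under this assumption the result lands close to the 10% reported from simulation.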
B. Bulk-Driven OOK Modulator With Dual Feedthrough Cancellation Technique
For WiNoC application, the OOK modulator needs to operate at a high data rate of 16 Gb/s while consuming low dc power. As analyzed in Section II, an on-off ratio of better than 25 dB is also essential to avoid penalty on TX output power. Previously reported OOK modulators can achieve as high as 20 Gb/s of data rate, but at the cost of very high power consumption [31] . In addition, passive modulators such as [34] are also not considered since they require larger LO swing to overcome their loss, which transfers the power burden to the VCO side.
In order to meet these requirements, a bulk-driven OOK modulator with a dual feedthrough cancellation technique is proposed. As shown in Fig. 8, the main switching pair is driven by the differential LO signal that feeds into its gate terminals. The LO signal is coupled from the VCO through a 1:1 transformer. The primary coil of this transformer, which is implemented with the thick top metal layer, is also used as the tank inductor for the VCO. This reduces not only the power consumption, but also the overall footprint of the TX, since neither buffers nor coupling capacitors are needed for the LO signal. A second, 2:1 transformer converts the differential RF output to a single-ended signal and also provides matching to the input impedance of the DA. The BB signal is fed into the bulk terminals of the main switching pair through a dc-blocking capacitor. It is well known that the threshold voltage of an nMOS transistor can be expressed as

$$V_{TH} = V_{TH0} + \gamma\left(\sqrt{\left|2\phi_F + V_{SB}\right|} - \sqrt{\left|2\phi_F\right|}\right) \tag{13}$$

where $V_{SB}$ is the voltage between the source and bulk (body) terminals, $V_{TH0}$ is the threshold voltage when $V_{SB} = 0$, $\gamma$ is a process-dependent parameter, and $\phi_F$ is the bulk Fermi potential [32]. With the gate bias voltage of the main switching pair set slightly below the threshold voltage, the BB signal can therefore turn the main switching pair on and off by modulating the threshold voltages of the nMOS transistors, generating the OOK-modulated RF signal. Compared with a conventional modulator that uses the Gilbert-cell topology [9], [31], the proposed bulk-driven modulator eliminates the stacking of nMOS transistors, which helps reduce the power supply voltage. Another benefit is the relatively low capacitance at the BB input, as the BB signal drives the bulk of the main switching pair rather than the gate capacitance of the current source [9] or the large bypass capacitors providing RF ground [15], [19]. Accordingly, large power-consuming buffers, which can dissipate up to 0.9 mW per Gb/s [15], are not required to drive the modulator.
Note that the bulk terminals are biased at 0-V dc so that the 0- to 1-V BB signal appears as a −0.5- to 0.5-V swing at the bulks. This guarantees the bulk-source and bulk-drain junctions are never forward biased.
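The bulk-driven switching action can be illustrated by evaluating the body-effect expression for the threshold voltage at the two extremes of the bulk swing. The device parameters below (zero-bias threshold, body-effect coefficient, and Fermi potential) are placeholder values chosen only for illustration, and the source terminal is assumed to sit at ground. None of these numbers come from the 65-nm process used in this work.

```python
import math

# Hypothetical device parameters, for illustration only.
VTH0 = 0.40    # V, threshold voltage at V_SB = 0
GAMMA = 0.30   # V**0.5, body-effect coefficient
PHI_F = 0.40   # V, bulk Fermi potential


def threshold_voltage(v_sb):
    """V_TH = V_TH0 + gamma * (sqrt(|2*phi_F + V_SB|) - sqrt(|2*phi_F|))."""
    return VTH0 + GAMMA * (math.sqrt(abs(2.0 * PHI_F + v_sb))
                           - math.sqrt(abs(2.0 * PHI_F)))


# With the source at ground (assumed), a bulk voltage v_b gives V_SB = -v_b.
vth_bit_one = threshold_voltage(v_sb=-0.5)   # bulk at +0.5 V, BB data "1"
vth_bit_zero = threshold_voltage(v_sb=+0.5)  # bulk at -0.5 V, BB data "0"

print(f"V_TH with bulk at +0.5 V: {vth_bit_one:.3f} V")
print(f"V_TH with bulk at -0.5 V: {vth_bit_zero:.3f} V")
```

With a gate bias placed between the two resulting threshold values, the pair conducts for a "1" and is nominally off for a "0", which is the mechanism the bulk-driven modulator relies on.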
Due to subthreshold current, the main switching pair cannot be fully turned off even when the BB data is "0." Therefore, a reversed switching pair is introduced. As shown in Fig. 8, the gate nodes of the reversed pair are connected to the same LO signal as the main switching pair, whereas the drains are cross-coupled. The bulk terminals of the reversed pair are controlled by an inverted and level-shifted BB signal. One transistor acts as an inverter, and a resistor ladder sets the required output voltage swing from 0 to 0.5 V. Note that this level-shifting inverter can be replaced by a normal CMOS inverter if a 0.5-V supply voltage is available.
The operation of the switching pairs is illustrated in Fig. 9. At the "off" state of the OOK modulator, the main switching pair is turned off, but its subthreshold current still flows through the load, which degrades the "off"-state isolation. By turning on the reversed switching pair, this subthreshold current is cancelled by the drain current of the reversed pair. The size of the reversed pair is scaled down to accurately cancel the subthreshold current. Previous work in [31] used an always-off transistor pair with the same size as the main switching pair to provide such cancellation. This approach not only degrades the gain at the "on" state, since the desired signal is partially cancelled, but also increases the overall power consumption. In this design, as shown in Fig. 9, the reversed pair is turned off when the main pair is on. Due to its small size, the reversed pair has a negligible effect on the "on"-state gain.
There is another feedthrough path from LO to RF created by the gate-to-drain parasitic capacitance of the main switching pair. The transfer function of this feedthrough path is given in (14) in terms of the gate-to-drain parasitic capacitance of the main switching pair and the total output load impedance at the "off" state. It can be observed that this feedthrough signal has a different phase than that of the subthreshold current. Accordingly, a cross-coupled capacitor pair is used for additional feedthrough cancellation. In contrast to the capacitor cross-coupled neutralization technique used in amplifiers [25], these capacitors aim to cancel the feed-forward rather than the feedback signal through $C_{gd}$. In order to fully cancel the feedthrough, the capacitor value is chosen according to (15), which also accounts for the gate-to-drain capacitance of the reversed switching pair. The resulting capacitance is only 10 fF, which is implemented by a metal-insulator-metal (MIM) capacitor with overlapping top and second-top metal layers. The thickness of the dielectric between the two metal layers is 0.74 µm. Electromagnetic (EM) simulation shows that the required capacitor size is 11.5 µm × 11.5 µm. Additionally, a pMOS transistor serves as a fast shut-off switch controlled by the BB signal. It not only provides a low output impedance, further reducing the output signal swing [9], but also decreases the gain sensitivity to capacitance variations.
Fig. 10 illustrates the effects of the two feedthrough cancellation schemes, as well as the fast shut-off switch, on the "off"-state gain of the OOK modulator. Adding the reversed switching pair alone decreases the feedthrough by 4 dB, while selecting the optimum cross-coupled capacitor value further reduces it by 7 dB. Therefore, the proposed dual feedthrough cancellation schemes improve the "off"-state isolation by 11 dB. Moreover, the use of the pMOS fast shut-off switch contributes an additional 5-dB improvement.
C. Transformer-Coupled VCO
To generate the required −0.5-dBm power at the DA output, simulations show that an LO amplitude of 100 mV (single ended) is needed at the OOK modulator input. This is generated by an nMOS cross-coupled VCO whose schematic is shown in Fig. 8. The nMOS-only VCO exhibits less parasitic capacitance at the tank as compared to the complementary cross-coupled VCO. This results in a larger tuning range in addition to providing a higher output swing. The VCO tank consists of a varactor and the center-tapped primary coil of the LO coupling transformer. To maintain a simple symmetrical LO signal routing to the OOK modulator, the VCO output is taken via the transformer. Since the coupling factor of the transformer is less than unity, the VCO needs to generate a single-ended amplitude of 200 mV, which is readily achievable under a 1-V supply. Note that in contrast to [35], the VCO is not used as an OOK modulator in this work since the start-up time of the VCO is much too long to support the requisite data rate.
IV. EXPERIMENTAL RESULTS
The proposed OOK TX is designed and fabricated in a TSMC 1P9M 65-nm CMOS technology with a transistor speed figure of 200 GHz. Fig. 11 shows the chip micrograph of the TX. It occupies an area of 0.36 mm² including the pads, and its active area is only 0.077 mm². Standalone test blocks for the DA and modulator are also implemented and measured. All passives and RF interconnects are simulated in Agilent ADS Momentum, extracted as S-parameter multi-port models, and included in the Spectre simulations.
All measurements are carried out on-wafer using a Cascade Summit-11000 probe station. S-parameter and linearity measurements up to 67 GHz for both the DA and OOK modulator are done using an Agilent PNA-X network analyzer. To test the transient performance, a pattern pulse generator SDG12070 from Picosecond Pulse Labs is used to generate up to 16 Gb/s of BB signal. The modulated RF output signal is evaluated in both the time and frequency domains using an Agilent 86100D sampling oscilloscope and an N9030A PXA spectrum analyzer.
A. DA
The measured S-parameters of the DA are compared with the corresponding simulation results in Fig. 12. Measurement shows a power gain of 8.8 dB at 60 GHz and a 3-dB bandwidth from 53.5 to beyond 67 GHz. The input reflection coefficient $S_{11}$ is below −10 dB from 57 to 67 GHz. In the TX, however, the input impedance of the DA is matched to the output of the preceding OOK modulator instead of the standard 50 Ω. As can be seen, the measured $S_{21}$ and $S_{11}$ are in good agreement with the simulation results. Even though $S_{22}$ is degraded at lower frequency, mainly due to the inaccuracy in ground plane simulations, it still stays below −8 dB within the 3-dB bandwidth. In addition, Fig. 12 also demonstrates that $S_{12}$ of the DA is better than −40 dB over the entire frequency range. Although the measured $S_{12}$ is degraded compared to simulation, mainly because of the substrate coupling and proximity effect, which are not taken into account in simulation due to modeling difficulty, it is still at least 20 dB better than that of a two-stage CS amplifier without neutralization, as demonstrated in Fig. 12. This demonstrates the effectiveness of the proposed neutralization design technique.
Fig. 13 illustrates the measured and simulated gain compression and PAE versus input power of the DA at 60 GHz. It shows a measured $P_{1dB}$ of 1.5 dBm and a PAE of 11.2% at the compression point. The measured maximum PAE reaches 20.4% at 5 dBm of output power. Furthermore, Fig. 14 shows the measured and simulated $P_{1dB}$ and saturated output power $P_{sat}$ versus frequency. Measurement results show that $P_{1dB}$ stays above 0 dBm from 53 to 67 GHz, meeting the TX power requirement defined in Section II. Due to the limited output power from the network analyzer, $P_{sat}$ can only be measured from 59 to 67 GHz. In this frequency range, a $P_{sat}$ higher than 5 dBm is achieved. Overall, the DA draws 10.6 mA from a 1-V supply.
B. OOK Modulator
Both the on/off steady-state gains and the transient performance are measured for the proposed OOK modulator. Fig. 15 compares the "on"- and "off"-state gains from measurement and simulation. At 60 GHz, gains of 1 dB and −29.5 dB are measured at the "on" and "off" states, respectively, which is equivalent to an on-off ratio of 30.5 dB at steady state. Moreover, in the frequency range of 50-70 GHz, the on-off ratio stays above 30 dB.
Measured transient responses of the OOK modulator at 12 and 16 Gb/s are shown in Fig. 16(a) and (b), respectively. The BB input signals, which swing from 0 to 1 V, are also illustrated alongside the RF output waveform. A data pattern of "0101" is used in this measurement. At 12 Gb/s, as shown in Fig. 16(a), without de-embedding the probe, cable, and connector loss, the amplitudes at the "on" and "off" states are 70 and 3.5 mV, respectively. Accordingly, this gives a dynamic on-off ratio of 26 dB. Note that the dynamic on-off ratio differs from the steady-state performance shown in Fig. 15 due to the much higher switching speed. Moreover, Fig. 16(b) illustrates that the dynamic on-off ratio further decreases to 23 dB at a data rate of 16 Gb/s. This degradation is due to the limited bandwidth of the V-band cable with 1.85-mm connectors (up to 67 GHz), as well as the limited rise time of the pulse generator (25 ps from 10% to 90%), as can be seen in the BB waveform in Fig. 16(b). It can be further improved by adding a buffer or peaking amplifier at the BB input, at the cost of higher power consumption. Overall, the OOK modulator draws only 5 mA from its 1-V supply. Table II compares the performance of this OOK modulator with recently published high-speed OOK modulators.
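The dynamic on-off ratio at 12 Gb/s follows directly from the two measured amplitudes through the definition of the on-off ratio as 20·log10 of the "on"-to-"off" amplitude ratio. A minimal check, with an illustrative function name, is:

```python
import math


def on_off_ratio_db(v_on, v_off):
    """On-off ratio 20*log10(V_on / V_off) from two voltage amplitudes."""
    return 20.0 * math.log10(v_on / v_off)


# Measured modulator amplitudes at 12 Gb/s: 70 mV ("on") and 3.5 mV ("off").
ratio_12g = on_off_ratio_db(70e-3, 3.5e-3)
print(f"dynamic on-off ratio at 12 Gb/s: {ratio_12g:.1f} dB")
```

Because only the amplitude ratio matters, the probe, cable, and connector losses that were not de-embedded cancel out as long as they affect both states equally.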
C. VCO
The VCO consumes 3.4 mW from a 1-V supply. The measured tuning range of the VCO is shown in Fig. 17. It achieves a tuning range of 2 GHz with a phase noise better than −102 dBc/Hz at a 10-MHz offset across the entire tuning range. When extrapolated to the OOK modulator input, the VCO provides a differential amplitude of 200 mV, which is sufficient to drive the OOK modulator. Due to the large VCO swing, no LO buffer is required before the OOK modulator.
D. Integrated OOK TX
The proposed OOK TX consumes a total current of 19 mA from a 1-V supply. Its performance is tested with pseudorandom bit sequence (PRBS) data fed into the BB port. Fig. 18 shows the measured time-domain RF output signal at BB data rates of 12 and 16 Gb/s without compensating for cable, connector, and probe loss. Since the LO signal is generated by the on-chip VCO, it cannot be properly synchronized in the sampling oscilloscope. Therefore, the carrier is shown as a cloud of dots, with its envelope representing the OOK modulation. Due to the bandwidth limitations of the DA and the V-band coaxial cable, there is a 0.9- and 1.3-dB droop on the shortest pulse at 12 and 16 Gb/s, respectively. However, even at 16 Gb/s, the peak amplitude reaches 203 mV at the shortest pulse width. After compensating for the measurement loss (4.6 dB at 60 GHz), this is equivalent to 0.75 dBm of output power. This surpasses the design requirement of −0.5 dBm obtained from system-level simulations and analysis. Moreover, at the shortest "0" bit, the RF signal has an amplitude of 16 mV when modulated at a 16-Gb/s data rate. Therefore, a dynamic on-off ratio of 22 dB is achieved, a 1-dB degradation from the standalone OOK modulator measurement. According to Fig. 3, this on-off ratio necessitates 1.5 dB of TX power penalty. Since the measured TX power is already 1.25 dB higher than the specification, a 0.25-dB improvement from the 10-dB RX NF can still guarantee the required BER of $10^{-15}$.
To further assess the quality of the modulated signal, the envelopes of the RF outputs are extracted by finding the local maxima in MATLAB. Limited by the resolution of the sampling oscilloscope, a total of 160 bits are extracted. Fig. 19 illustrates normalized eye diagrams of the extracted envelopes at 12 and 16 Gb/s. A clear eye opening is shown in both cases. At 12 Gb/s, the eye opening is 80% of the eye height, whereas at 16 Gb/s, it reduces to 70%.
This is consistent with the droop observed at the shortest pulse in Fig. 18 , and can be readily recovered by the use of a BB limiting amplifier at the RX.
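The conversion from the measured peak amplitude to the reported output power can be retraced with a few lines. The sketch assumes a 50-Ω measurement load and a sinusoidal carrier, so that the power is V_peak²/(2R), and it reads the design requirement as −0.5 dBm. Both points are assumptions rather than statements from the measurement setup.

```python
import math


def peak_voltage_to_dbm(v_peak, load_ohm=50.0):
    """Power of a sine wave of peak amplitude v_peak in load_ohm (assumed 50 ohm)."""
    power_watts = v_peak ** 2 / (2.0 * load_ohm)
    return 10.0 * math.log10(power_watts / 1e-3)


MEASUREMENT_LOSS_DB = 4.6  # cable, connector, and probe loss at 60 GHz
REQUIRED_DBM = -0.5        # design requirement (sign assumed)

raw_dbm = peak_voltage_to_dbm(0.203)
output_dbm = raw_dbm + MEASUREMENT_LOSS_DB
margin_db = output_dbm - REQUIRED_DBM

print(f"power at the oscilloscope : {raw_dbm:.2f} dBm")
print(f"TX output after de-embed  : {output_dbm:.2f} dBm")
print(f"margin over requirement   : {margin_db:.2f} dB")
```

Under these assumptions the de-embedded power and the margin agree with the 0.75 dBm and 1.25 dB quoted in the text.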
The TX output signal is also evaluated in the frequency domain using a spectrum analyzer. Fig. 20 shows the measured spectrum of the modulated signal with BB data rates at 10, 12, 14, and 16 Gb/s. Note that the V-band harmonic mixer used for spectrum analyzer frequency extension creates harmonic distortions that contaminate the lower half of the spectrum, hence only the upper half spectrum is shown for clarity. It can be seen that the carrier frequency is at 60.3 GHz, consistent with the VCO measurement result. The main lobe bandwidth, which is indicated by the location of the first notch, changes from 10 to 16 GHz, in accordance with the BB data rates.
E. Performance Comparison
The proposed 60-GHz OOK TX is compared with other high-speed mm-wave OOK TXs in recent literature in Table III. As shown in the table, the presented work achieves a high data rate while consuming a low amount of power. The design is also low voltage due to the elimination of transistor stacking using the proposed neutralization and bulk-driven design techniques. For a comparable output power, this design demonstrates about 2× and 8× better bit-energy efficiency than [13] and [10], respectively. To better compare these OOK TXs with different output power levels, we define a new figure-of-merit (FoM) as

$$\mathrm{FoM}\left[\frac{\mathrm{mW \cdot bit}}{\mathrm{pJ}}\right] = \frac{\text{Data Rate}\,[\mathrm{Gb/s}] \times P_{1dB}\,[\mathrm{mW}]}{P_{DC}\,[\mathrm{mW}]} \tag{16}$$

where $P_{DC}$ is the power consumption, and $P_{1dB}$ is the 1-dB compression point at the RF output of the TX. The calculated FoMs are also listed in Table III. It can be seen that the FoM of this design is comparable to that of [15] and higher than those of the other published works. However, the TX in [15] aims at a longer communication range and has a lower data rate than the proposed design. Moreover, as another important design concern for the WiNoC application, the active area of this TX is also comparable to the smallest achieved in [13].
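The bit-energy efficiency and the FoM can be evaluated with a short sketch. The energy per bit uses the measured 19 mW (19 mA from 1 V) at 16 Gb/s. The FoM function follows the form data rate × P_1dB / P_DC suggested by the units of the definition above. The 0-dBm compression point passed to it below is a hypothetical placeholder, since the TX-level value is listed only in Table III.

```python
def energy_per_bit_pj(power_mw, data_rate_gbps):
    """Bit-energy efficiency: mW divided by Gb/s gives pJ/bit."""
    return power_mw / data_rate_gbps


def figure_of_merit(data_rate_gbps, p1db_dbm, pdc_mw):
    """FoM = data rate [Gb/s] * P_1dB [mW] / P_DC [mW], in mW*bit/pJ."""
    p1db_mw = 10.0 ** (p1db_dbm / 10.0)
    return data_rate_gbps * p1db_mw / pdc_mw


tx_energy = energy_per_bit_pj(power_mw=19.0, data_rate_gbps=16.0)
print(f"bit-energy efficiency: {tx_energy:.2f} pJ/bit")

# Hypothetical P_1dB of 0 dBm, used only to show how the FoM is evaluated.
example_fom = figure_of_merit(data_rate_gbps=16.0, p1db_dbm=0.0, pdc_mw=19.0)
print(f"example FoM: {example_fom:.2f} mW*bit/pJ")
```

The energy figure rounds to the 1.2 pJ/bit reported for this TX. The FoM value printed here depends entirely on the placeholder compression point and should not be compared with Table III.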
V. CONCLUSION
A high-data-rate low-power OOK TX in the 60-GHz band has been presented. The transformer-based neutralization technique in the DA, the bulk-driven OOK modulator, and the transformer-coupled VCO contribute to a compact and highly efficient TX. In addition, an analysis of the optimum neutralization condition formulates guidelines for the transformer design in the DA. Based on design targets obtained from system-level simulation and analysis, the TX implemented in 65-nm CMOS achieves a maximum data rate of 16 Gb/s and a bit-energy efficiency of 1.2 pJ/bit. This provides a feasible low-power area-efficient TX solution for the WiNoC in future massive multi-core processors.
