Abstract-This paper presents a 50GHz wireless receiver for 10Gbps on-off keying (OOK) modulated signals designed in 28nm CMOS. An in-depth analysis indicates that the energy detector has a substantial impact on the receiver performance and should be properly taken into account in the link budget. The work covers the design of a novel 50GHz broadband low-noise amplifier and its co-design with the envelope detector and limiting amplifier. The extracted simulation results show that the receiver is able to detect a 10Gbps signal at 10cm distance with a BER of 10^-12 while consuming only 70mW from a 1V power supply.
I. INTRODUCTION
The complexity of electronic systems is increasing continuously. The amount of data processed grows daily, while the trend towards distributed and high-performance computing is leading to the development of new systems with demanding requirements. Multi-core/multi-node computers require a huge number of short-range I/O interfaces with tens of Gbps of bandwidth. However, wired connections such as backplanes have limited bandwidth, heavily restrict the flexibility of the mechanical design and raise the cost of materials and assembly. In this context, the concept of a wireless chip-to-chip connection becomes very attractive, promising high flexibility and versatility [1] .
The mm-wave frequency range, nominally located between 30GHz and 300GHz, is of great interest in this framework [2] - [3] . The spectrum is very broad and still unpopulated, so bandwidths several GHz wide can be allocated, leading to multi-Gbps communication speeds even with very simple modulation techniques [4] - [6] . Moreover, with a wavelength of less than 10mm, the antenna turns out to be very compact, fitting into the IC package or the chassis of small devices and allowing miniaturization with respect to current wired solutions [7] .
Though mostly oriented to digital applications, deep sub-micron silicon CMOS technology lends itself to implementing analog functions at high frequencies: for each step of minimum gate length reduction, a corresponding increase in f_T is achieved, being in excess of 450GHz for 28nm [8] . The System on Chip (SoC) approach for an entire analog transceiver in the mm-wave range thus becomes very attractive.
Manuscript received November 11, 2013; revised January 21, 2014.
In this paper we address the realization of the signal detection chain in a wireless On-Off Keying (OOK) receiver for 10Gbps communications with a 50GHz carrier frequency. Section II describes the architecture of the receiver and the link budget analysis, while Section III details the main building blocks. Section IV describes simulation results, and conclusions follow.
II. ARCHITECTURE AND LINK BUDGET
The complete architecture of the receiver is shown in Fig. 1 . The 50GHz OOK-modulated signal is collected by the off-chip patch antenna and delivered to the single-ended Low-Noise Amplifier (LNA). Bondwires in a ground-signal-ground (GSG) configuration connect the antenna to the LNA input, covering distances of several mm with negligible signal loss. The output of the LNA is fed to the Envelope Detector (ED), which performs power detection. Since the structure of the ED requires a differential input, an on-chip balun has been interposed to convert the single-ended output signal of the LNA. A dummy ED is connected to the second input of the Limiting Amplifier (LA) to improve the power-supply rejection ratio (PSRR). The LA amplifies the detected signal and drives the off-chip measurement instrumentation through the output buffer.
The proposed OOK system is based on energy detection, which makes the front-end non-linear. As a consequence, the Friis formula does not apply and the computation of the link budget is not straightforward [9] . The equivalent model of the receiver in Fig. 2 , consisting of the cascade of LNA and energy detector, is used to calculate the overall receiver equivalent noise figure F_RX. In this model, s_in is the desired input signal, n_s the channel noise, n_amp the input-referred LNA noise and n_int the aggregated noise of the envelope detector and following stages, while G_LNA and a_2 are the LNA gain and squarer gain, respectively.
The input signal-to-noise ratio (SNR) is given by:
$$\mathrm{SNR}_{in} = \frac{E[s_{in}^2]}{E[n_s^2]} = \frac{E_b B_r}{N_0 B} \qquad (1)$$
where E[] denotes the expected value, E b the energy of the bit, B r the bit rate, N 0 the power spectral density of the channel noise and B the signal bandwidth. Accordingly, the output SNR can be calculated as:
Expanding (2) and solving for SNR at the output, the equivalent receiver noise figure F RX is: 
where F LNA is the noise figure of the LNA. Two important insights can be pointed out. First, even if the receiver is completely noiseless, the SNR degrades by 6dB. Second, unlike the common linear case, the equivalent receiver noise figure F RX depends not only on the gain and noise figure of its blocks, but also on the input SNR. This is due to the squaring action of the energy detector that translates to the output an amount of noise proportional to the power of the input signal (see Eq. 3).
In short-range chip-to-chip communications, the typical distance to be covered is 10cm. Assuming a transmitter output power of 10dBm, reasonable at 50GHz, the signal power at the input of the receiver is -34dBm. Taking into account a 4dB link margin, this translates to a minimum SNR of 33dB at the input of the receiver. Since at least 17dB of SNR is required to demodulate an OOK signal with BER<10^-12, the overall receiver noise figure cannot exceed 16dB. The requirement further tightens to 10dB once the 6dB SNR degradation caused by the squaring action of the energy detector is taken into account. Assuming a maximum LNA noise figure of 10dB, the LNA gain needs to be at least 25dB over the receiver bandwidth to properly suppress the aggregated noise of the squarer and following stages, estimated to be σ , a_2=1 and 4dB link margin.
Fig. 3 shows the communication distance against the LNA gain. As can be seen, due to the action of the energy detector, the LNA is required to provide a gain greater than 25dB while keeping a noise figure around 10dB. Meeting these specifications over a 20GHz bandwidth around 50GHz is challenging. At higher LNA gain, only the LNA noise figure and the SNR degradation of the energy detector limit the link performance. In this context, recognizing the detrimental effect of the energy detector on the receiver SNR is paramount for a correct link budget and transceiver operation.
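The link-budget figures quoted in this section can be cross-checked with a few lines of arithmetic. The sketch below is only illustrative: it assumes a thermal noise floor kTB at 290K and a 20GHz noise bandwidth (the bandwidth targeted for the LNA), neither of which is stated explicitly as the basis of the 33dB figure, and the function names are hypothetical.

```python
import math

BOLTZMANN = 1.380649e-23  # J/K


def thermal_noise_dbm(bandwidth_hz, temperature_k=290.0):
    """Thermal noise power kTB in dBm (290 K is an assumption)."""
    return 10 * math.log10(BOLTZMANN * temperature_k * bandwidth_hz / 1e-3)


def link_budget(p_rx_dbm=-34.0, margin_db=4.0, bandwidth_hz=20e9,
                snr_required_db=17.0):
    """Noise-figure budget of an OOK energy-detection receiver.

    p_rx_dbm, margin_db and snr_required_db are the values quoted in the
    text; bandwidth_hz = 20 GHz is an assumed noise bandwidth.
    """
    p_min_dbm = p_rx_dbm - margin_db              # worst-case input power
    noise_dbm = thermal_noise_dbm(bandwidth_hz)   # input noise floor
    snr_in_db = p_min_dbm - noise_dbm             # SNR at the antenna
    nf_budget_db = snr_in_db - snr_required_db    # budget for a linear receiver
    squarer_penalty_db = 10 * math.log10(4)       # 6 dB loss of the squarer
    return {
        "p_min_dbm": p_min_dbm,
        "noise_dbm": noise_dbm,
        "snr_in_db": snr_in_db,
        "nf_budget_db": nf_budget_db,
        "nf_budget_with_detector_db": nf_budget_db - squarer_penalty_db,
    }


budget = link_budget()
for name, value in budget.items():
    print(f"{name}: {value:.2f}")
```

Under these assumptions the input SNR comes out close to 33dB and the noise-figure budget close to 16dB, shrinking to about 10dB once the 6dB detector penalty is subtracted, in line with the values above.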
III. BUILDING BLOCKS

A. LNA
The LNA core consists of six common-source stages, stacked two by two in a current-reuse architecture. Cascading the stages provides large gain, while the current reuse keeps the power consumption low. To enable 10Gbps communication, a wide operating bandwidth of 20GHz is required in addition to the large gain. Third-order inter-stage networks are employed between the amplifying stages to achieve a larger bandwidth. The frequency responses of the inter-stage matching networks are stagger-tuned to further extend the overall LNA bandwidth.
To convert the signal from the single-ended (S.E.) output of the LNA to the differential input of the ED, a balun is required. To operate correctly, the impedance of the two coils should be much greater than the load impedance (i.e. that of the gates of the ED in this case), and the coupling factor k should be as close as possible to 1. However, the electrical characteristics of the two topmost thick-copper layers available in the back-end of the employed 28nm CMOS technology allow a k within 0.8-0.9, while the maximum inductance achievable while keeping the self-resonance frequency well above the 50GHz carrier is lower than 200pH. Based on these stringent design constraints, the two octagonal single-turn 150pH coils were designed to show a k of 0.82 and a self-resonance frequency of 85GHz. The resulting 6dB loss of the balun needs to be taken into account in the minimum gain requirement of the LNA. The balun is very compact, occupying only 95×95μm^2.
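To give a feel for the balun constraints, the following sketch computes the reactance of one 150pH coil at the 50GHz carrier and the equivalent parasitic capacitance implied by the 85GHz self-resonance frequency. It assumes a simple lumped parallel-LC model of the coil, which is a simplification introduced here for illustration and not the extracted model used in the design; the helper names are hypothetical.

```python
import math


def reactance_ohm(inductance_h, freq_hz):
    """Magnitude of the inductive reactance 2*pi*f*L."""
    return 2 * math.pi * freq_hz * inductance_h


def equivalent_capacitance_f(inductance_h, srf_hz):
    """Capacitance that resonates with L at the self-resonance frequency.

    Assumes a lumped parallel-LC model: f_srf = 1 / (2*pi*sqrt(L*C)).
    """
    return 1.0 / ((2 * math.pi * srf_hz) ** 2 * inductance_h)


L_COIL = 150e-12   # coil inductance from the text
F_CARRIER = 50e9   # carrier frequency from the text
F_SRF = 85e9       # self-resonance frequency from the text

x_coil = reactance_ohm(L_COIL, F_CARRIER)
c_par = equivalent_capacitance_f(L_COIL, F_SRF)

print(f"coil reactance at 50 GHz: {x_coil:.1f} ohm")
print(f"equivalent parasitic capacitance: {c_par * 1e15:.1f} fF")
```

A coil reactance of only a few tens of ohms at the carrier shows why the condition of a coil impedance much larger than the load is hard to meet with inductances below 200pH.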
B. Envelope Detector
Envelope detector circuits are mainly based on exploiting the 2nd-order non-linearity of the MOSFET operating in saturation to produce an output signal proportional to the square of the input. Due to the stringent receiver noise requirements, the ED gain is a critical parameter to maintain a high SNR and relax the gain of the LA, drastically reducing the overall power consumption [10] . Fig. 4 (a) shows the source-follower based ED, where the push-push connection of the pair also nulls the 1st-order component of the output signal. The main drawback of this circuit is the limited gain [11] . To increase the gain, an improved version of the one proposed in [12] is presented in Fig. 4 (b) . The proposed ED combines rectification with amplification by means of a class-AB biasing of the NMOS input pair. A tunable PMOS in triode is employed as a load to accommodate different input amplitudes. A cascode transistor has been inserted between the push-push pair and the load in order to improve the output resistance and thus maximize the achievable gain. The output amplitude of the source-follower based ED, A_out,sf, and that of the proposed one, A_out,AB, are:
C. Limiting Amplifier
The LA core consists of the cascade of five differential stages closed in a DC offset cancelation loop. The output buffer drives the measurement setup. The LA core stages are realized with differential pairs with cross-coupled capacitances for bandwidth extension. No inductive peaking has been employed to minimize the area occupancy.
The cascade of n identical gain cells, each one having a bandwidth BW_c, exhibits an overall bandwidth of
$$BW_{tot} = BW_c \sqrt[m]{2^{1/n} - 1} \qquad (6)$$
where m is equal to 2 for first-order stages and 4 for second-order stages [13] . In our case, the network is first order and thus m is 2. For a certain gain A_tot required for the multistage amplifier over the bandwidth BW_tot, the minimum gain-bandwidth product GBW_c of the single stage is required to be [14] :
$$GBW_c = \frac{A_{tot}^{1/n} \, BW_{tot}}{\sqrt[m]{2^{1/n} - 1}} \qquad (7)$$
The main lobe of the baseband 10Gbps OOK spectrum occupies a bandwidth of 10GHz. Since the best compromise between SNR and Inter-Symbol Interference (ISI) is achieved for a receiver bandwidth around 0.7 times that of the signal, the targeted BW_tot is 7GHz. Given the required SNR at the output and the expected integrated noise, a minimum amplitude of 400mV is needed, requiring a minimum gain A_tot of 36dB for the LA. From (7), the best design compromise is achieved with 5 stages, each one delivering 7.2dB gain and 18.2GHz bandwidth, which is still challenging to achieve in the 28nm technology node, especially at large signal. Note that in this calculation the buffer has been neglected, since the 60fF parasitic capacitance of the output pad together with the 50Ω resistance of the Bit Error-Rate Tester (BERT) leads to a bandwidth of more than 50GHz for the last stage itself. Since the fourth stage is already working in the large-signal regime even under the minimum expected input signal, the fifth core stage has been changed into an f_T-doubler architecture to taper towards the buffer without losing bandwidth [14] . The schematics of a single LA stage and of the buffer are shown in Fig. 5 . The targeted signal amplitude of 400mV S.E. requires the last stage to deliver 8mA into the 50Ω impedance of the BERT. To minimize the input capacitance required to drive such a current, an f_T-doubler stage has been employed. An open-drain configuration has been selected to avoid current partition with the load resistances of the stage, thus minimizing the size of the buffer for the given output swing. Two off-chip bias tees bias the stage.
The offset cancelation loop employs a differential pair similar to the core stages. The offset is sensed at the input of the buffer rather than at its output, since the buffer is open drain and thus its DC gain is equal to zero. The feedback is closed at the output of the first LA stage in order not to place the 350Ω load resistance of the pair directly in parallel with the load of the ED, which would seriously degrade its gain. The pole of the loop filter has been set to 450kHz, low enough to avoid significant eye closure due to the droop over the longest expected run of identical bits.
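The sizing of the limiting amplifier can be reproduced numerically. The sketch below uses the standard bandwidth-shrinkage expression for a cascade of n identical stages, BW_tot = BW_c (2^(1/n) - 1)^(1/m), together with the numbers quoted above (36dB, 7GHz, 60fF, 50Ω, 400mV). The function names are hypothetical and the single-pole RC model of the output pad is an assumption made for illustration.

```python
import math


def shrink_factor(n, m=2):
    """Bandwidth shrinkage of n cascaded identical stages (m=2: first order)."""
    return (2 ** (1.0 / n) - 1) ** (1.0 / m)


def per_stage(a_tot_db, bw_tot_hz, n, m=2):
    """Gain (dB), bandwidth (Hz) and GBW (Hz) required of each stage."""
    gain_db = a_tot_db / n
    bw_c = bw_tot_hz / shrink_factor(n, m)
    gbw_c = 10 ** (gain_db / 20) * bw_c
    return gain_db, bw_c, gbw_c


# Five first-order stages, 36 dB total gain over 7 GHz (values from the text).
gain_db, bw_c, gbw_c = per_stage(36.0, 7e9, 5)
print(f"per stage: {gain_db:.1f} dB, {bw_c / 1e9:.1f} GHz, "
      f"GBW {gbw_c / 1e9:.1f} GHz")

# Sweep of the stage count to see how the per-stage GBW requirement evolves.
for n in range(1, 9):
    _, _, gbw_n = per_stage(36.0, 7e9, n)
    print(f"n={n}: required GBW_c = {gbw_n / 1e9:.1f} GHz")

# Output pad modelled as a single RC pole (assumption): 60 fF into 50 ohm.
f_pad = 1.0 / (2 * math.pi * 50.0 * 60e-15)
# Current needed for a 400 mV single-ended swing on a 50 ohm load.
i_out = 0.4 / 50.0
print(f"pad bandwidth: {f_pad / 1e9:.1f} GHz, output current: {i_out * 1e3:.1f} mA")
```

In this model the required per-stage GBW falls steeply up to about five stages and only marginally beyond, which is consistent with choosing five stages as a compromise between GBW, power and area.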
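The effect of the offset-cancelation loop on long runs of identical bits can be estimated with a first-order model. The sketch below treats the loop as a single-pole high-pass filter whose corner equals the 450kHz loop-filter pole; this is a simplifying assumption, since the effective corner of the closed loop also depends on the loop gain. The run length of 32 bits is a hypothetical value chosen for illustration only.

```python
import math


def baseline_droop(corner_hz, run_length_bits, bit_rate_bps):
    """Fractional droop of a first-order high-pass filter over a run of
    identical bits: 1 - exp(-2*pi*f_c*t_run)."""
    t_run = run_length_bits / bit_rate_bps
    return 1.0 - math.exp(-2 * math.pi * corner_hz * t_run)


F_CORNER = 450e3   # loop-filter pole from the text, used as high-pass corner
BIT_RATE = 10e9    # data rate from the text
RUN_LENGTH = 32    # hypothetical longest run of identical bits

droop = baseline_droop(F_CORNER, RUN_LENGTH, BIT_RATE)
print(f"droop after {RUN_LENGTH} identical bits: {droop * 100:.2f} %")
```

Under these assumptions the droop stays below 1% of the signal amplitude, which supports the statement that the pole is low enough to avoid significant eye closure.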
IV. SIMULATION RESULTS
To verify the performance of the proposed architecture, post-layout simulations were performed using Cadence SpectreRF. The overall receiver power consumption is 70mW from a 1V supply: 30mW for the LNA and 40mW for the envelope detector and limiting amplifier. The chip area is 1450×800μm^2 including pads. The simulated gain (S_21), input reflection coefficient (S_11) and noise figure (NF) of the LNA are shown in Fig. 6 . The LNA achieves a gain of 26dB over a bandwidth of more than 28GHz. The S_11 is better than -10dB over the 45-65GHz band. The NF is less than 6.8dB across the whole operating band. To assess the link performance, a 10Gbps PRBS32 bitstream modulated onto a 50GHz carrier was fed into the balun in a transient noise simulation. The power of the input signal was set to -38dBm, i.e. the expected value at the input of the LNA when the receiver is placed at a distance of ~10cm from the transmitter with 4dB link margin. The single-ended eye diagram at the output of the LA is depicted in Fig. 7 . An SNR better than 17dB was simulated, consistent with a simulated BER<10^-12. In Table I , the performance of the receiver is compared to the state of the art, assuming a TX power dissipation of 100mW. The proposed work shows the highest combination of data rate and communication distance using a non-directive antenna while still keeping the power consumption low.
V. CONCLUSION
A short-range mm-wave 50GHz wireless OOK receiver for 10Gbps communication has been presented. The receiver was realized in 28nm CMOS technology and consumes 70mW from a 1V power supply while demodulating a 10Gbps signal at 10cm distance with a BER of 10^-12.
