A compact and low-power wireless receiver supporting the 2.4 GHz industrial, scientific and medical (ISM) band is implemented in a 130 nm Complementary Metal-Oxide-Semiconductor (CMOS) process. The GHz operating frequency allows the chip to be matched with a mm-sized antenna, which reduces the implant's size and improves the patient's experience. To simplify the implanted chip, the downlink uses On-Off Keying (OOK) so that non-coherent detection and a simplified receiver chain can be deployed. A boosted rectifier and an open-loop amplifier-based receiver chain lower the chip's power consumption to the nW level. The measured sensitivity reaches −50 dBm at a Bit Error Rate (BER) of 10^-3.
Introduction
The next generation of implantable wireless medical devices aims to provide precise and reliable treatments with minimally invasive methods. A few observations can be made about implantable medical devices. The first is that the human body is an excellent temperature regulator; hence temperature drift, which may be substantial in other applications such as consumer electronics and industrial products, is small in implanted medical devices. This small drift allows several circuit simplifications, especially for the ring oscillator in the synthesizer. In addition, the corresponding base station can consume much more power. Hence the system can be designed so that the base station handles the complex modulation and demodulation tasks and the implanted device is kept as simple as possible. For the downlink, because the base station has a larger output power and a more sensitive receiver, OOK is a good choice: the implanted receiver can use passive gain and a close-to-zero-power full-bridge rectifier in its receiver chain. For the uplink, the low transmitter power of the implanted device makes the transmitted signal small, so the base-station receiver must be sensitive and able to decode robustly at low SNR. Hence FSK is a good choice for the uplink. The work in [4] used ultrasonic power transfer to achieve safe and higher power levels even when the chip operates deep inside the body.
However, traditional discrete solutions are power-hungry and bulky because of the size of their antennas and batteries [1]. If we can reduce the wireless power consumption from the mW level [28] to the µW level [24], or even to the nW level, we may be able to reach the ultimate solution: a mm-sized single-chip solution with no external components such as crystals, batteries and antennas. To replace batteries, wireless power transfer techniques are demonstrated in [29] and [30]. With the choice of the 2.4 GHz ISM band, it has been shown that a loop antenna can be integrated on-chip [20]. As an intermediate solution, this paper presents the first complete mm-sized nW-level ISM-band receiver in a low-cost 130 nm CMOS process. The next step is to integrate the antenna and replace the crystal with an on-chip oscillator.
Backscatter systems, such as those demonstrated in [2] and [3], have been implemented at the board level with various wireless protocols such as LoRa, Bluetooth and even Wi-Fi.
The key benefit of a passive backscatter system is that the transmitter simply reflects incoming wireless waves and therefore consumes effectively zero power. This paper focuses on the receiver chain, since the transmitter is covered in detail in [2]. The details and measurements of the synthesizer will be covered in a separate paper.
System requirements

Operating frequency and sensitivity requirement
As demonstrated in [7] and [19], the optimal frequency for wireless power transfer to mm-sized antennas is in the GHz range. The 2.4 GHz ISM band is therefore a good choice for such a small implanted device. In typical applications, external base stations with 20 dBm output power are readily available. At this frequency, a 1 to 3 m separation between RX and TX incurs 40 to 50 dB of free-space path loss, so the received power at the antenna is −20 to −30 dBm, assuming no antenna gain. According to a study of 2.4 GHz path loss in human tissue [22], the additional path loss is close to 20 dB for 2 cm of human muscle tissue (εr = 50.8 and σ = 2.01 S/m). Hence the receiver is required to have a sensitivity of −40 to −50 dBm, with minimum power consumption.
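The link budget above can be checked with the Friis free-space formula. The sketch below is a minimal illustration, assuming isotropic antennas (0 dBi), a 20 dBm base station and the 20 dB tissue loss quoted for 2 cm of muscle; the function names are hypothetical helpers, not part of the described design.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # m/s


def fspl_db(distance_m: float, freq_hz: float) -> float:
    """Free-space path loss between isotropic antennas: 20*log10(4*pi*d/lambda)."""
    wavelength_m = SPEED_OF_LIGHT / freq_hz
    return 20.0 * math.log10(4.0 * math.pi * distance_m / wavelength_m)


def received_power_dbm(tx_dbm: float, distance_m: float, freq_hz: float,
                       tissue_loss_db: float = 0.0,
                       antenna_gain_db: float = 0.0) -> float:
    """TX power minus free-space and tissue losses, plus total antenna gain (dB)."""
    return tx_dbm - fspl_db(distance_m, freq_hz) - tissue_loss_db + antenna_gain_db


if __name__ == "__main__":
    f_hz = 2.4e9
    for d_m in (1.0, 3.0):
        air = received_power_dbm(20.0, d_m, f_hz)
        body = received_power_dbm(20.0, d_m, f_hz, tissue_loss_db=20.0)
        print(f"d = {d_m:.0f} m: FSPL = {fspl_db(d_m, f_hz):.1f} dB, "
              f"air = {air:.1f} dBm, 2 cm muscle = {body:.1f} dBm")
```

Under these assumptions the free-space loss comes out at roughly 40 dB at 1 m and 50 dB at 3 m, which reproduces the −40 to −50 dBm sensitivity target once the tissue loss is included.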
System diagram
The whole transceiver system diagram is shown in Fig. 1. To simplify the implanted transceiver, the downlink uses OOK so that non-coherent detection and a simplified receiver chain can be deployed. A passive transmitter is time-multiplexed with the receiver to send data back to the base station by backscattering FSK-modulated signals. A boosted differential rectifier first rectifies and voltage-multiplies the input radio frequency (RF) signal. An open-loop amplifier acts as a limiter before the data are passed to a digital slicer. A floating-body TX-RX switch isolates the transmitter and receiver. A Frac-N PLL with a novel stacked ring oscillator generates the system clocks (such as 250 kHz, or another frequency depending on the modulation and data rate) for the FSK modulator in the TX from a low-power, low-cost 32.768 kHz crystal.
T/R switch
At 2.4 GHz, the loss through the switch substrate is significant, especially when large transistors are used to reduce the on-resistance (Ron) of the switch. Hence, to minimize insertion loss via the substrate parasitic capacitance, deep N-well NMOS transistors are used, with their own bulk biased to ground through a large 20 kΩ resistor, as shown in Fig. 2. This lets the bulk node follow the input RF signal, which makes the substrate parasitic capacitances (Cdb, Csb and junction capacitances) less effective and reduces the loss through the substrate.
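To see why a large bulk resistor helps, the bulk network can be approximated, as a first-order assumption, by a single lumped capacitance C_bulk (standing in for Cdb, Csb and the junction capacitances) from the RF node to the bulk, with the 20 kΩ resistor from the bulk to ground. The 100 fF value below is hypothetical, and the deep N-well-to-substrate capacitance is ignored; the sketch only shows the trend, not the actual loss of the switch.

```python
import math


def bulk_corner_hz(r_bulk_ohm: float, c_bulk_f: float) -> float:
    """-3 dB corner of the bulk network: R_bulk to ground, C_bulk to the RF node."""
    return 1.0 / (2.0 * math.pi * r_bulk_ohm * c_bulk_f)


def junction_swing_ratio(freq_hz: float, r_bulk_ohm: float, c_bulk_f: float) -> float:
    """|V_rf - V_bulk| / |V_rf|: fraction of the RF voltage left across C_bulk.

    With the bulk grounded this ratio is 1. A large bulk resistor lets the bulk
    follow the RF node, so the ratio (and the current through C_bulk into the
    substrate path) falls as 1 / sqrt(1 + (w*R*C)^2) in this simple model.
    """
    wrc = 2.0 * math.pi * freq_hz * r_bulk_ohm * c_bulk_f
    return 1.0 / math.sqrt(1.0 + wrc * wrc)


if __name__ == "__main__":
    R_BULK = 20e3     # bulk resistor value from the text (20 kOhm)
    C_BULK = 100e-15  # hypothetical lumped Cdb + Csb + junction capacitance
    print(f"corner = {bulk_corner_hz(R_BULK, C_BULK) / 1e6:.1f} MHz")
    ratio = junction_swing_ratio(2.4e9, R_BULK, C_BULK)
    print(f"swing across C_bulk at 2.4 GHz = {ratio:.3f} x V_rf")
```

With these assumed values the corner lies near 80 MHz, far below 2.4 GHz, so only a few percent of the RF swing appears across the parasitic capacitance.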
OOK receiver
The receiver consists of three major blocks: 1. the cascaded differential rectifiers, 2. the open-loop RF amplifier, and 3. the reduced kickback data slicer/demodulator.
Cascaded differential rectifiers
A rectifier is a device that converts RF power into a DC signal. Traditionally, in board-level design, Schottky-diode-based Dickson rectifiers are widely used because of the diodes' low forward voltage drop and fast switching speed. However, Schottky diodes are not supported in all CMOS technologies. Dickson rectifiers with zero-Vt native MOS devices were proposed to minimize the impact of the MOS turn-on threshold voltage (Vt) on efficiency and on the minimum working input amplitude. However, this comes at the expense of large leakage current and higher cost due to the additional mask layer and processing that zero-Vt native devices require. In fact, not all CMOS processes support zero-Vt native devices.
The differential and complementary version solves this issue by using PMOS and NMOS transistors together, as shown in Fig. 3(a). The operation of the four-transistor cell is easily understood by inspecting the waveforms. During one half of the switching cycle, V_p is high; M1 and M4 are on, and M2 and M3 are off. Current flows into V_H through M4 and out of V_L through M1. During the other half cycle, M1 and M4 turn off and M2 and M3 turn on, but the current at V_H and V_L flows in the same direction as in the previous half cycle. Thus, a DC voltage develops across a load connected between V_H and V_L; in general,
$$V_H - V_L \approx 2V_{RF} - V_{drop},$$
where V_RF is the AC voltage amplitude of V_p and V_drop represents losses due to switch resistance and reverse conduction. The above analysis shows that the output voltage is small when the input RF swing is small. This is usually the case, since the system needs to work with RF inputs as low as −20 dBm or even less. To increase the output swing, multiple differential stages can be cascaded as shown in Fig. 3(b), so that the output voltage is roughly 2 × N × V_RF, where N is the number of cascaded stages. In some instances, 24 or even more stages are cascaded for low input power. A detailed comparison of single-ended and differential rectifiers is given in [5] and [25]. One issue with the circuit shown in Fig. 3(b) is that the output takes a long time to settle when the input RF power is low and the input signal swing is small. Hence an AC boosting scheme with extra capacitors is implemented to shorten the output charge-up time, as shown in Fig. 4.
Owing to the additional AC-coupling boost capacitors (C_B), the intermediate nodes and the final output charge up faster than in the case without boost capacitors. The simulated charge-up curves are shown in Fig. 5 for an input RF signal swing of about 10 mV. The green curve is the output of a design with AC-coupling boost capacitors, while the red curve is the output of a counterpart without them. A 2× speed-up in the charge-up behavior is observed.
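As a rough feel for why many stages are needed, the sketch below converts an available input power into a peak RF amplitude, assuming a 50 Ω source with no passive matching gain, and models each differential stage as adding about 2·V_RF − V_drop (with V_drop neglected this reduces to the 2 × N × V_RF estimate). The V_drop argument and helper names are illustrative assumptions; the model ignores loading and the charge-up dynamics discussed above.

```python
import math


def dbm_to_watts(p_dbm: float) -> float:
    """Convert dBm to watts."""
    return 10.0 ** (p_dbm / 10.0) / 1000.0


def rf_amplitude_v(p_dbm: float, r_ohm: float = 50.0) -> float:
    """Peak amplitude of a sinusoid delivering p_dbm into r_ohm (P = V^2 / (2R))."""
    return math.sqrt(2.0 * r_ohm * dbm_to_watts(p_dbm))


def cascaded_output_v(v_rf: float, n_stages: int, v_drop: float = 0.0) -> float:
    """Idealised N-stage output, assuming each stage adds about 2*V_RF - V_drop."""
    return max(0.0, n_stages * (2.0 * v_rf - v_drop))


if __name__ == "__main__":
    for p in (-20.0, -30.0, -40.0):
        v = rf_amplitude_v(p)
        print(f"{p:.0f} dBm -> {v * 1e3:.2f} mV peak; "
              f"24 stages -> {cascaded_output_v(v, 24):.3f} V (ideal)")
```

At −20 dBm the unmatched amplitude is only about 32 mV, so a single stage yields tens of millivolts at best; cascading, passive matching gain and the subsequent amplifier are all needed to reach logic levels.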
Open-loop amplifier (limiter)
A traditional implementation of a gain amplifier is shown in Fig. 6(a). An op-amp with resistor feedback is used, and the voltage gain is set by the resistor ratio (A_v = R2/R1). The benefit of this topology is that the gain is tunable, which makes it suitable for a variable gain amplifier (VGA). However, in an OOK receiver a limiter suffices and no automatic gain control (AGC) is needed. Hence the open-loop self-biased amplifier shown in Fig. 6(b) can be deployed, substantially reducing area and power compared with the closed-loop op-amp-based approach. The AC-coupled self-biased approach also eliminates DC offset in the chain, so a bulky DC-offset cancellation loop can be avoided.
The details of the fully differential open-loop amplifier used in Fig. 6(b) are shown in Fig. 7. The NMOS input pair is biased from a mirror-generated bias voltage V_b. The PMOS gate voltage is self-biased by a feedback resistor from the output. Both the PMOS and NMOS inputs are AC-coupled, which allows different gate bias voltages for the PMOS and NMOS. In this way, the amplifier current is set by adjusting the NMOS bias voltage while the PMOS self-biases, and the current can be substantially reduced compared with a traditional self-biased inverter approach. The voltage gain is not constant across corners, since the g_m and g_DS of the NMOS and PMOS change from corner to corner. However, as mentioned previously, as long as the gain is large enough to saturate the output so that the following slicer can make a decision, gain variation across corners is not a concern.
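The argument that gain variation does not matter for a limiter can be illustrated with a simple behavioural model. The sketch assumes a single-ended half circuit in which both gates are driven, so each stage has a gain of (g_m,n + g_m,p)/(g_DS,n + g_DS,p); it ignores the feedback resistor and hard-clips each output at ±v_sat. All transconductance values are hypothetical corner numbers, not extracted device data.

```python
def stage_gain(gm_n: float, gm_p: float, gds_n: float, gds_p: float) -> float:
    """Small-signal gain magnitude of one complementary stage with both gates driven."""
    return (gm_n + gm_p) / (gds_n + gds_p)


def limiter_chain(v_in: float, gain: float, n_stages: int, v_sat: float) -> float:
    """Cascade of inverting stages; each output is clipped to +/- v_sat (hard limiter)."""
    v = v_in
    for _ in range(n_stages):
        v = max(-v_sat, min(v_sat, -gain * v))
    return v


if __name__ == "__main__":
    # Hypothetical "slow" and "fast" corner parameters in siemens (not from the paper).
    corners = {
        "slow": (30e-6, 20e-6, 3e-6, 2e-6),
        "fast": (60e-6, 40e-6, 3e-6, 2e-6),
    }
    for name, params in corners.items():
        a = stage_gain(*params)
        out = limiter_chain(1e-3, a, n_stages=3, v_sat=0.5)
        print(f"{name}: gain/stage = {a:.1f}, output = {out:+.2f} V")
```

Although the per-stage gain doubles between the two hypothetical corners, both chains saturate at the same output level for a 1 mV input, which is all the slicer needs.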
Low-kickback self-comparing slicer
To improve sensitivity and bit error rate (BER), resistors (R3, R4) and a capacitor (C1) are used to perform self-comparing demodulation, as shown in Fig. 6(b). The resistor R4 and capacitor C1 form a low-pass filter that establishes the DC reference voltage for the slicer. The resistor R3 (loaded only by the parasitics of the slicer) passes the AC signal to the slicer while reducing the kickback noise from the slicer to the previous stages. To suppress kickback further, a two-stage dynamic comparator with a preamplifier stage, similar to [6], is used.
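The self-comparing idea can be sketched in discrete time: the R4/C1 low-pass is replaced by a one-pole IIR filter that tracks the average of the signal, and the slicer compares each sample with that running reference, so the decision does not depend on the absolute DC level. The sample rate, alpha, signal levels and the alternating preamble are assumptions chosen for illustration, and the function names are hypothetical.

```python
def ook_waveform(bits, samples_per_bit, low, high):
    """Ideal baseband OOK levels at the slicer input (the DC offset is `low`)."""
    return [high if b else low for b in bits for _ in range(samples_per_bit)]


def self_comparing_slicer(samples, alpha):
    """Compare each sample with a one-pole low-pass copy of the same signal.

    `ref` models the R4/C1 node (ref += alpha * (x - ref)); `x` models the
    signal passed through R3. Returns one 0/1 decision per sample.
    """
    ref = samples[0]
    decisions = []
    for x in samples:
        ref += alpha * (x - ref)
        decisions.append(1 if x > ref else 0)
    return decisions


def recover_bits(decisions, samples_per_bit):
    """Take the slicer decision at the middle of each bit period."""
    mid = samples_per_bit // 2
    return [decisions[i + mid] for i in range(0, len(decisions), samples_per_bit)]


if __name__ == "__main__":
    preamble = [1, 0] * 20  # lets the reference settle near the midpoint
    payload = [1, 1, 0, 1, 0, 0, 1, 0]
    wave = ook_waveform(preamble + payload, 16, low=0.300, high=0.310)
    bits = recover_bits(self_comparing_slicer(wave, alpha=0.01), 16)
    print(bits[len(preamble):])
```

Because the reference is the signal's own average, long runs of identical bits let it drift toward one level; a balanced or preamble-trained bit stream, as assumed here, keeps it near the midpoint.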
Low power Frac-N all digital synthesizer
The FSK modulator needs a different clock for each data rate and modulation. This frequency-variable system clock is generated by a compact all-digital fractional-N synthesizer that uses a low-power 32.768 kHz crystal (similar to the one implemented in [8]) as the reference. The 32.768 kHz crystal is very low cost because it is widely used in watches and all kinds of real-time clocks (RTC). The ADPLL architecture is shown in Fig. 8. With such a low reference frequency, and since the PLL bandwidth must be less than 1/10 of the reference frequency, an analog PLL would need huge loop capacitors that must be placed off-chip, which is undesirable. A digital PLL can achieve the required bandwidth with small-area digital loop filters. In traditional all-digital PLLs, a complex and power-hungry time-to-digital converter (TDC) is needed for low jitter. However, the system clock here can tolerate relatively large jitter. Hence a TDC-less architecture, or an embedded TDC similar to [12] that uses a coarse time base from the multi-phase VCO output and a fine time base from a phase-interpolation block, is used. This avoids the TDC calibration required in gated-ring-oscillator (GRO) based TDCs [13] or SAR-based TDCs [14], as well as the frequency calibration (FCAL) and loop-dynamics calibration needed in bang-bang ADPLLs [15].
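The divide ratio that such a Frac-N synthesizer needs can be worked out directly from the 32.768 kHz reference. The sketch below assumes, for simplicity, that the oscillator runs directly at the 250 kHz example clock (in practice it may run higher and be divided down), and uses a first-order accumulator, the simplest fractional-N technique, to dither between N and N + 1; the 16-bit accumulator width and the function names are hypothetical choices. It also evaluates the 1/10-of-reference bandwidth limit quoted above.

```python
from fractions import Fraction

F_REF_HZ = 32_768  # 32.768 kHz watch crystal


def divider_ratio(f_out_hz: int, f_ref_hz: int = F_REF_HZ):
    """Split N = f_out / f_ref into integer and fractional parts (exact rationals)."""
    n = Fraction(f_out_hz, f_ref_hz)
    n_int = n.numerator // n.denominator
    return n_int, n - n_int


def first_order_divider_sequence(n_int: int, frac: Fraction, acc_bits: int, cycles: int):
    """Per-reference-cycle divide values from a first-order accumulator.

    Each cycle the fractional word is added to an acc_bits-wide accumulator;
    a carry-out selects n_int + 1 instead of n_int, so the average ratio
    approaches n_int + frac_word / 2**acc_bits.
    """
    modulus = 1 << acc_bits
    frac_word = int(frac * modulus)  # truncated if frac is not a multiple of 1/modulus
    acc, seq = 0, []
    for _ in range(cycles):
        acc += frac_word
        carry = acc >= modulus
        if carry:
            acc -= modulus
        seq.append(n_int + int(carry))
    return seq


if __name__ == "__main__":
    n_int, frac = divider_ratio(250_000)
    seq = first_order_divider_sequence(n_int, frac, acc_bits=16, cycles=2048)
    print(n_int, frac, float(Fraction(sum(seq), len(seq))))
    print("loop bandwidth limit ~", F_REF_HZ / 10, "Hz")
```

Because 32.768 kHz is 2^15 Hz, 250 kHz corresponds to N = 15625/2048 = 7 + 1289/2048, which a 16-bit accumulator represents exactly, so the average over 2048 reference cycles equals the target ratio. The bandwidth limit of about 3.3 kHz illustrates why an analog loop filter would need very large capacitors.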
The digital loop filter is similar to the proportional-integral loop filter in [16] and drives the VCO via two ports: the fine bits, which turn the switchable capacitors at the VCO nodes on or off, and the coarse bits, which drive a voltage sigma-delta DAC that effectively regulates the VCO supply and hence adjusts the VCO frequency [17]. The circuit is also much simpler than the ADPLLs in [23], [26] and [27]. The phase-interpolation block can also be gated so that it works only when a reference comparison edge arrives, which saves a large amount of power.
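The two-port control can be illustrated with a minimal integer PI filter whose output word is split into fine and coarse fields, following the description above: the low bits drive the switchable capacitors and the high bits drive the sigma-delta DAC. The gains, the 3-bit fine field and the class and function names are hypothetical; a real design would also offset the word to a mid-range code and bound each field.

```python
class PILoopFilter:
    """Integer proportional-integral loop filter: y[n] = kp*e[n] + ki*sum(e[0..n])."""

    def __init__(self, kp: int, ki: int) -> None:
        self.kp = kp
        self.ki = ki
        self.integrator = 0

    def step(self, phase_error: int) -> int:
        """Accumulate the phase error and return the new tuning word."""
        self.integrator += phase_error
        return self.kp * phase_error + self.ki * self.integrator


def split_tuning_word(word: int, fine_bits: int):
    """Return (coarse, fine) such that word == (coarse << fine_bits) + fine.

    fine   -> switchable-capacitor bank at the VCO nodes
    coarse -> sigma-delta DAC that sets the VCO supply
    """
    fine = word & ((1 << fine_bits) - 1)
    coarse = word >> fine_bits  # arithmetic (floor) shift, also valid for negative words
    return coarse, fine


if __name__ == "__main__":
    lf = PILoopFilter(kp=4, ki=1)  # hypothetical gains
    for e in (3, 3, -2, 0):
        w = lf.step(e)
        print(e, w, split_tuning_word(w, fine_bits=3))
```

Splitting one word into two fields keeps the loop filter itself a single accumulator plus a multiplier, which is consistent with the small-area digital loop filter argued for above.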
The details of the PLL design and measurements will be covered in a separate paper. In the power breakdown in Table I, the gated phase interpolator, digital loop filter and sigma-delta modulator are counted as digital logic. A voltage regulator supplies the VCO and rejects supply noise [18]; its power is included in the VCO power.
Implementation and discussion
The chip is implemented in a 1P7M (with one thick metal) 130 nm Ultra-Low-Power CMOS process from SMIC. It was taped out on a 3 mm × 4 mm multi-project wafer (MPW) die that shares the die and package with other projects. The simulated power breakdown in the typical corner for the whole transceiver is shown in Table I. The chip layout is shown in Fig. 9(a). The chip size is approximately 500 µm × 300 µm, excluding pads and electrostatic discharge (ESD) protection circuits. For production chips, the floorplan and layout can be further optimized to save area. The chip is packaged together with other projects in a custom 216-pin BGA package from Micro Bonding Corp. A 7-layer PCB test board was built to route out the dense 216 pins (shared with other MPW projects) for testing. The PCB with the packaged chip is shown in Fig. 9(b). The standard Serial Peripheral Interface (SPI) protocol is used to program the on-chip registers, providing programmability for settings such as the PLL divider ratio, PLL loop bandwidth and number of rectifier stages.
To reduce uncertainties such as multipath and fading, a testing approach similar to [2] is used, in which programmable attenuators are placed between the RF signal source and the receiver's SMA input. The attenuation is swept and the bit errors are counted on a daughter board. The measured BER versus input power is shown in Fig. 10(a). The measured OOK time-domain output is shown in Fig. 10(b). A comparison with other state-of-the-art work is shown in Table II. Recently, there have been several publications on low-power wireless transceivers, such as Bluetooth Low Energy (BLE) [21]. Some of these designs are so-called wake-up receivers, which usually have no transmitter and whose purpose is to detect a wake-up signal and wake up the main receiver, such as [9] and [11]. This paper presents the first low-power chip that includes both a receiver and a transmitter while having power consumption similar to that of receiver-only counterparts. Board-level backscatter implementations [2], [3] successfully demonstrated the concept of a zero-power transmitter; however, those boards are quite large (>3 cm × 3 cm) and too power-hungry for a medical implanted device. The ultra-low sensitivity of the receiver in [10] is obtained at the cost of large power consumption (4500 nW), so it is also not suitable for implanted medical devices.
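The measurement bookkeeping can be sketched as follows: the power at the SMA input is the source level minus the programmed attenuation (and any fixed cable loss), the BER at each step is errors/bits, and the sensitivity is the lowest input power that still meets the 10^-3 target. The helper names are hypothetical, and bits_for_confidence uses the common zero-error approximation N ≥ −ln(1 − CL)/BER to estimate how many bits must be counted at each step.

```python
import math


def measured_ber(errors: int, bits: int) -> float:
    """Bit error rate from counted errors."""
    return errors / bits


def bits_for_confidence(ber_target: float, confidence: float = 0.95) -> int:
    """Error-free bits needed to claim BER < ber_target at the given confidence.

    Uses the common approximation N >= -ln(1 - CL) / BER (zero observed errors).
    """
    return math.ceil(-math.log(1.0 - confidence) / ber_target)


def input_power_dbm(source_dbm: float, attenuation_db: float,
                    fixed_loss_db: float = 0.0) -> float:
    """Power at the receiver SMA input: source level minus attenuator and cable losses."""
    return source_dbm - attenuation_db - fixed_loss_db


def sensitivity_dbm(sweep, ber_target: float = 1e-3):
    """Lowest input power whose measured BER meets the target.

    sweep: iterable of (input_dbm, errors, bits). Assumes BER rises
    monotonically as the input power falls. Returns None if no point passes.
    """
    passing = [p for p, errors, bits in sweep
               if measured_ber(errors, bits) <= ber_target]
    return min(passing) if passing else None
```

For a 10^-3 target at 95% confidence, bits_for_confidence gives about 3000 error-free bits per attenuation step.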
In summary, this paper presents the first reported low-power chip-level (not board-level) medical receiver that has the same power level as state-of-the-art wake-up receivers, yet can receive and transmit in different slots in time-division duplexing (TDD) mode.
Conclusions
The chip-level implementation of a nW-level receiver in a 130 nm 1P7M CMOS process for implanted medical devices is presented. The chip is well suited for use in low-power, compact implanted medical devices.
