Ultralow-Power Optical CDR for Integrated Photovoltaic Energy-Harvesting Sensors
Khadar Shaik, Student Member, IEEE, Travis Kleeburg, Member, IEEE, and Rajeevan Amirtharajah, Member, IEEE
Abstract-Energy storage elements and electromechanical timing references, such as crystals, can dominate energy-autonomous wireless sensor node volume at the 1-mm³ scale. This brief proposes a clock-data recovery circuit that receives power from integrated photovoltaics and extracts clock and data signals from optical data input. A prototype fabricated in 90-nm CMOS was tested over a V_DD range of 150-500 mV, and it supports maximum data rates from 15 kb/s to 4 Mb/s while dissipating 51 nW-3.5 μW. A software implementation of the conjugate gradient method running on an ultralow-power embedded microcontroller was investigated for its potential to compensate for jitter when the recovered clock is used to sample a sensor input. The output signal-to-noise-and-distortion ratio of an analog-to-digital converter can be improved by up to 16-28 dB for an estimated microcontroller power consumption of 15 μW.
Index Terms-clock-data recovery (CDR), jitter, nonuniform sampling, optical wireless sensor node.
I. INTRODUCTION

Sampling of sensor data is the fundamental task for which an energy-autonomous wireless sensor mote is designed. There are numerous applications, such as cubic-millimeter-scale intraocular pressure monitoring [1], that can only be enabled by dramatic reductions in system volume. Energy storage elements (e.g., ultracapacitors and batteries) and electromechanical timing references (e.g., crystals) can dominate volume at this scale. For example, one of the smallest 32.768-kHz crystals occupies a volume of 0.9 mm³ [2]. One approach to reducing volume is to combine optical power delivery or energy harvesting from ambient light [3] with optical clock-and-data recovery (CDR) to support sensor operation [4].

A sensor node concept that would integrate these functions on a single die is shown in Fig. 1. The node would receive two optical signals. The first provides power for the circuits. The second provides data that would contain microcontroller instructions and configuration information. For example, commands that set the sampling profile of an integrated ΔΣ modulator (DSM)-based analog-to-digital converter (ADC) can be transmitted to the system in order to maintain minimum energy operation [4]. These signals would travel through free space before they reach an optical filtering matrix. This filtering matrix would cover the active circuits with an opaque material to ensure that no carriers are generated in that region of the die. The integrated photodiode array would contain multiple elements: some produce power and others receive data. In the concept system, shallow trench isolation (STI) or alternative photodiode structures would be used to eliminate lateral photocurrent that can disrupt operation of the active circuits [5].

This brief describes one critical component necessary to realize the concept system shown in Fig. 1: a subthreshold optical CDR circuit that can operate from integrated energy-harvesting photovoltaics. Other parts of the system are implemented or emulated using discrete components.
Section II reviews the CDR design and operation. In Section III, measured results from a test chip show that the recovered clock's jitter increases with decreasing illumination and supply voltage. Increased jitter limits both the achievable received data rate and the signal-to-noise ratio (SNR) performance of an ADC when the received clock is used as a sampling clock. Therefore, Section IV explores the practical implementation of an iterative nonuniform sampling recovery scheme in software that can mitigate the negative impact of worsening jitter. Conclusions and future work are discussed in Section V.
II. OPTICAL CDR DESIGN
The CDR consists of a charge-pump phase-locked loop (PLL), two additional interleaved replica voltage-controlled oscillators (VCOs), and digital logic. Incoming data are applied at the input of the PLL's phase-frequency detector. The data rate is carefully chosen near the minimum frequency attainable by the PLL VCO. Data are 3b/4b encoded to maximize transitions. After a sufficient number of transitions, the PLL will have adjusted the frequency produced by its VCO such that it approximately equals the data rate. The frequency output by the interleaved VCOs should also closely match the data rate since all VCOs are identical and have the same control voltage. The incoming data signal controls the interleaving of the VCO outputs to produce the recovered clock. Interleaving removes the accumulation of errors caused by oscillator mismatch and resulting frequency deviations from the data rate.

The VCOs are made from 160 delay elements and a NAND gate that provides an enable input. The delay cell consists of a minimum-sized inverter loaded with a tunable RC load, where the resistance is the subthreshold resistance of an nMOS tuning transistor, and the capacitance is created by an nMOS capacitor. The outputs of the ring oscillators pass through level-sensitive latches, which prevent high-frequency glitches at the output of the oscillator. The latches also provide voltage level conversion using cross-coupled pMOS pullups to allow the oscillators to run at lower voltages than the other PLL components. The charge pump biases its nMOS current sink to V_DD and pMOS current source to ground, eliminating the need for an analog bias circuit. The loop filter H(s) consists of a 40-pF capacitor in series with a 300-kΩ resistor.
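As a rough check on the loop-filter values quoted above, a series RC branch has impedance Z(s) = R + 1/(sC), which places a zero at 1/(2πRC). The sketch below evaluates that corner frequency for the stated 300 kΩ and 40 pF. Treating H(s) as this series impedance is an assumption made here for illustration, and the function and constant names are not from the brief.

```python
import math


def series_rc_zero_hz(r_ohm: float, c_farad: float) -> float:
    """Zero frequency in Hz of Z(s) = R + 1/(sC), a series RC branch."""
    return 1.0 / (2.0 * math.pi * r_ohm * c_farad)


# Component values quoted in the text (assumed to form a series RC to ground).
R_OHM = 300e3      # 300 kOhm
C_FARAD = 40e-12   # 40 pF

tau_s = R_OHM * C_FARAD                       # RC time constant in seconds
f_zero_hz = series_rc_zero_hz(R_OHM, C_FARAD)

print(f"tau = {tau_s * 1e6:.1f} us, zero at {f_zero_hz / 1e3:.2f} kHz")
```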
Eliminating analog bias circuitry reduces static power consumption; however, the performance of the CDR will be more sensitive to variations in power supply voltage caused by changes in the received optical power. There will also be significant performance variability due to process variations. These issues will be explored in the next section.
III. CDR
To demonstrate operation using optical power and data, the optical data signal was captured with an off-chip Panasonic photodetector loaded with a 5.6-kΩ resistor. Although this component was not integrated on the same die, it is possible to implement similar structures in silicon. A data rate of 50 kb/s was achieved at V_DD = 300 mV [4]. The power needed for the limiting inverter and CDR was provided by energy-harvesting photodiodes driven by the power signal P_IN at an illuminance of 4.6 klx. The photodiodes used to power the circuit had a total active area of 0.135 mm² [3]. No optical filtering was included in this experiment; therefore, optical crosstalk between the data and power signals limited the CDR performance due to fluctuations in the energy-harvesting supply voltage.
To exercise the maximum data rate achievable by the CDR alone, both power and data inputs were provided electrically to eliminate bandwidth limitations of the optical receiver. Fig. 3(a) shows an eye diagram of the recovered clock (CLK), confirming CDR operation at V_DD = 500 mV. To demonstrate the quality of the recovered clock, a jitter histogram is shown in Fig. 3(b). The input data stream (DIN) used to test the performance was a 3b/4b-encoded $2^8 - 1$ pseudorandom bit sequence generated by the polynomial $X^8 + X^7 + X^5 + X^3 + 1$. Fig. 3(c) shows the eye diagram for the recovered data measured at 500 mV with a 3b/4b 1800-kb/s data stream applied at the input. The 3b/4b data sequence contains four distinct data patterns; each data pattern produces a quantized bit period in the recovered data. A measurement of this quantization is shown in the timing interval error (TIE) plot in Fig. 3(d). The input and recovered waveforms for a 500-kb/s data stream with the CDR operating from a 500-mV supply are shown in Fig. 4, which also captures the acquisition time of the CDR at this operating point. The acquisition time is measured by gating the PLL VCO with the enable signal E. When E is low, the recovered clock CLK is free running, and when E transitions, the loop starts tracking the input data transitions.
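The pseudorandom test pattern described above can be sketched with a linear-feedback shift register. The example below assumes a Fibonacci-style register whose feedback taps are the polynomial exponents 8, 7, 5, and 3; the brief does not state the register topology or seed, so both are assumptions. The 3b/4b encoding step is omitted because its code table is not given in the text.

```python
from itertools import islice


def prbs8(seed: int = 0xFF):
    """Yield bits from an 8-bit Fibonacci LFSR with taps at stages 8, 7, 5, 3."""
    state = seed & 0xFF
    if state == 0:
        raise ValueError("an all-zero seed locks the LFSR")
    while True:
        # XOR the register stages named by the polynomial exponents.
        feedback = ((state >> 7) ^ (state >> 6) ^ (state >> 4) ^ (state >> 2)) & 1
        yield (state >> 7) & 1                     # emit the oldest bit
        state = ((state << 1) | feedback) & 0xFF   # shift in the feedback bit


def sequence_period(n_bits: int = 1024) -> int:
    """Smallest shift p > 0 at which the generated bit stream repeats."""
    bits = list(islice(prbs8(), n_bits))
    for p in range(1, n_bits // 2):
        if bits[: n_bits - p] == bits[p:]:
            return p
    raise RuntimeError("no period found in the inspected window")


print("period:", sequence_period())
print("ones per period:", sum(islice(prbs8(), sequence_period())))
```

A sequence that repeats every $2^8 - 1$ bits visits every nonzero register state once, which is why the seed only shifts the starting point of the pattern.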
The nominal transistor threshold voltage for the CDR is 250 mV. The performance of the circuit was characterized across V_DD from 150 mV to 500 mV, using 3b/4b-encoded data and an input data stream of alternating 1s and 0s that emulates optical clock delivery. For V_DD of 180-200 mV, the PLL reference was recovered from the input signal D_IN supplied electrically, whereas for all values, an external PLL reference signal (EXT) was supplied electrically to measure an upper bound on performance. Measured results across three test chips (IC1-IC3) are displayed in Fig. 5(a)-(d), showing approximately 2-3× die-to-die variation in performance. Table I reports measured results from IC1.

TABLE I. MEASURED CDR DATA RATES AND LOCKING RANGES (IC1)

IV. JITTERED SAMPLING RECOVERY
In a crystal-less sensor such as the proposed optical sensor node, sampling of analog signals can be performed using the clock recovered from the received optical data signal D_IN. As shown in the previous section, this clock could have significant jitter depending on the received optical power P_IN because it determines the supply voltage V_DD. In most situations, the sensor will receive a varying amount of optical power because of variations in ambient lighting conditions or in distance from or orientation with respect to an explicit optical power transmitter. Sampling jitter can show up as fractional spurs, e.g., SNR and signal-to-noise-and-distortion ratio (SNDR) degradation in the output spectrum of an ADC [6]. Whether the jitter is random or deterministic [7], jittered sampling can be viewed as a special case of nonuniform sampling [8]. In this brief, the conjugate gradient (CG) method [9], which is a technique developed to solve for information gaps resulting from irregular sampling, is adapted to recover the original signal from jitter-induced sampling errors.
Jittered samples can be formulated as a system of linear equations in the unknown jitter delays, which can be solved in a least-squares sense using the CG approach. To our knowledge, this is the first application of the CG method in a system with limited energy and computational resources.
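To make the least-squares idea concrete, the following is a minimal sketch of CG applied to the normal equations of a nonuniform-sampling problem. It is not the brief's C and assembly implementation, and it uses a simplified formulation: the sample instants are assumed known, the signal is modeled as a short trigonometric series, and the unknowns are the series coefficients. All function names, the test signal, and the parameter values are illustrative assumptions.

```python
import math
import random


def design_matrix(times, n_harm):
    """One row per sample time: [1, cos(2*pi*k*t), sin(2*pi*k*t)] for k = 1..n_harm."""
    rows = []
    for t in times:
        row = [1.0]
        for k in range(1, n_harm + 1):
            row += [math.cos(2 * math.pi * k * t), math.sin(2 * math.pi * k * t)]
        rows.append(row)
    return rows


def matvec(A, x):
    return [sum(a * v for a, v in zip(row, x)) for row in A]


def rmatvec(A, y):
    """Compute A^T y without forming the transpose."""
    out = [0.0] * len(A[0])
    for row, yi in zip(A, y):
        for j, a in enumerate(row):
            out[j] += a * yi
    return out


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cg_least_squares(A, b, max_iter=200, tol=1e-24):
    """Conjugate gradient on the normal equations A^T A x = A^T b."""
    x = [0.0] * len(A[0])
    r = rmatvec(A, b)            # normal-equation residual at x = 0
    p = r[:]
    rs = dot(r, r)
    for _ in range(max_iter):
        if rs < tol:             # squared residual norm small enough
            break
        Ap = rmatvec(A, matvec(A, p))
        alpha = rs / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * ai for ri, ai in zip(r, Ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x


def demo(n=64, n_harm=8, jitter_frac=0.2, seed=1):
    """Return (uncorrected RMS error, corrected RMS error) on a synthetic signal."""
    rng = random.Random(seed)
    signal = lambda t: math.sin(2 * math.pi * 3 * t) + 0.5 * math.cos(2 * math.pi * 5 * t)
    ideal = [j / n for j in range(n)]                            # uniform grid
    actual = [t + rng.gauss(0.0, jitter_frac / n) for t in ideal]  # jittered instants
    samples = [signal(t) for t in actual]
    coeffs = cg_least_squares(design_matrix(actual, n_harm), samples)
    rebuilt = matvec(design_matrix(ideal, n_harm), coeffs)       # resample uniformly
    truth = [signal(t) for t in ideal]
    rms = lambda e: math.sqrt(sum(v * v for v in e) / len(e))
    return (rms([s - v for s, v in zip(samples, truth)]),
            rms([r - v for r, v in zip(rebuilt, truth)]))


uncorrected, corrected = demo()
print(f"uncorrected RMS error {uncorrected:.3e}, corrected {corrected:.3e}")
```

In this noise-free toy case the model matches the signal exactly, so the corrected error is limited only by arithmetic precision. A deployed sensor would not know the individual sample instants, which is one reason the measured gains reported in this section are far smaller.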
To study the impact of utilizing the CG method to perform jitter correction, jittered sampling was implemented by clocking the DSM reported in [4] with the output of the CDR. An alternating 1/0 pattern or 3b/4b-encoded data sequence was applied to the CDR input; the CDR output triggered a Moving Pixel PG3A Tektronix-compatible pattern generator to produce all DSM clocks. The Nyquist rate was chosen at f_N = 32 kHz. The DSM was clocked at f_S = 1.6 MHz for an oversampling ratio (OSR) of 50. The CDR V_DD was adjusted until a specified jitter was measured at the clock inputs of the DSM. A decimation filter implemented as a fixed-function block on a programmable system-on-chip (PSoC) 5 microcontroller [10] downsampled the DSM output to the Nyquist rate. Jitter correction, implemented in C with critical subroutines written in assembly language, was applied to the decimated DSM output and ran on the same microcontroller as the decimation filter. The CG method is a postprocessing correction technique, and this implementation stored one second's worth of jittered 8-b samples (32 000 samples). The code and data occupied 48 kB of memory. It was empirically determined that clocking the microcontroller at 3.2 MHz (100× the Nyquist rate) was sufficient to observe improved ADC output SNR.

Fig. 6(a)-(c) plots the DSM output power spectral density uncorrected and corrected for normalized CDR output jitter of 0.08%, 3.7%, and 20%, respectively. The SNDR improves by 6.4 dB, 10.1 dB, and 11.9 dB, respectively, for an input tone f_in = 1.71 kHz. These improvements are an upper bound on the performance of the CG method because the correction is assumed to know the jitter σ exactly. In a deployed sensor, the jitter could be characterized as a function of supply voltage and stored in nonvolatile memory prior to deployment. Then, the optimal correction value could be chosen based on measuring V_DD at run time.
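For orientation, the textbook bound on the SNR of a sine wave sampled with random timing jitter is SNR = -20 log10(2π f_in σ_t). The sketch below evaluates it for the three jitter levels quoted above. It assumes the percentages are normalized to the 1.6-MHz DSM clock period, and it ignores oversampling, decimation, and quantization noise, so the numbers are not comparable to the measured SNDR values in Fig. 6. The names are illustrative.

```python
import math


def jitter_limited_snr_db(f_in_hz: float, sigma_t_s: float) -> float:
    """Textbook SNR bound for a sine wave sampled with RMS timing jitter sigma_t."""
    return -20.0 * math.log10(2.0 * math.pi * f_in_hz * sigma_t_s)


F_S_HZ = 1.6e6                            # DSM clock from the text
F_IN_HZ = 1.71e3                          # input tone from the text
JITTER_FRACTIONS = (0.0008, 0.037, 0.20)  # 0.08%, 3.7%, 20% of a clock period


def snr_bounds_db():
    """Bound for each jitter level, assuming normalization to 1 / F_S_HZ."""
    return [jitter_limited_snr_db(F_IN_HZ, frac / F_S_HZ) for frac in JITTER_FRACTIONS]


for frac, snr in zip(JITTER_FRACTIONS, snr_bounds_db()):
    print(f"jitter {frac * 100:.2f}% of period -> bound {snr:.1f} dB")
```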
Preliminary simulations show that 10% error in estimating the jitter degrades the SNR by < 1 dB. Table II shows the standby power, jitter, and measured best case improvement in the output SNDR, when reconstructing a signal sampled by a clock derived from the CDR output, across V_DD. CDR standby power is significantly higher than that of the optical wake-up receiver reported in [11], since the 90-nm process has higher leakage and power gating was not implemented. Fig. 7 shows the measured DSM ADC performance versus jitter at various V_DD.

Fig. 8 plots the power consumption of the software jitter correction on PSoC 5 versus RMS jitter normalized to the clock period, for the external DSM ADC. Jitter correction was also implemented for the PSoC internal ADC using a 5.12-MHz sampling clock with synthetic jitter (f_S = 5.12 MHz, f_N = 20 kHz, and OSR = 256). This implementation stored one second's worth of jittered 8-b samples (20 000 samples). The code and data occupied 32 kB. The increased power in the external ADC case is due to the off-chip universal asynchronous receiver/transmitter interface connecting the test chip and PSoC and the decimation filter.

Power scaling assumptions typically indicate that application-specific integrated circuit (ASIC) hardware is 10-1000× more efficient than software implementations for DSP tasks [12]; thus, an equivalent ASIC implementation could consume as little as 10 μW. An ultralow-power microcontroller, such as the 180-nm CMOS Phoenix processor, operates at 2 MHz at V_DD = 0.9 V consuming energy E_active = 7.5 pJ/cycle [13]. This corresponds to the 15-μW power consumption shown in the figure. At V_DD = 500 mV, Phoenix operates at 100 kHz and could correct jitter for a sampling clock at 1 kHz while dissipating about 300 nW (E_active ≈ 3 pJ/cycle).
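The Phoenix power figures quoted above follow from multiplying energy per cycle by clock frequency. The short check below reproduces that arithmetic. The assumption that the 100-cycles-per-Nyquist-sample budget found empirically on the PSoC 5 also applies to Phoenix is made here only to show where the 1-kHz figure could come from; the names are illustrative.

```python
def active_power_w(energy_per_cycle_j: float, clock_hz: float) -> float:
    """Average active power for a given energy per cycle and clock rate."""
    return energy_per_cycle_j * clock_hz


# Assumed budget: microcontroller clock = 100x the Nyquist rate (3.2 MHz / 32 kHz).
CYCLES_PER_NYQUIST_SAMPLE = 100

p_nominal_w = active_power_w(7.5e-12, 2e6)    # 0.9 V, 2 MHz, 7.5 pJ/cycle
p_low_w = active_power_w(3e-12, 100e3)        # 500 mV, 100 kHz, about 3 pJ/cycle
max_sample_rate_hz = 100e3 / CYCLES_PER_NYQUIST_SAMPLE

print(f"0.9 V: {p_nominal_w * 1e6:.1f} uW")
print(f"500 mV: {p_low_w * 1e9:.0f} nW, supports about {max_sample_rate_hz:.0f} Hz sampling")
```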
V. CONCLUSION AND FUTURE WORK
A low-voltage optical CDR circuit has been implemented in 90-nm CMOS and tested across supply voltages from 150 to 500 mV. It is designed to operate from energy harvested from ambient light, or from power delivered optically, and recovered by integrated photovoltaics. The CG method implemented in software on an ultralow-power embedded microcontroller can be used to improve ADC SNDR by 6.4-11.9 dB when the recovered clock, which can suffer from substantial jitter at low voltage, is used to sample the sensor input. Future work will explore the tradeoff between signal reconstruction performance and accurate estimation of the recovered clock jitter.
