Factors that contribute to the rapid increase in power dissipation as a function of input bandwidth in high-speed electronic analog-to-digital converters (ADCs) are discussed. We find that the figure of merit (FOM), defined as the energy required per conversion step, increases linearly with bandwidth for high-speed ADCs with moderate to high resolution; equivalently, the power dissipation increases quadratically. It is shown that by use of the photonic time-stretch technique, it is possible to build ADCs in which this FOM remains constant for input RF frequencies up to 10 GHz. Using this technique, it is also possible to overcome the barrier to high resolution imposed by clock jitter and by the speed limitations of electronics in such ADCs. Optics is actively being pursued for reducing power dissipation and achieving higher data rates in board-level and chip-level serial communication links. In the same manner, we expect that optics will also help reduce power dissipation in high-speed ADCs, in addition to providing broader bandwidths.
Introduction
Increased bandwidth demands from internet backbones have led researchers to target data rates of 100 Gbit/s and higher per wavelength division multiplexing (WDM) channel using spectrally efficient, multilevel modulation formats [1]. Demodulation of such signals requires analog-to-digital converters (ADCs) with very high performance and resolution. Such ADCs are also crucial for defense applications such as radar and for wide-bandwidth laboratory instruments such as oscilloscopes and vector spectrum analyzers. While continued scaling of CMOS technology [2] has improved digital circuits tremendously in terms of performance, power efficiency and cost, analog circuits (and ADCs) have not kept pace. Although the bandwidth of an analog circuit improves with technology scaling, because smaller devices with reduced capacitances run faster, the power dissipation for the same functionality does not always scale, because of the lower intrinsic gain of shorter-channel CMOS transistors [3]. In fact, most improvements in the power efficiency of analog circuits can be attributed to architectural improvements and scaled voltages rather than to reduced capacitances in CMOS devices.
In ADCs, the analog full-scale voltage cannot be reduced arbitrarily because of thermal (kT/C) noise limitations. As a result, while technology scaling has not directly decreased analog power dissipation, the reduced power and increased speed of digital circuits have allowed extensive use of digital correction and calibration techniques, which have improved performance. The most commonly used figure of merit (FOM) for ADC efficiency is the energy required per conversion step,

$$\mathrm{FOM} = \frac{P_{diss}}{2^{\mathrm{ENOB}} \cdot 2\cdot\mathrm{ERBW}} \qquad (1)$$
where $P_{diss}$ is the power dissipation of the ADC, ENOB is the effective number of bits and ERBW is the effective resolution bandwidth of the ADC [4]. ERBW is defined as the frequency at which the signal-to-noise-and-distortion ratio (SNDR) of the ADC degrades by 3-dB from its low-frequency value. Fig. 1 shows the energy per conversion step as a function of ERBW for ADCs from 2002 to the present. Although Fig. 1 lumps together ADCs from five or six different electronic technologies, it is evident from the plot that above 100-MHz the energy per conversion step of the best ADCs increases rapidly with bandwidth. A similar trend is found in the data of [5] for CMOS ADCs with high speed (>100-MHz ERBW) and moderate to high resolution (>6 ENOB). In [6], Murmann finds that the power efficiency of ADCs has improved by a factor of 2 every two years, thanks to technology and power supply scaling. However, the data in [5] confirm that this trend does not hold for high-speed ADCs with moderate to high resolution.
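As a quick illustration of equation (1), the sketch below computes the FOM for a hypothetical converter and then inverts it to show why a FOM that grows linearly with ERBW (the trend of Fig. 1) implies power growing quadratically. The function names and the example converter (128 mW, 8 ENOB, 500 MHz ERBW) are assumptions chosen for illustration, not data from Fig. 1.

```python
def walden_fom(p_diss_w, enob, erbw_hz):
    """Energy per conversion step (J/step) = P_diss / (2**ENOB * 2 * ERBW)."""
    return p_diss_w / (2 ** enob * 2 * erbw_hz)


def power_for_fom(fom_j, enob, erbw_hz):
    """Power (W) implied by a given FOM, ENOB and ERBW (equation (1) inverted)."""
    return fom_j * 2 ** enob * 2 * erbw_hz


# Hypothetical converter: 128 mW, 8 effective bits, 500 MHz ERBW.
print(walden_fom(128e-3, 8, 500e6))  # about 0.5 pJ/step

# If FOM grows linearly with ERBW, each doubling of ERBW quadruples the power.
for erbw in (0.5e9, 1e9, 2e9):
    fom = 0.5e-12 * erbw / 0.5e9          # hypothetical linear FOM trend
    print(erbw / 1e9, "GHz:", power_for_fom(fom, 8, erbw), "W")
```

The quadratic growth follows directly: power equals FOM times $2^{\mathrm{ENOB}}\cdot 2\cdot\mathrm{ERBW}$, so a FOM proportional to ERBW leaves power proportional to $\mathrm{ERBW}^2$.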
In this paper we estimate that the power dissipation in such high-speed ADCs increases approximately as $f_s^2$, the square of the sampling rate. On the other hand, if the photonic time-stretch technology is used [7, 8, 9], we show that linear power scaling can be maintained in future ADCs up to much higher frequencies. The conceptual diagram of a Time-Stretch Analog-to-Digital Converter (TS-ADC) is shown in Fig. 2. The time-stretch technique also allows breaking through the so-called walls in the ENOB-bandwidth plane caused by comparator ambiguity and timing jitter [4, 10].
In this paper, first we discuss the fundamental limits on noise and power dissipation in ADCs. Second, we discuss frequency scaling in digital circuits and show why similar trends are observed in ADCs. Third, we show that the photonic time-stretch technique can push the linear scaling of power dissipation versus sampling frequency in the ADCs well into the GHz band. Finally, we compare the power dissipation in high speed ADCs with and without the use of the photonic time-stretch technique.
Fundamental limits to power dissipation in ADCs
An analog-to-digital converter consists of two main operational stages. First, the sample-and-hold (S&H) stage samples the analog signal onto a capacitor through a switch at periodic intervals. Second, the quantization stage converts the sampled analog signal into digital (binary) outputs using comparators. Depending on the architecture, a specific ADC design can contain a series of these two stages. In addition, clock generation circuitry is required to provide clocks with different phases and duty cycles to different sets of switches so that all operations in the ADC are synchronized. Reference buffers act as accurate voltage sources with very low output impedance. Finally, modern ADCs use digital circuitry extensively for correction and calibration in post-processing, which can also consume a significant fraction of the total power.
In the sample-and-hold stage, the analog signal and the thermal noise generated in the switch are sampled onto a capacitor with capacitance C. The sampled noise voltage has a mean-square value of kT/C, where k is the Boltzmann constant and T is the ambient temperature. Note that the total thermal noise power is independent of the sampling frequency, because thermal noise at all frequencies is folded back into the Nyquist bandwidth in a sampled system. If thermal noise is the dominant noise source and $V_{FS}$ is the full-scale (peak-to-peak) voltage, a full-scale sine wave has power $V_{FS}^2/8$, and we obtain the signal-to-noise ratio and dissipated power as:

$$\mathrm{SNR} = \frac{V_{FS}^2/8}{kT/C} \qquad (2)$$

$$P_{diss} \propto C\,V_{FS}^2\,f_s = 8\,kT\,f_s\,\mathrm{SNR} \qquad (3)$$
Therefore, for a given signal power (or $V_{FS}$), thermal (kT/C) noise places a lower limit on the capacitance that can be used in the sample-and-hold stage. On average, the bias currents required in the buffer amplifiers to charge the sampling capacitors to the signal or reference voltage are directly proportional to C and inversely proportional to the available charging time (i.e., proportional to $f_s$), which fundamentally limits the power dissipation of an ADC, as expressed in (3). This limitation can be written in terms of the effective number of bits of the ADC by substituting the standard relation $\mathrm{SNR_{dB}} = 6.02\,\mathrm{ENOB} + 1.76$, i.e., $\mathrm{SNR} \approx 1.5\cdot 2^{2\,\mathrm{ENOB}}$ [4, 11]:

$$P_{diss} \propto 12\,kT\,f_s\,2^{2\,\mathrm{ENOB}} \qquad (4)$$
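To put numbers on the kT/C argument, the sketch below computes the smallest sampling capacitor and the corresponding thermal power floor $8kTf_s\mathrm{SNR}$ for a target ENOB. It assumes a full-scale sine of peak-to-peak amplitude $V_{FS}$, T = 300 K, and sets the proportionality constant in the power relation to 1, so the result is an order-of-magnitude floor rather than a design value; the function names are hypothetical.

```python
K_B = 1.380649e-23  # Boltzmann constant (J/K)


def snr_from_enob(enob):
    """Linear SNR from SNR_dB = 6.02 * ENOB + 1.76."""
    return 10 ** ((6.02 * enob + 1.76) / 10)


def min_capacitance(enob, v_fs, temp_k=300.0):
    """Smallest sampling capacitor (F) whose kT/C noise still allows the target SNR.

    A full-scale sine with peak-to-peak amplitude v_fs has power v_fs**2 / 8,
    so SNR = (v_fs**2 / 8) / (kT / C)  =>  C = 8 kT SNR / v_fs**2.
    """
    return 8 * K_B * temp_k * snr_from_enob(enob) / v_fs ** 2


def thermal_power_floor(enob, fs_hz, temp_k=300.0):
    """P ~ C * V_FS**2 * f_s = 8 kT f_s SNR, with the proportionality constant set to 1."""
    return 8 * K_B * temp_k * fs_hz * snr_from_enob(enob)


c_min = min_capacitance(8, v_fs=1.0)
p_floor = thermal_power_floor(8, fs_hz=1e9)
print(f"C_min ~ {c_min * 1e15:.2f} fF, P_floor ~ {p_floor * 1e6:.2f} uW")
print(thermal_power_floor(9, 1e9) / p_floor)   # each extra bit costs about 4x
```

For 8 ENOB at 1 GS/s with a 1 V full scale this gives a capacitor of a few femtofarads and a floor of a few microwatts; the factor of about 4 per added bit is the $2^{2\,\mathrm{ENOB}}$ term of (4).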
In practical circuits, the power dissipation is at least 3 to 4 orders of magnitude higher than this value [12], since individual components such as voltage buffers, opamps, comparators and clock sources must satisfy requirements of low noise, high linearity, high speed and high-precision settling. For a fixed sampling frequency, the currents in all these circuits scale proportionally with the capacitance C.
Equation (4) suggests that ADC power dissipation should scale linearly with $f_s$. However, the trend seen in Fig. 1 and in [13], together with the discussion in the following subsection, makes clear that in reality the energy per conversion step scales roughly as $f_s$, i.e., the power dissipation is proportional to $f_s^2$ in high-speed ADCs.
Power Scaling in Digital Circuits
Power dissipation in digital circuits, which until recently has been dominated by the dynamic power required for switching transistors, is proportional to $CV^2f$, where C is the average node capacitance, V is the supply voltage and f is the clock frequency [14]. To run circuits at high speed, large switching currents are required to charge or discharge the node capacitances quickly, which demands a higher supply voltage. The minimum voltage V at which a digital circuit operates correctly is roughly proportional to $\sqrt{f}$ over a wide range of frequencies and voltages [14]. Since $V^2 \propto f$, the dynamic power scales as $f^2$: if the operating frequency is reduced by a factor α, the required power decreases by a factor $α^2$. As a result, the energy-delay product of an operation is roughly constant over a wide range of operating frequencies in digital circuits [14], because the energy per operation scales as f while the delay scales as 1/f. This simple observation implies that when more delay is allowed for a set of operations, less energy is required to perform them. This is why the digital world is moving toward architectures that exploit parallelism [15], and the same trend is found in high-sample-rate ADCs and real-time digital oscilloscopes [16, 17, 18, 19, 20]. In this paper, we find that the time-stretch technique, which uses the same parallel approach to digitize very high bandwidth signals, can also help reduce power dissipation in high-speed ADCs.
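The voltage-frequency argument can be checked numerically. The sketch below assumes the idealized scaling $V \propto \sqrt{f}$ with hypothetical constants (1 fF node capacitance, 1 V at 1 GHz) and shows the $\alpha^2$ power reduction, the constant energy-delay product, and why two units at half the clock rate do the same work for half the power.

```python
def dynamic_power(f_hz, c_f=1e-15, v_ref=1.0, f_ref=1e9):
    """C * V**2 * f with the supply voltage scaled as V = v_ref * sqrt(f / f_ref).

    c_f, v_ref and f_ref are hypothetical constants chosen for illustration.
    """
    v = v_ref * (f_hz / f_ref) ** 0.5
    return c_f * v ** 2 * f_hz


def energy_delay_product(f_hz):
    energy_per_op = dynamic_power(f_hz) / f_hz   # J per operation, proportional to f
    delay = 1.0 / f_hz                           # s per operation
    return energy_per_op * delay


alpha = 2.0
print(dynamic_power(1e9) / dynamic_power(1e9 / alpha))           # about alpha**2 = 4
print(energy_delay_product(1e9), energy_delay_product(0.25e9))   # roughly equal

# Two units at f/2 complete the same work as one unit at f for half the power.
print(2 * dynamic_power(0.5e9) / dynamic_power(1e9))             # about 0.5
```

The last line is the parallelism argument in miniature: splitting the work across N slower units reduces total power by roughly a factor of N under this scaling law.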
Power Scaling in Analog-to-Digital Converters
Speed considerations: In deriving the expression $P_{diss} \propto kT f_s\,\mathrm{SNR}$, it was assumed that, to increase the sampling frequency, the bias currents need only be increased linearly to charge the capacitors faster, with no limitation posed by the transistor response time. In reality, the unity-gain frequency $f_T$ of the transistors should also increase linearly with $f_s$, because the operational amplifier (opamp) outputs driving the capacitors have less time to settle before quantization begins. The bias currents can be increased in two ways. In the first method, the transistor widths are kept constant and the overdrive voltages are increased; this raises $f_T$ but lowers the intrinsic gain of the transistors, which degrades linearity. In the second method, overdrive voltages are kept constant and only the transistor widths are increased linearly for a proportional increase in current. In this case, the intrinsic gain is maintained, but $f_T$ does not increase, resulting in incomplete settling of the opamp outputs at higher sampling frequencies.
In both cases, we find that simply scaling currents proportionally cannot meet the requirement of faster response while maintaining the same linearity (and resolution). In the quantization process, the voltage comparators also need to switch faster to avoid comparator ambiguity [4]. By the same arguments, increasing the drive currents of comparators linearly with frequency is not sufficient to achieve the required comparator speeds. These facts suggest that power dissipation in ADCs should increase more rapidly than linearly with frequency, following a trend similar to that in digital circuits. As shown in Fig. 1, this frequency scaling trend is found not only in CMOS but also in other technologies, such as SiGe, GaAs and InP, which have traditionally been used for very high-speed ADCs.
To overcome these frequency scaling issues, new ADC architectures employing parallelism must be used, as in the case of digital circuits [14]. In the time-interleaved architecture, a parallel ADC architecture such as the one shown in Fig. 3, multiple sub-sampling ADCs operate in parallel and sample the signal at different instants within a full sampling clock cycle [16, 17, 18]. The outputs of these "sub-ADCs" are combined in the digital domain, and post-processing is performed to suppress distortions caused by timing errors, gain mismatches and DC offsets. The front end of the time-interleaved architecture can have a single S&H block [16] feeding all sub-ADCs, a separate S&H block for each sub-ADC [17], or a combination of these two approaches [18]. In the first and third cases, scalability to high sampling frequencies and to large numbers of sub-ADCs is still a challenge, and the same power considerations discussed above for ADCs apply to the front-end circuitry. The second architecture (discussed in [17]) can potentially be scaled to higher sampling rates, but timing jitter and residual timing offsets, which are discussed in the next section, limit the ADC resolution. This architecture also requires a predriver to drive the large capacitive load of the S&H blocks, which can limit the bandwidth and add significant power dissipation.

Fig. 4. Sampling a signal with and without time-stretch. When the time-stretch technique is used, the noise due to clock jitter becomes insignificant, and the noise added by the laser jitter, which is typically much less than the electronic clock jitter, dominates ($V_n$ represents the noise voltage).
Noise due to Aperture Jitter
In high-speed ADCs, aperture jitter (or aperture uncertainty), caused by jitter in the sampling clock, is a significant source of noise [4] (as shown in Fig. 4). Because aperture jitter noise grows with signal frequency, it severely degrades the SNR of moderate- to high-frequency signals. The jitter-limited SNR for an rms timing jitter $\tau_j$ and a full-scale sinusoidal input at frequency $f_{in}$ is given by [4, 21]:

$$\mathrm{SNR}_{jitter} = -20\log_{10}\left(2\pi f_{in}\tau_j\right)\ \mathrm{dB}$$
Most of the aperture jitter noise comes from jitter in the sampling clocks generated by the clock sources. In the time-interleaved architecture used in [17], timing errors in clocking different sub-ADCs have the same effect as jitter and limit the achievable SNR. For example, the 20 GS/s ADC reported in [17] shows a resolution of 6.5 effective bits at low frequencies, but the resolution drops to 4.6 effective bits for a 6-GHz RF signal because of an effective rms sampling jitter of 0.7-ps. In another example [21], an rms jitter of 250-fs is shown to reduce the effective resolution of a 14-bit ADC to about 11.2 effective bits for a 230-MHz RF signal. As discussed in the next section, the time-stretch ADC technique uses optical processing to overcome these limitations, in addition to achieving significant power savings.
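The two examples above can be compared against the jitter-only ceiling $-20\log_{10}(2\pi f_{in}\tau_j)$. The sketch below assumes a full-scale sinusoidal input and the usual conversion $\mathrm{ENOB} = (\mathrm{SNR_{dB}} - 1.76)/6.02$; the helper names are hypothetical.

```python
import math


def jitter_snr_db(f_in_hz, tau_j_s):
    """Jitter-limited SNR (dB) of a full-scale sine: -20 log10(2 pi f_in tau_j)."""
    return -20 * math.log10(2 * math.pi * f_in_hz * tau_j_s)


def enob_from_snr_db(snr_db):
    """Effective bits corresponding to an SNR in dB."""
    return (snr_db - 1.76) / 6.02


for label, f_in, tau in [("230 MHz, 250 fs", 230e6, 250e-15),
                         ("6 GHz, 0.7 ps", 6e9, 0.7e-12)]:
    snr = jitter_snr_db(f_in, tau)
    print(f"{label}: SNR = {snr:.1f} dB, ENOB ceiling = {enob_from_snr_db(snr):.2f}")
```

Jitter alone caps the first case near 11.1 bits, close to the reported 11.2, and the second near 5 bits; the reported 4.6 bits is below that ceiling because other noise and distortion sources also contribute.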
Time Stretch Analog-to-Digital Converter
In a time-stretch ADC (TS-ADC), the effective bandwidth and frequency of the RF signal to be digitized are compressed by stretching the signal in time [7, 8, 9], thereby reducing the bandwidth required of the backend electronic digitizer that captures the original signal. Fig. 5 shows the fundamental process of time stretch. The RF signal is modulated onto a long, linearly chirped optical pulse obtained from a supercontinuum source (which can be a femtosecond mode-locked fiber laser). Propagation through a dispersive medium stretches the modulated pulse in time, resulting in a "time-stretched" replica of the original RF signal after photodetection. The magnification, or stretch ratio, is given by $M = D_2/D_1 + 1$, where $D_1$ and $D_2$ are the dispersion values of the dispersion fibers DCF-1 and DCF-2, respectively. To achieve continuous operation, the optical spectrum is segmented into multiple channels using a wavelength division multiplexing (WDM) filter. Time-stretched signals from different channels are digitized by separate electronic digitizers and combined in the digital domain. Fig. 2 illustrates one realization of the TS-ADC system that can stretch the signal by up to a factor of four and requires four channels to capture the whole signal continuously.
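The stretch relation can be exercised directly. The sketch below computes the DCF-2 dispersion for a desired stretch factor and the bandwidth the backend digitizer then has to handle; the -250 ps/nm value for DCF-1 and the 10 GHz RF bandwidth are illustrative assumptions, and the helper names are hypothetical.

```python
def stretch_ratio(d1_ps_per_nm, d2_ps_per_nm):
    """Stretch ratio M = D2 / D1 + 1 for dispersions D1 (DCF-1) and D2 (DCF-2)."""
    return d2_ps_per_nm / d1_ps_per_nm + 1


def d2_for_stretch(m, d1_ps_per_nm):
    """DCF-2 dispersion needed for stretch factor m: D2 = (M - 1) * D1."""
    return (m - 1) * d1_ps_per_nm


d1 = -250.0                    # ps/nm, hypothetical DCF-1 value
d2 = d2_for_stretch(4, d1)     # stretch by 4, as in the Fig. 2 example
m = stretch_ratio(d1, d2)
rf_bw_hz = 10e9                # hypothetical RF bandwidth
print(f"D2 = {d2:.0f} ps/nm, M = {m:.0f}, backend sees {rf_bw_hz / m / 1e9:.1f} GHz")
```

Because every frequency component is divided by M, a 10 GHz signal stretched by 4 presents only 2.5 GHz to each backend digitizer.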
As illustrated in Fig. 4, stretching the signal in time using optical preprocessing reduces the effective signal frequency seen by the S&H block by the stretch factor M. As a result, the noise added to the system by clock jitter is scaled down by $M^2$. For example, if a stretch factor of 10 is used in a TS-ADC, the clock-jitter-limited noise can be lowered by up to 20-dB compared to a conventional ADC. We note that the timing jitter of the mode-locked fiber laser, which is used to generate chirped pulses in the TS-ADC, still adds noise, but it can be reduced to very small values with careful design. For example, a laser with 18-fs rms jitter has been reported in [22]. On the other hand, the best jitter performance achieved by clocks in electronic digitizers is of the order of 200-fs [21]. In [23], the best clock jitter of 180-fs is observed for clocks with very high voltage swings. However, such voltage swings at high speeds add substantially to power dissipation and make clock distribution almost impossible in time-interleaved ADCs.
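The sketch below quantifies this: the clock-jitter noise reduction for a stretch factor M, and an effective jitter that combines the stretched electronic clock jitter with the laser jitter. Combining the two in quadrature is an assumption (independent jitter sources); the 200 fs clock and 18 fs laser values are taken from the figures quoted above, and the function names are hypothetical.

```python
import math


def jitter_noise_reduction_db(m):
    """Clock-jitter noise power falls by M**2, i.e. by 20 log10(M) dB."""
    return 20 * math.log10(m)


def effective_jitter(tau_clock_s, tau_laser_s, m):
    """Effective rms jitter, assuming independent sources that add in quadrature."""
    return math.hypot(tau_clock_s / m, tau_laser_s)


print(f"M = 10: {jitter_noise_reduction_db(10):.0f} dB less clock-jitter noise")
tau_eff = effective_jitter(200e-15, 18e-15, 10)
print(f"effective jitter: {tau_eff * 1e15:.1f} fs")
```

With M = 10, a 200 fs electronic clock behaves like a 20 fs clock, so the combined figure of roughly 27 fs is comparable to the laser jitter itself rather than to the electronic clock.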
In time-interleaved ADCs, the clock jitter is generally much higher as extensive clock generation circuitry is required to generate multiple clocks with very precise phase delays. Additionally, even after adaptive alignment and calibration, there is a residual timing misalignment between clocks going to different sub-ADCs. Stringent requirements on clock accuracy can thus result in a significant power penalty in the time interleaved ADCs.
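As evident from Fig. 4, when the time-stretch technique is used, the effective sampling jitter in the system is reduced; treating the laser and electronic clock jitter as independent, it can be written as

$$\tau_{eff} = \sqrt{\tau_{laser}^2 + \left(\tau_{clock}/M\right)^2},$$

where $\tau_{laser}$ is the rms timing jitter of the mode-locked laser and $\tau_{clock}$ is that of the electronic sampling clock. In the next subsection, we consider an example to show how time stretching can be very useful in the context of power savings.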
Power calculations for a Time-Stretch ADC
The block diagram of a TS-ADC system for continuous operation is shown in Fig. 6. We assume that the repetition rate of optical pulses from the mode-locked laser (MLL) is 100-MHz. Therefore, the time segments that need to be captured by the electronic digitizers are 10-ns long. A usable optical bandwidth of 40-nm (i.e., 5-THz bandwidth in frequency at 1550-nm center wavelength) can easily be obtained from a femtosecond MLL, for example the FFL1560-MP laser from Precision Photonics, followed by a highly nonlinear fiber [24]. For continuous modulation of the RF signal, these 40-nm pulses have to be stretched to 10-ns before they are modulated by the Mach-Zehnder modulator (MZM), which requires -250 ps/nm dispersion in DCF-1 (10 ns / 40 nm), corresponding to the dispersion of about 15-km of SSMF (standard single-mode fiber). For the TS-ADC, dispersion compensating fibers (DCFs) are used because they have higher dispersion-to-loss ratios than SSMF. Using a DCF, a dispersion value of -250 ps/nm is achieved with a distributed loss of about 0.65-dB (as found in [25] and from measurements in our lab), to which connector losses are added separately. If the stretch factor is M, the dispersion required in DCF-2 becomes

$$D_2 = (M-1)\,D_1,$$

where $D_1$ and $D_2$ are the dispersion values of DCF-1 and DCF-2, respectively. We also estimate the loss in the Mach-Zehnder modulator to be 4-dB, and the losses in the WDM filter, polarization controller and connectors to be an additional 3-dB. Since DCF-1 and DCF-2 together provide M times the -250 ps/nm dispersion, the total loss in the optical link is about M × 0.65 + 4 + 3 = (M × 0.65 + 7) dB. We further assume that the power at the input of each photodetector is 1 mW, which gives a 58-dB shot-noise-limited SNR for 500-MHz RF bandwidth, 0.5 modulation depth and 0.8 A/W photodetector responsivity. The noise and SNR calculations for the optical system are given in Appendix A. Thermal noise from the electronics and photodetector at the same level or lower can easily be achieved, resulting in better than 58-dB SNR with differential operation [26]. Backend ADCs with 8-ENOB (i.e., 50-dB SNDR) can then be used to obtain the same 8-bit resolution, since the additional noise is compensated by differential operation [26]. Noise due to laser RIN (relative intensity noise) is unimportant, as a modest RIN of -150 dB/Hz yields an SNR of 63-dB under these conditions. The backend digitizers are assumed to capture waveforms at a sample rate of 1-GS/s, with 0.5-pJ/step FOM, 500-MHz Nyquist bandwidth and 8-ENOB, resulting in roughly 125-mW power dissipation. These values are projected using the blue line in Fig. 7, obtained from the observed linear dependence of ADC FOM on $f_s$ for published ADCs, and using the FOM of the best reported GS/s ADC [18]. For each optical WDM channel, differential and arcsine operations are performed [26, 27], which not only improve the SNR by 3-dB but also suppress nonlinear distortion due to electro-optic modulation and chromatic dispersion. However, this requires the number of backend digitizers and photodetectors to be twice the stretch factor M (i.e., twice the number of optical channels). The electrical-to-optical power conversion efficiency of the laser is assumed to be 20%, and the power consumed by each photodetector is estimated to be 50-mW.
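The link figures quoted above follow from a few lines of arithmetic, sketched below. The SSMF dispersion of about 17 ps/(nm km) at 1550 nm is an assumed typical value not stated in the text; all other inputs (100 MHz repetition rate, 40 nm, 0.65 dB per -250 ps/nm, 4 dB + 3 dB fixed losses, 0.5 pJ/step at 1 GS/s and 8 ENOB) come from the text, and the helper names are hypothetical.

```python
C_LIGHT = 299_792_458.0  # speed of light (m/s)

frame_ns = 1e3 / 100.0        # 100 MHz repetition rate -> 10 ns frames
optical_bw_nm = 40.0
center_nm = 1550.0

# DCF-1 must spread 40 nm over 10 ns: |D1| = 10 000 ps / 40 nm = 250 ps/nm.
d1_ps_per_nm = frame_ns * 1e3 / optical_bw_nm
# Bandwidth in frequency: delta_f = c * delta_lambda / lambda**2.
optical_bw_thz = C_LIGHT * optical_bw_nm * 1e-9 / (center_nm * 1e-9) ** 2 / 1e12
# Assumed SSMF dispersion of about 17 ps/(nm km) at 1550 nm.
ssmf_km = d1_ps_per_nm / 17.0


def d2_ps_per_nm(m):
    """|D2| for stretch factor m, from D2 = (M - 1) D1."""
    return (m - 1) * d1_ps_per_nm


def link_loss_db(m, loss_per_250ps_nm=0.65):
    """DCF-1 + DCF-2 (m units of 250 ps/nm) plus 4 dB MZM and 3 dB other losses."""
    return m * loss_per_250ps_nm + 4.0 + 3.0


def backend_power_w(fom_j=0.5e-12, enob=8, fs_hz=1e9):
    """One backend digitizer: FOM * 2**ENOB * f_s."""
    return fom_j * 2 ** enob * fs_hz


print(f"|D1| = {d1_ps_per_nm:.0f} ps/nm, optical BW = {optical_bw_thz:.2f} THz, ~{ssmf_km:.1f} km SSMF")
print(f"M = 10: |D2| = {d2_ps_per_nm(10):.0f} ps/nm, loss = {link_loss_db(10):.1f} dB")
print(f"backend digitizer: {backend_power_w() * 1e3:.0f} mW")
```

The backend digitizer works out to 128 mW, which the text rounds to about 125 mW.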
This includes the power required to bias the photodetector and the power of the subsequent amplifying stage that brings the signal to the full-scale voltage of the electronic digitizer. As a proof of principle, a two-channel 7-ENOB TS-ADC with 10-GHz RF bandwidth was recently demonstrated [24], in which the resolution was primarily limited by the backend digitizer. To the best of the authors' knowledge, this is a world-record resolution for digitization of 10-GHz bandwidth signals.
Finally, combining the channel outputs in the digital domain requires signal processing and memory. Even though CMOS scaling has made digital circuits highly power efficient, a large amount of digital data is generated, and its post-processing requires significant power. The digital power is estimated to be the same as the total power consumed by the backend electronic ADCs (a similar trend is observed in [13]). Using these numbers with 1-mW input optical power at each photodetector, the total optical and electrical powers can be calculated. The power scaling obtained for the TS-ADC is plotted as the red curve in Fig. 7. The FOM stays roughly constant up to 5-GHz, because the power consumption of the electronics dominates at lower frequencies.
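The power budget described above can be assembled into a small model, sketched below as an illustration rather than the calculation behind Fig. 7. All inputs are the text's stated assumptions: 2M backend digitizers and photodetectors, digital power equal to backend ADC power, 1 mW at each photodetector, link loss (M × 0.65 + 7) dB and 20% laser efficiency. The system FOM uses an aggregate rate of M × 1 GS/s (RF bandwidth M × 500 MHz). The per-unit fiber loss is a parameter, so the lower-loss CFBG case (half the DCF loss) can be evaluated too; function names are hypothetical.

```python
ENOB = 8                # effective bits of each backend digitizer
BACKEND_FS = 1e9        # backend sample rate (S/s), 500 MHz Nyquist bandwidth
BACKEND_FOM = 0.5e-12   # backend FOM (J per conversion step)
PD_POWER = 50e-3        # photodetector bias + amplifier power per detector (W)
P_AT_PD = 1e-3          # optical power at each photodetector (W)
LASER_EFF = 0.20        # electrical-to-optical efficiency of the laser
FIXED_LOSS_DB = 7.0     # 4 dB modulator + 3 dB WDM filter, polarization controller, connectors


def ts_adc_power(m, loss_per_unit_db=0.65):
    """Total electrical power (W) for stretch factor m.

    loss_per_unit_db is the loss of one -250 ps/nm dispersion unit;
    DCF-1 and DCF-2 together amount to m such units.
    """
    n_detectors = 2 * m                                  # differential: 2 per WDM channel
    p_adc = n_detectors * BACKEND_FOM * 2 ** ENOB * BACKEND_FS
    p_digital = p_adc                                    # DSP power assumed equal to ADC power
    p_pd = n_detectors * PD_POWER
    loss_db = m * loss_per_unit_db + FIXED_LOSS_DB
    p_optical = n_detectors * P_AT_PD * 10 ** (loss_db / 10)  # laser output before link loss
    p_laser = p_optical / LASER_EFF
    return p_adc + p_digital + p_pd + p_laser


def ts_adc_fom(m, **kwargs):
    """Energy per conversion step (J) at the aggregate rate m * BACKEND_FS."""
    return ts_adc_power(m, **kwargs) / (2 ** ENOB * m * BACKEND_FS)


for m in (2, 5, 10, 20):
    rf_bw_ghz = m * BACKEND_FS / 2 / 1e9
    dcf = ts_adc_fom(m) * 1e12
    cfbg = ts_adc_fom(m, loss_per_unit_db=0.325) * 1e12   # CFBG: half the DCF loss
    print(f"M={m:2d}  RF BW={rf_bw_ghz:4.1f} GHz  DCF {dcf:.2f} pJ/step  CFBG {cfbg:.2f} pJ/step")
```

Under these assumptions the FOM stays near 2.7 to 3.3 pJ/step up to M = 10 (5 GHz) and nearly doubles by M = 20 with DCF, because the laser power grows exponentially with the fiber loss in dB; halving the loss per unit keeps the FOM near 3.3 pJ/step at 10 GHz.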
For higher-frequency signals, longer dispersive fibers are required to obtain larger stretch ratios, which adds a significant power penalty due to optical losses. Optical amplification using erbium-doped fiber amplifiers or distributed Raman amplification can be used to curtail these losses and improve the overall power efficiency significantly while maintaining high SNR; for simplicity, however, these are not discussed here. Furthermore, with lower-loss dispersive media, such as chirped fiber Bragg gratings (CFBGs), the linear power scaling trend can continue to much larger bandwidths, as shown by the green curve in Fig. 7. In these calculations, the losses in CFBGs are assumed to be half those of the DCFs, though in practice the CFBG losses are even lower [28]. However, CFBGs can have significant group delay ripple, which must be reduced or corrected in high-resolution applications. Since there is practically
Conclusion
In this paper, we showed that the photonic time-stretch technique can be used to scale electronic ADCs to higher frequencies, both by reducing power dissipation and by overcoming the SNR barrier imposed by electronic clock jitter and the limited speed of electronics. In addition, the use of optics provides several other advantages. Optics has traditionally been very useful for transmitting very wide bandwidth analog and digital signals over large distances with low loss. In particular, transmission over optical fibers is widely used for routing analog signals from antennas at remote locations to base stations for signal processing. The TS-ADC can also be very useful in this respect, since no additional hardware is required to provide the option of remoting in communications and radar systems. In this mode of operation, the dispersive fiber that stretches the RF signal also serves as the fiber link.
Optical subsystems have been proposed for signal transmission at the board level [31] and chip level [32] to reduce power dissipation and achieve high throughput. In light of this ongoing opto-electronic integration, we believe that photonic time-stretch technology has also become very important and can be integrated with CMOS technology in the near future.
Appendix A: Noise contributed by different optical frontend components
First we define the parameters used in the equations:
$P_{in}$ = average optical power at the photodetector input
$m$ = amplitude modulation index
$B_e$ = electrical bandwidth after stretching
$R$ = electrical impedance (50 ohm)
$\eta$ = photodetector responsivity
$i_n$ = rms noise current
$q$ = electron charge
$T$ = ambient temperature
$k$ = Boltzmann's constant
RIN = relative intensity noise of the laser (dB/Hz)
$P_{NEP}$ = photodetector noise equivalent power (typically ∼15 pW/√Hz)
Because of the quantum nature of light, the photodetector generates a shot noise current with variance

$$i_{n,shot}^2 = 2q\eta P_{in} B_e.$$
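The noise contribution due to laser relative intensity noise (RIN) in differential operation is

$$i_{n,RIN}^2 = 2\,m^2(\eta P_{in})^2\,10^{\mathrm{RIN}/10}\,B_e,$$

since the common-mode intensity noise cancels on subtraction and only its product with the modulated signal remains.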
The thermal noise contribution of each photodetector is given by

$$i_{n,thermal}^2 = \frac{4kTB_e}{R} + (\eta P_{NEP})^2 B_e.$$
With these three major noise contributors and differential signaling, the total signal-to-noise ratio (SNR) of the stretched RF signal received from the time-stretch preprocessor is:

$$\mathrm{SNR} = \frac{2\,m^2(\eta P_{in})^2}{2\,i_{n,shot}^2 + 2\,i_{n,thermal}^2 + i_{n,RIN}^2}$$
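The sketch below evaluates these noise terms with the parameter values used in the power calculation (1 mW, 0.8 A/W, m = 0.5, 500 MHz, -150 dB/Hz RIN) and the appendix values for R and $P_{NEP}$. The differential RIN and total-SNR expressions are an assumed model in which common-mode intensity noise cancels on subtraction; the function name is hypothetical.

```python
import math

Q = 1.602176634e-19    # electron charge (C)
K_B = 1.380649e-23     # Boltzmann constant (J/K)


def frontend_snr_db(p_in=1e-3, eta=0.8, m=0.5, b_e=500e6, rin_db_hz=-150.0,
                    r_ohm=50.0, temp_k=300.0, p_nep=15e-12):
    """SNR figures (dB) for the differential time-stretch front end.

    Assumed differential model: common-mode intensity noise cancels,
    leaving a residual RIN term 2 m^2 (eta P_in)^2 10^(RIN/10) B_e.
    """
    i_dc = eta * p_in                                          # photocurrent per detector (A)
    shot = 2 * Q * i_dc * b_e                                  # per detector (A^2)
    thermal = 4 * K_B * temp_k * b_e / r_ohm + (eta * p_nep) ** 2 * b_e   # per detector (A^2)
    rin = 2 * m ** 2 * i_dc ** 2 * 10 ** (rin_db_hz / 10) * b_e            # after subtraction
    sig_single = (m * i_dc) ** 2 / 2                           # one detector, sinusoidal modulation
    sig_diff = 2 * (m * i_dc) ** 2                             # difference has amplitude 2 m i_dc
    return {
        "shot_single_ended": 10 * math.log10(sig_single / shot),
        "rin_differential": 10 * math.log10(sig_diff / rin),
        "total_differential": 10 * math.log10(sig_diff / (2 * shot + 2 * thermal + rin)),
    }


for name, value in frontend_snr_db().items():
    print(f"{name}: {value:.1f} dB")
```

The shot-noise and RIN figures reproduce the 58-dB and 63-dB values quoted in the power calculation. With the appendix's 50-ohm and 15 pW/√Hz values placed in the thermal term, the total comes out near 55.5 dB, which illustrates why the better-than-58-dB figure relies on keeping the thermal contribution at or below the shot noise.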
