Abstract: This paper presents a generic foreground calibration algorithm that estimates and corrects memoryless nonlinear impairments in both single-channel and time-interleaved analog-to-digital converters (TIADCs); it can correct for amplifier nonlinearity, comparator offsets, and capacitance mismatch in each channel. It operates by generating, and then using, a look-up table that maps raw ADC output decision vectors to a linearized output. For TIADCs, the algorithm also uses information gained during the calibration phase to estimate timing and gain mismatches among the sub-ADCs. The problem of selecting an appropriate timing reference so as to relax the requirements on the time-skew correction circuitry is analyzed statistically, together with its impact on manufacturing yield. Accordingly, a new timing-reference selection method is proposed that outperforms conventional choices; for example, for a TIADC with eight sub-ADCs, the proposed scheme reduces the required time-skew correction range by 44% compared with conventional methods. The architecture is instrumented with minor additional circuitry to provide a built-in self-test, reducing manufacturing test time and cost. Implementation aspects are discussed, and several complexity reduction techniques are presented along with synthesis results from a Verilog implementation of the calibration engine.
Sample-and-hold (S/H) nonlinearity is an example of a nonlinear impairment that affects many ADC architectures. A model for the S/H input-output relationship is [1] 
where $c_i$ are the nonlinearity coefficients. In [1], only $c_1$ and $c_2$ are estimated, and their effect is compensated through post-processing. In SAR ADCs, capacitor mismatch is an example of an architecture-dependent impairment; it results from variations in the fabrication process [3]. For its calibration, [2]-[4] suggest using a linear combination in which the comparator decisions are combined with weights proportional to the capacitances (called ideal weights). In [3] and [4], these weights are evaluated via histogram measurements.
The work in [11] targets nonlinearity calibration in pipeline ADCs. This calibration requires a sinusoidal input, with curve-fitting techniques employed to evaluate the error in each output sample. This error is used to update the linear combining weights via a least-mean-square (LMS) adaptation technique. However, to evaluate the input signal parameters accurately for curve fitting, a large number of samples must be processed, which requires a large memory.
As with many calibration algorithms, the above examples compute a set of weights that are then used at run time (often via a linear combination) to produce the calibrated output. While these can be effective for specific nonlinearities, a more flexible approach is to use a look-up table (LUT), which can, in principle, correct for any memoryless nonlinearity.
A LUT is used in [12] to map each raw ADC output to its associated integral nonlinearity (INL) error. With a sinusoidal input, the INL errors manifest themselves as harmonic frequency components. In [12], the INL errors are evaluated by measuring the amplitudes of those harmonics; however, this requires matrix manipulations, which complicate real-time implementation.
The above techniques can be applied to correct for nonlinear effects within a single ADC or indeed within each of the sub-ADCs of a time-interleaved ADC (TIADC) architecture. However, there are mismatches among the sub-ADCs that contribute to an additional form of nonlinearity in the aggregated output. The main sources of these mismatches are offset, gain and time skew. These mismatches need to be estimated and compensated for, as they cause spurious components to occur in the aggregated output, thus limiting the overall performance.
As is the case for a single channel ADC, the calibration process for these TIADC-specific mismatches can be either background or foreground. In background calibration, the mismatches are estimated during normal operation by utilizing some hardware redundancy and/or by taking advantage of some assumed statistical signal properties.
In [5], the background estimation of time skew mismatches is performed by applying a low-complexity Fast Fourier Transform (FFT) to the output samples; however, this method still requires intensive computation and a large memory. The cross-correlation between the different sub-ADC outputs is used in [13]-[17], where the outputs of a bank of correlators are processed to obtain the time skew values.
Hardware redundancy is employed in [6] and [7], where an additional low-speed reference ADC periodically samples simultaneously with each of the sub-ADCs, enabling estimation of the error in that sub-ADC's output. The correction circuitry is then adapted to minimize the measured error. A similar idea is proposed in [8], where an extra comparator is used instead, and an adaptation process maximizes the correlation between the comparator's output and that of the sub-ADC. One problem with this approach is that, since the extra ADC/comparator does not sample every clock cycle, the sub-ADC is loaded differently when it is being calibrated, which changes its behavior during calibration and leads to inaccuracies. Also, the periodic change of the input load can introduce spurs into the ADC output [9].
In [9] and [10], both hardware redundancy and signal statistics are used for time skew calibration. There, the authors minimize the variance of the sub-ADC outputs corresponding to a certain input voltage window, and an additional fast flash ADC or comparator marks the samples within that window. However, the relationship between the variance and the time skew becomes very weak for a random input signal, which prevents accurate estimation. Also, these algorithms cannot determine the adaptation direction, so continuous small changes are applied to the correction circuitry, making convergence very slow for high-precision time skew correction. This problem also exists in the algorithm proposed in [8].
In general, background calibration algorithms offer the ability to track slow variations in voltage and temperature (PVT variations). However, their initial convergence can be slow and, since they run continuously, they consume part of the power budget at all times. They may also converge to incorrect values when the assumed signal statistics are not satisfied [16], which makes them unsuitable for general-purpose ADC applications.
For faster convergence and accurate estimation, foreground calibration algorithms may be considered. In these algorithms, a calibration time slot is allocated during which a known input signal is applied to the TIADC system. After estimation, the ADC switches to normal operation, in which the samples are corrected using the previously estimated values. In general, foreground calibration techniques cannot track PVT variations; however, this can be addressed by occasional recalibration.
A foreground calibration algorithm is used in [18] to estimate the time skew mismatch in a TIADC system, where a linear "ramp" or triangular input signal with known characteristics is injected; however, in this case noise cannot be removed from the injected signal by filtering, as filtering also degrades the linearity of the triangular input [19]. Also, in [20] and [21] the time skew and gain mismatches are obtained by processing, off-chip and in the frequency domain, the output samples for a known input signal.
The estimation of mismatches requires a reference. Conventionally, this reference is selected to be 1) one of the main sub-ADCs, e.g., [13], [14], [17], [22], [23]; or 2) an extra sub-ADC or a comparator, e.g., [6]-[9], [18]. However, this pre-selection of the reference, especially for the time skew, tightens the requirements on the correction circuitry, as we illustrate in Section IV-B.
The offset and gain mismatches are easily corrected digitally using addition and multiplication. The time skew mismatch, however, can be corrected either by analog delay lines before each sub-ADC, e.g., [7], [14], or by digital interpolation after the sub-ADCs (using the aggregated output), e.g., [15], [17]. In the primary application considered in this paper, we use analog delay lines.
In this paper, a unified calibration technique is proposed that works in the presence of several sources of nonlinearity. A sinusoidal input with a known frequency is used during the calibration phase, where the sampling clock and the input signal are generated by the same signal generator to guarantee their synchronization. Knowing the input signal frequency avoids the complex curve fitting required in [11]. The algorithm populates a LUT that stores, for each raw output, an estimate of the average input that produces it. Using this LUT saves post-processing computation and reduces latency, at the cost of additional storage.
In addition to calibrating the nonlinearity of each individual ADC, the proposed mechanism can be used to estimate the mismatches in a TIADC. The foreground time skew estimation enables a proper selection of the timing reference, which reduces the complexity of the correction circuitry.
Also, the proposed calibration mechanism can be employed as a built-in self-test to evaluate the ADC performance. With minor extra circuitry, the SNDR of the ADC output can be evaluated without processing the samples externally. This avoids transferring a large amount of data to an external computation unit, which speeds up the production test carried out on each fabricated chip and reduces the overall manufacturing cost.

This paper is organized as follows. Section II describes the proposed calibration algorithm for a single-channel ADC, and its implementation is described in Section III. Section IV extends the proposed algorithm to estimate the mismatches among the sub-ADCs of a TIADC. In Section V, we study the behavior of the proposed algorithm in the presence of multiple nonlinearity sources, in particular bandwidth mismatch. The use of the algorithm as a built-in self-test is described in Section VI. Simulation results for both a single-channel ADC and a TIADC are presented in Section VII. Finally, conclusions are drawn in Section VIII.
II. PROPOSED CALIBRATION ALGORITHM
In this algorithm, we consider a generic ADC that makes N binary decisions, which form its raw output. This raw output is denoted by an N × 1 vector $\mathbf d$, whose n-th element $d_n \in \{0, 1\}$ is the n-th decision, with $d_0$ the least significant decision. We define the following function to map any possible combination of $\mathbf d$ to a unique integer:
$$\theta(\mathbf d) = \sum_{n=0}^{N-1} d_n\, 2^n. \tag{2}$$
Without considering any impairments, the nominal ADC output can be calculated according to
$$\sum_{n=0}^{N-1} w_n d_n, \tag{3}$$
where $w_n$ is the nominal weight of the n-th binary decision, e.g., $w_n = 2^n$ for radix-2 SAR ADCs. However, due to circuit impairments, the output of (3) can be an inaccurate representation of the analog input.
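To make (2) and (3) concrete, here is a minimal Python sketch. It assumes the decisions are stored as 0/1 values with `d[0]` the least significant, and that (3) is the plain weighted sum of the decisions; a differential implementation that uses ±1 decisions would only add a fixed scale and offset. The function names are illustrative.

```python
from typing import Sequence

def theta(d: Sequence[int]) -> int:
    """Map an N-bit decision vector (d[0] = least significant decision) to a unique index, cf. (2)."""
    return sum(int(bit) << n for n, bit in enumerate(d))

def nominal_output(d: Sequence[int], w: Sequence[float]) -> float:
    """Nominal combining, cf. (3): weighted sum of the binary decisions."""
    return sum(wn * bit for wn, bit in zip(w, d))

N = 4
w_ideal = [2.0 ** n for n in range(N)]       # radix-2 nominal weights w_n = 2^n
d = [1, 0, 1, 1]                              # d[0] is the LSB decision
print(theta(d), nominal_output(d, w_ideal))   # 13 13.0
```

With ideal radix-2 weights the two functions coincide; with real capacitor mismatch, the LUT replaces `nominal_output` while `theta` remains the LUT address.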
The ADC can be modeled as a memoryless system whose current output is a function of its current input only, so it can be described by a static transfer function. In the absence of internal noise, the transfer function between the analog input $v_{in}$ and the digital output $\theta(\mathbf d)$ looks like a staircase, as illustrated in Figure 1. For a given output $\mathbf d$, the analog input can be modeled as the sum of 1) a deterministic value, $m_{\theta(\mathbf d)}$, that represents the average input corresponding to the raw ADC output $\mathbf d$, as shown in Figure 1; and 2) a zero-mean, uniformly distributed random quantization noise whose range can vary due to circuit impairments. The objective of the algorithm is to estimate $m_{\theta(\mathbf d)}$ for all $\mathbf d$. Figure 2 shows a block diagram of the calibration engine used to calibrate the nonlinearity of a single ADC channel, where the estimated values of $m_{\theta(\mathbf d)}$ are stored in the LUT. During calibration, the LUT output $\tilde y[k]$ is ignored, and the output $\tilde\psi$ can be used to estimate the mismatch among the sub-ADCs in TIADC systems, as described in Section IV.

Fig. 2. Block diagram for the proposed calibration engine, modified from [24].
After calibration, the ADC may switch to normal operation, where the LUT maps the raw ADC output at the k-th time index, denoted by $\mathbf d^{(k)}$, to the calibrated output $\tilde y[k] = \tilde m_{\theta(\mathbf d^{(k)})}$.
The proposed calibration algorithm can be divided into two stages that may run in parallel:
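1) Input signal synchronization: in this stage, the input signal is characterized in order to compute an approximate digital copy, $\tilde v_{in}$, of the analog input.
2) Building the look-up table: in this stage, the LUT entries are populated by associating each raw output with the predicted input $\tilde v_{in}$.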
A. Input Signal Synchronization
In the proposed calibration algorithm, a sinusoidal input signal with a known frequency is used. The frequency is selected to be $\frac{a}{K} f_s$, where $f_s$ is the sampling frequency, K is a power of 2, and a is an integer such that a and K are relatively prime and $0 < a < K/2$. This analog input can be described as follows,
$$v_{in}[k] = gA\cos\left(\frac{2\pi a k}{K} + \phi_0\right), \tag{4}$$
where A is the input amplitude, $\phi_0$ is the initial phase, and g is the ADC internal gain that converts the input voltage into a digital output. The nominal value of g is denoted by $\bar g$, e.g., $\bar g = 1024/V_{swing}$ for a 10-bit ADC, where $V_{swing}$ is the input voltage swing. The input amplitude A is set so that the test signal covers most of the input swing, which increases the number of raw output levels exercised during calibration. To create a digital copy of this analog input, both its initial phase and its amplitude need to be evaluated. Let ψ denote the result of the following average over K samples:
$$\psi = \frac{1}{K}\sum_{k=0}^{K-1} v_{in}[k]\, e^{-j2\pi a k/K} \tag{6}$$
$$\psi = \frac{gA}{2}\, e^{j\phi_0}, \tag{7}$$
where (7) holds because the averaging covers an integer number of cycles of the sinusoidal input. The magnitude and the phase of ψ carry the input signal amplitude and initial phase, respectively, which allows the input signal to be predicted at any time instant. Note that (6) resembles the calculation of the a-th component of a Discrete Fourier Transform (DFT) of size K.

Fig. 3. Hardware realization of (8) and (9), modified from [24].
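The constraints on a and K can be checked directly. The sketch below (the helper names are hypothetical) also shows why they are useful: when a and K are coprime, the K samples of one averaging window fall on K distinct phases of the sinusoid, so the window spans exactly a whole cycles and exercises the transfer function at many different input levels rather than at a few repeated ones.

```python
from math import gcd

def valid_test_tone(a: int, K: int) -> bool:
    """Check the stated conditions on the tone index a for f_in = (a / K) * f_s."""
    is_pow2 = K > 0 and (K & (K - 1)) == 0
    return is_pow2 and 0 < a < K // 2 and gcd(a, K) == 1

def phase_slots(a: int, K: int) -> list[int]:
    """Phase bin (a * k mod K) visited by each of the K samples in one averaging window."""
    return [(a * k) % K for k in range(K)]

K = 2 ** 12
print(valid_test_tone(409, K), valid_test_tone(1433, K))  # True True
print(len(set(phase_slots(409, K))) == K)                 # True: every phase is visited once
```

Both tone indices used in Section VII (a = 409 and a = 1433 with K = 2^12) pass the check.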
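However, (6) cannot be used directly to find ψ, since $v_{in}[k]$ is not known. Therefore, we substitute $v_{in}[k]$ with the output of the nominal combining in (3); the corresponding estimate of ψ may then be written as
$$\tilde\psi = \frac{1}{K}\sum_{k=0}^{K-1}\left(\sum_{n=0}^{N-1} w_n d_n^{(k)}\right) e^{-j2\pi a k/K}, \tag{8}$$
where $d_n^{(k)}$ denotes the n-th decision at time index k.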
The averaging in (8) reduces the impact of both noise and nonlinear impairments, which makes $\tilde\psi$ an accurate approximation of ψ. This averaging also rejects any offset in the output of (3).
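The average in (8) is a single-bin DFT. The sketch below applies it to a synthetic nominal-combining output; it assumes a cosine test tone $gA\cos(2\pi ak/K + \phi_0)$, for which $\psi = (gA/2)e^{j\phi_0}$, and uses arbitrary example values for g, A and $\phi_0$. The offset and the second-order term fall into other DFT bins and do not bias the estimate; an odd-order term such as a cubic would, by contrast, add a component at the fundamental and show up as a small gain error.

```python
import numpy as np

def estimate_psi(y: np.ndarray, a: int, K: int) -> complex:
    """Average y[k] * exp(-j*2*pi*a*k/K) over the first K samples (the a-th DFT bin / K)."""
    k = np.arange(K)
    return complex(np.mean(y[:K] * np.exp(-2j * np.pi * a * k / K)))

# Synthetic nominal-combining output, assuming v_in[k] = g*A*cos(2*pi*a*k/K + phi0).
K, a = 2 ** 12, 409
g, A, phi0 = 1.0, 500.0, 0.7
rng = np.random.default_rng(0)
k = np.arange(K)
v = g * A * np.cos(2 * np.pi * a * k / K + phi0)
y = v + 3.0 + 1e-4 * v ** 2 + rng.normal(0.0, 1.0, K)  # offset, 2nd-order term, noise

psi = estimate_psi(y, a, K)
print(abs(psi), np.angle(psi))  # close to g*A/2 = 250 and phi0 = 0.7
```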
B. Building the Look-Up Table
Using (4), (7) and the output of (8), the analog input can be predicted at any time index k according to
$$\tilde v_{in}[k] = 2\,\Re\left\{\tilde\psi\, e^{j2\pi a k/K}\right\}, \tag{9}$$
where $\Re\{X\}$ denotes the real part of a complex number X. Note that for the input signal used,
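After obtaining $\tilde\psi$ from (8), another F samples are observed, indexed by k, where $0 \le k < F$. These samples are used to measure the average predicted input $\tilde v_{in}$ corresponding to each raw output $\mathbf d$. We define a set $\mathcal S(\mathbf d)$ for each possible $\mathbf d$, containing the time indices at which the observed ADC output equals $\mathbf d$, i.e.,
$$\mathcal S(\mathbf d) = \{k : \mathbf d^{(k)} = \mathbf d,\ 0 \le k < F\},$$
so that $m_{\theta(\mathbf d)}$ is estimated by the average
$$\tilde m_{\theta(\mathbf d)} = \frac{1}{|\mathcal S(\mathbf d)|}\sum_{k \in \mathcal S(\mathbf d)} \tilde v_{in}[k], \tag{10}$$
where $|\mathcal S(\mathbf d)|$ denotes the number of elements of $\mathcal S(\mathbf d)$.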
III. IMPLEMENTATION
A suitable hardware realization of the input synchronization block (8) and the prediction block (9) is depicted in Figure 3, where each of the two multipliers can be implemented as half a complex multiplier. A coordinate rotation digital computer (CORDIC) is used to calculate $e^{j2\pi a k/K}$, which is common to (8) (in conjugated form) and (9). The averaging in (8) produces an updated $\tilde\psi$ every K samples, allowing small changes in the input to be tracked. After the first $\tilde\psi$ is obtained, both stages (described in Sections II-A and II-B) can run in parallel to estimate the input voltage $\tilde v_{in}[k]$, which is used to update the LUT content as shown in Figure 4.
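One phasor generator can serve both (8) and (9) because both need $e^{j2\pi ak/K}$ (one conjugated), and that sequence is periodic in k with period K. The minimal sketch below precomputes a K-entry table of unit phasors instead of using a CORDIC (a software substitution, not the paper's hardware); the names `phasor_table` and `sync_and_predict` are hypothetical. It accumulates $\tilde\psi$ over the first K samples and then predicts $\tilde v_{in}$ for later samples with $2\Re\{\tilde\psi e^{j2\pi ak/K}\}$, assuming a cosine test tone.

```python
import numpy as np

def phasor_table(a: int, K: int) -> np.ndarray:
    """Unit phasors exp(j*2*pi*a*k/K) for k = 0..K-1; the sequence repeats with period K."""
    return np.exp(2j * np.pi * a * np.arange(K) / K)

def sync_and_predict(y: np.ndarray, a: int, K: int) -> tuple[complex, np.ndarray]:
    """Estimate psi~ from y[0:K] with conjugated phasors (as in (8)), then predict
    v~_in[k] for k >= K with the same table (as in (9))."""
    table = phasor_table(a, K)
    psi = complex(np.mean(y[:K] * np.conj(table)))
    k = np.arange(K, len(y))
    v_pred = 2.0 * np.real(psi * table[k % K])
    return psi, v_pred

K, a = 2 ** 12, 409
k_all = np.arange(4 * K)
v = 300.0 * np.cos(2 * np.pi * a * k_all / K - 1.2)  # ideal test tone (assumed model)
psi, v_pred = sync_and_predict(v + 5.0, a, K)         # the +5 offset is rejected by the average
print(np.max(np.abs(v_pred - v[K:])))                 # tiny: prediction matches the tone
```

Only K table entries are ever needed, regardless of how many samples are predicted.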
The LUT contains $2^N$ entries, each N + b bits wide, where b is the number of bits allocated to the fractional part. The i-th entry of the LUT stores $\tilde m_i$, an estimate of $m_i$.
At the start of the calibration process, all entries are initialized to an invalid value ∅, so that any entry not updated during the calibration run can be identified.
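For the k-th input in the second calibration stage, the difference between $\tilde v_{in}[k]$ and the LUT entry $\tilde m_{\theta(\mathbf d^{(k)})}$ can be evaluated as
$$\tilde e[k] = \tilde v_{in}[k] - \tilde m_{\theta(\mathbf d^{(k)})}, \tag{11}$$
where $\tilde e[k]$ represents the instantaneous error associated with the content of the LUT entry with index $\theta(\mathbf d^{(k)})$. Using LMS adaptation, this entry can be updated according to
$$\tilde m_{\theta(\mathbf d^{(k)})} \leftarrow \begin{cases} \tilde v_{in}[k], & \text{if } \tilde m_{\theta(\mathbf d^{(k)})} = \varnothing,\\ \tilde m_{\theta(\mathbf d^{(k)})} + \alpha\,\tilde e[k], & \text{otherwise,} \end{cases} \tag{12}$$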
where $0 < \alpha < 1$ is the adaptation step size. The special case in (12) is used to speed up convergence, and it occurs only once for each entry. This adaptation acts as an averaging process similar to (10), but it is suitable for run-time implementation. Note that each LUT entry adapts independently. Upon completion of the second calibration stage, some LUT entries may not have been updated, i.e., they remain ∅, because only a limited number of samples is observed. Those entries are filled by linear interpolation from the nearest updated entries.
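The sketch below contrasts the batch average (10) with the run-time LMS update (12) and then fills never-updated entries by linear interpolation. It is a software model under stated assumptions, not the Verilog engine: NaN plays the role of the invalid marker ∅, the function names are hypothetical, the error on a first visit is reported as zero by our own convention, and `np.interp` holds edge values constant when an unvisited entry has an updated neighbour on one side only.

```python
import numpy as np

EMPTY = np.nan  # stands in for the invalid marker "∅"

def lms_update(lut: np.ndarray, code: int, v_pred: float, alpha: float) -> float:
    """Update LUT entry `code` with the prediction v_pred and return the error e~[k].
    First visit: copy v_pred directly (special case); zero error is reported (our convention)."""
    if np.isnan(lut[code]):
        lut[code] = v_pred
        return 0.0
    err = v_pred - lut[code]      # e~[k] = v~_in[k] - m~, cf. (11)
    lut[code] += alpha * err      # LMS step, cf. (12)
    return float(err)

def batch_average(codes: np.ndarray, v_pred: np.ndarray, n_codes: int) -> np.ndarray:
    """Average of v_pred over each set S(d), cf. (10); NaN where S(d) is empty."""
    sums = np.bincount(codes, weights=v_pred, minlength=n_codes)
    counts = np.bincount(codes, minlength=n_codes)
    out = np.full(n_codes, EMPTY)
    out[counts > 0] = sums[counts > 0] / counts[counts > 0]
    return out

def fill_missing(lut: np.ndarray) -> np.ndarray:
    """Linear interpolation between the nearest updated entries (edges held constant)."""
    idx = np.arange(lut.size)
    ok = ~np.isnan(lut)
    return np.interp(idx, idx[ok], lut[ok])

# Toy run: 8 codes with true mean inputs 0.25, 1.25, ...; codes 3 and 6 never occur.
rng = np.random.default_rng(1)
true_m = np.arange(8) + 0.25
codes = rng.choice([0, 1, 2, 4, 5, 7], size=3000)
v_pred = true_m[codes] + rng.normal(0.0, 0.1, codes.size)

lut = np.full(8, EMPTY)
for c, v in zip(codes, v_pred):
    lms_update(lut, int(c), float(v), alpha=2 ** -3)

lut_lms = fill_missing(lut)
lut_avg = fill_missing(batch_average(codes, v_pred, 8))
print(np.round(lut_lms, 2), np.round(lut_avg, 2))
```

The batch average needs a per-entry sum and count, whereas the LMS form keeps only the entry itself, at the cost of a residual fluctuation set by α.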
IV. MISMATCH CALIBRATION IN TIADC
The description in the previous sections covers the calibration needed for a single-channel ADC. In this section, we extend the algorithm to support mismatch estimation in a TIADC that consists of M sub-ADCs, as per Figure 5. The m-th sub-ADC ($0 \le m \le M-1$) is equipped with a variable delay line whose configuration is denoted by $\dot\tau_m$. The calibration algorithm proposed in Section II is sufficient to neutralize the effect of offset mismatch, since $\tilde v_{in}$ has zero mean; however, other types of mismatch remain and require estimation. Figure 5 shows the block diagram of the proposed mismatch calibration technique, where M calibration engines run in parallel. Each engine is connected to the output of one sub-ADC, and all engines share the same CORDIC block and initialization controller. The output of the 'Input synchronization' block inside the m-th calibration engine is denoted by $\tilde\psi_m$.
A. Mismatch Estimation in TIADC System
Considering the gain and time skew mismatch in the m th sub-ADC, we can rewrite (4) as
where $g_m$ is the internal gain of the m-th sub-ADC, and $\tau_m$ is the time skew of the m-th sub-ADC, normalized to the TIADC sampling period. We assume that $\tau_m$ is Gaussian distributed with zero mean and standard deviation $\sigma_\tau$.
Using (13) and an analysis similar to the one above, we can write the output of the 'Input synchronization' block of the m-th calibration engine as
where the magnitude of $\tilde\psi_m$ is proportional to $g_m$ and its phase varies linearly with $\tau_m$. All $\tilde\psi_m$ are calculated using MK samples, and these values are used to estimate the gain and time skew mismatches as described below. Note that the estimation of $\tilde\psi_m$ is independent of the LUT updating process. Knowing the input signal amplitude A, we evaluate $\tilde g_m$, an estimate of $g_m$, according to
$$\tilde g_m = \frac{2\,|\tilde\psi_m|}{A}. \tag{15}$$
The gain is compensated after the LUTs of all sub-ADCs have been populated, by multiplying the LUT entries of the m-th sub-ADC by $\bar g/\tilde g_m$, where $\bar g$ is the nominal internal gain of the ADC.
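Assuming, as for a cosine test tone, that $|\tilde\psi_m| \approx g_m A/2$, the gain estimate and the LUT equalization reduce to a few array operations. The values of $V_{swing}$, A and the per-channel gains below are made up for illustration, and the function names are hypothetical.

```python
import numpy as np

def estimate_gains(psi_m: np.ndarray, A: float) -> np.ndarray:
    """g~_m = 2*|psi~_m| / A, assuming |psi_m| = g_m * A / 2."""
    return 2.0 * np.abs(psi_m) / A

def equalize_luts(luts: np.ndarray, g_est: np.ndarray, g_nominal: float) -> np.ndarray:
    """Scale row m (the LUT of sub-ADC m) by g_nominal / g~_m."""
    return luts * (g_nominal / g_est)[:, np.newaxis]

V_swing, A = 1.0, 0.45                     # volts (example values)
g_bar = 1024 / V_swing                     # nominal gain of a 10-bit ADC
g_true = g_bar * np.array([1.00, 1.01, 0.99, 1.005])
psi_m = g_true * A / 2 * np.exp(1j * np.array([0.1, 0.4, 0.7, 1.0]))
g_est = estimate_gains(psi_m, A)

levels = np.linspace(-0.5, 0.5, 5)         # common analog levels seen by every sub-ADC
luts = np.outer(g_true, levels)            # each LUT carries its own channel gain
print(equalize_luts(luts, g_est, g_bar))   # every row becomes g_bar * levels
```

The phase of each $\tilde\psi_m$ plays no role here, which is why gain and skew can be extracted independently.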
Initially, we consider the first sub-ADC, with index 0, as the timing reference; the relative time skew mismatch can then be written as
where ∠X denotes the phase angle of the complex number X. The correction applied to the variable delay line connected to the m-th sub-ADC can be generalized to the following form:
where r is an arbitrarily selected timing reference. However, r should be selected so that the requirements on the correction circuitry are relaxed. The choice of r determines the range of $\dot\tau_m$ that the correction mechanism must support, which in turn affects the manufacturing yield. In the following subsections, we examine the impact of various choices of r on the yield.
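The exact forms of (13), (14) and (16) are not reproduced here, so the sketch below rests on an assumed sampling model: sub-ADC m samples at $(kM + m + \tau_m)T_s$, where $T_s$ is the TIADC sampling period, which gives $\angle\tilde\psi_m = \phi_0 + \omega(m + \tau_m)$ modulo $2\pi$, with $\omega = 2\pi f_{in} T_s$. Removing the known term $\omega m$ and wrapping the phase difference yields $\tau_m - \tau_0$, provided $|\omega(\tau_m - \tau_0)| < \pi$. The input frequency in the example is hypothetical.

```python
import numpy as np

def relative_skews(psi_m: np.ndarray, omega: float) -> np.ndarray:
    """Estimate tau_m - tau_0 (in TIADC sampling periods) from the phases of psi~_m.

    Assumed model: angle(psi~_m) = phi0 + omega*(m + tau_m) (mod 2*pi), omega = 2*pi*f_in*T_s.
    """
    m = np.arange(psi_m.size)
    dphi = np.angle(psi_m) - np.angle(psi_m[0]) - omega * m
    dphi = np.angle(np.exp(1j * dphi))    # wrap to (-pi, pi]
    return dphi / omega

M, sigma_tau = 8, 0.01
omega = 2 * np.pi * 0.35                  # hypothetical f_in = 0.35 * F_s
rng = np.random.default_rng(2)
tau = rng.normal(0.0, sigma_tau, M)       # unknown skews
psi_m = 0.5 * np.exp(1j * (0.3 + omega * (np.arange(M) + tau)))
rel = relative_skews(psi_m, omega)
print(np.max(np.abs(rel - (tau - tau[0]))))  # tiny: the relative skews are recovered
```

The magnitudes of `psi_m` never enter the calculation, so gain mismatch does not disturb the skew estimate in this model.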
B. Timing Reference Selection
Each sub-ADC is equipped with a correction mechanism, in this case a variable delay line, as shown in Figure 5. These delay lines can correct delays within the range ±D. We define η as the target yield, i.e., all $\dot\tau_m$ values must lie within the range of the delay lines for at least a fraction η of the fabricated ADCs. For a given ADC chip, if any $\dot\tau_m$ lies outside the correctable range, it is saturated, leaving an uncompensated time skew that degrades performance; we consider such a chip to be faulty.
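Assuming that r is selected to be equal to 0, we need $|\dot\tau_m| < D$ for all $m \in \{0, \ldots, M-1\}$ in at least a fraction η of the chips to meet the target yield. Since the $\dot\tau_m$ are then independent Gaussian variables with standard deviation $\sigma_\tau$, this sets the following constraint on D:
$$D \ge \sqrt{2}\,\sigma_\tau\,\mathrm{erf}^{-1}\!\left(\eta^{1/M}\right), \tag{18}$$
where $\mathrm{erf}^{-1}(\cdot)$ is the inverse of the error function. Note that choosing r = 0 is impractical, since the M − 1 measurements obtained from (16) do not determine all $\dot\tau_m$ uniquely.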
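Many estimation algorithms use the timing of the first sub-ADC as the reference, e.g., [13], [14], [17], [22], [23]. However, by (17), choosing $r = \tau_0$ doubles the variance of $\dot\tau_m$:
$$\mathrm{var}(\dot\tau_m) = 2\sigma_\tau^2, \quad m \ne 0, \tag{19}$$
while $\dot\tau_0 = 0$, and hence the constraint on D can be written as follows:
$$D \ge 2\,\sigma_\tau\,\mathrm{erf}^{-1}\!\left(\eta^{1/(M-1)}\right). \tag{20}$$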
Other algorithms employ an extra reference ADC or comparator as the timing reference, e.g., [6]-[9], [18]. The variance of $\dot\tau_m$ for all $m \in \{0, \ldots, M-1\}$ is the same as in (19); however, this case imposes a tighter design requirement on D, since M constraints must be satisfied by the values $\dot\tau_m$. The constraint on D can be expressed as
$$D \ge 2\,\sigma_\tau\,\mathrm{erf}^{-1}\!\left(\eta^{1/M}\right). \tag{21}$$
Note that to derive (20) and (21), we assumed that all $\dot\tau_m$ are statistically independent.
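Under the stated independence assumption, each constraint is a Gaussian tail: for $\dot\tau_m \sim \mathcal N(0, s^2)$, $P(|\dot\tau_m| < D) = \mathrm{erf}(D/(\sqrt{2}s))$, and requiring the product over all constrained channels to reach η gives the three bounds on D for r = 0, $r = \tau_0$ and an extra reference. The sketch below evaluates them with Python's standard library; since the `math` module has no inverse error function, $\mathrm{erf}^{-1}$ is obtained from the normal quantile function. The function names are hypothetical.

```python
from statistics import NormalDist

def erfinv(p: float) -> float:
    """Inverse error function via the normal quantile: erf^-1(p) = Phi^-1((1 + p) / 2) / sqrt(2)."""
    return NormalDist().inv_cdf((1.0 + p) / 2.0) / 2 ** 0.5

def d_r_zero(eta: float, M: int, sigma: float) -> float:
    """r = 0: M constraints, each tau_dot_m ~ N(0, sigma^2)."""
    return 2 ** 0.5 * sigma * erfinv(eta ** (1.0 / M))

def d_first_subadc(eta: float, M: int, sigma: float) -> float:
    """r = tau_0: M - 1 constraints (tau_dot_0 = 0), each with variance 2*sigma^2."""
    return 2.0 * sigma * erfinv(eta ** (1.0 / (M - 1)))

def d_extra_reference(eta: float, M: int, sigma: float) -> float:
    """Extra reference ADC/comparator: M constraints, each with variance 2*sigma^2."""
    return 2.0 * sigma * erfinv(eta ** (1.0 / M))

for f in (d_r_zero, d_first_subadc, d_extra_reference):
    print(f.__name__, round(f(0.98, 8, 0.01), 4))
```

For M = 8, $\sigma_\tau = 0.01$ and η = 98%, both conventional choices need D of roughly 0.042 to 0.043, about 40% more than the impractical r = 0 case.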
C. Proposed Timing Reference
The proposed time skew estimation technique obtains all M relative time skew values directly and simultaneously from (16), which makes it possible to adjust the timing reference so that the constraints on the correction circuitry are reduced. Since we need to reduce D, we propose choosing r as the mid-range of all $\tau_m$,
$$r = \frac{\max_m \tau_m + \min_m \tau_m}{2}, \tag{22}$$
and the delay lines are correspondingly configured to (from (17))
where $\tau_m - \tau_0$ for all $m \in \{0, \ldots, M-1\}$ are obtained using (16). Using (23), the maximum delay line configuration value for a given TIADC can be expressed as
$$\dot\tau_{\max} = \frac{\max_m \tau_m - \min_m \tau_m}{2}, \tag{24}$$
and the minimum delay line configuration value equals $-\dot\tau_{\max}$. For a given set of $\tau_m$, this choice of r minimizes the largest magnitude of $\dot\tau_m$, namely $\dot\tau_{\max}$ (see [25]), thus relaxing the requirements on D.
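Because $\tau_0$ cancels, the mid-range reference can be applied using only the relative skews from (16). The sketch below assumes the sign convention $\dot\tau_m = r - \tau_m$ (the magnitudes, and hence the requirement on D, are the same with the opposite sign) and compares the largest delay-line setting for $r = \tau_0$ with that of the mid-range choice. Function names are hypothetical.

```python
import numpy as np

def settings_first_subadc(rel: np.ndarray) -> np.ndarray:
    """Conventional reference r = tau_0: setting r - tau_m = -(tau_m - tau_0)."""
    return -rel

def settings_midrange(rel: np.ndarray) -> np.ndarray:
    """Mid-range reference, using only rel[m] = tau_m - tau_0 (tau_0 cancels):
    r - tau_m = (max(rel) + min(rel)) / 2 - rel[m]."""
    return 0.5 * (rel.max() + rel.min()) - rel

rng = np.random.default_rng(3)
tau = rng.normal(0.0, 0.01, 8)     # unknown absolute skews
rel = tau - tau[0]                 # what the estimator provides, cf. (16)
mid = settings_midrange(rel)
print(np.abs(settings_first_subadc(rel)).max(), np.abs(mid).max())
```

The mid-range settings are symmetric (largest positive equals minus the most negative), which is exactly what a symmetric ±D delay line can exploit.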
There is no closed form for the probability density function (PDF) of $\dot\tau_{\max}$ [26]; however, using [27], the PDF of $\max_m(\tau_m)$ (and also that of $-\min_m(\tau_m)$) can be approximated by a Gamma distribution, $\Gamma(c_g, k_g, \theta_g)$, where the position parameter $c_g$, shape parameter $k_g$ and scale parameter $\theta_g$ can be written as
with the inverse of the Gaussian cumulative distribution function (CDF) appearing in these expressions.
Since M is relatively large, we can assume that $\max_m(\tau_m)$ and $-\min_m(\tau_m)$ are independent; hence, by (24), $\dot\tau_{\max}$ has a Gamma distribution with position parameter $c_g$, shape parameter $2k_g$ and scale parameter $\theta_g/2$. Figure 6 depicts the CDF of $\dot\tau_{\max}$ for different M, obtained from Monte Carlo simulations and from the approximated distribution in (28), with $\sigma_\tau = 0.01$. The approximation gives an accurate estimate of the CDF for M > 4.
Using the approximation in (28), we can set the following constraint on D to meet the target yield when the proposed timing reference is used:
where $\gamma^{-1}(\cdot,\cdot)$ is the inverse lower incomplete gamma function.
As an example, we target a TIADC system with M = 8 sub-ADCs suffering from time skew with standard deviation $\sigma_\tau = 0.01$, and a target yield η = 98%. Figure 7 shows the measured yield versus the supported delay-line half range D for different timing reference configurations; the yield is measured using Monte Carlo simulations over 1,000,000 time skew sets. The figure also depicts the relationship between η and D predicted by the mathematical models above. With r = 0, the results predicted by (18) coincide with the simulation. However, (20) and (21) fail to predict the relationship accurately at low yield because they ignore the dependency among the $\dot\tau_m$. Table I lists the D required to meet the target yield for different choices of r. Selecting r as proposed in (22) reduces the constraint on D by 44% compared with conventional methods (to D = 0.0236). With this value of D, the yield is limited to 60% for algorithms that use $r = \tau_0$.
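A small Monte Carlo sketch of this example follows, with fewer trials than the 1,000,000 used for Figure 7 and no claim to reproduce Table I exactly. For each random skew set, it records the largest delay-line setting needed with $r = \tau_0$ and with the mid-range reference, then takes the empirical η-quantile as the required D. It also reports the yield of the $r = \tau_0$ scheme when D is sized for the mid-range scheme. The helper name `required_d` is hypothetical.

```python
import numpy as np

def required_d(worst: np.ndarray, eta: float) -> float:
    """Smallest half range D that covers a fraction eta of the chips (empirical quantile)."""
    return float(np.quantile(worst, eta))

M, sigma, eta, trials = 8, 0.01, 0.98, 200_000
rng = np.random.default_rng(4)
tau = rng.normal(0.0, sigma, size=(trials, M))

worst_first = np.abs(tau - tau[:, :1]).max(axis=1)       # r = tau_0
worst_mid = 0.5 * (tau.max(axis=1) - tau.min(axis=1))    # mid-range r, cf. (24)

d_first, d_mid = required_d(worst_first, eta), required_d(worst_mid, eta)
yield_first = float(np.mean(worst_first < d_mid))
print(f"r = tau_0: D = {d_first:.4f}; mid-range: D = {d_mid:.4f}; "
      f"reduction = {100 * (1 - d_mid / d_first):.0f}%")
print(f"yield with r = tau_0 at D = {d_mid:.4f}: {yield_first:.2f}")
```

With these settings the mid-range requirement comes out close to the D = 0.0236 quoted above, and the $r = \tau_0$ yield close to 60%; the exact figures vary slightly with the seed and the number of trials.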
It is worth mentioning that minimizing $\dot\tau_{\max}$ also helps improve performance in applications that employ digital correction of time skew. Those techniques use approximations to simplify the reconstruction of skew-free samples, and the accuracy of those approximations usually degrades for large correction values.
V. MISMATCH ESTIMATION INDEPENDENCE
For algorithms that target a specific mismatch, it is commonly assumed that the processed samples are free from errors due to other mismatch types. For example, techniques that estimate the time skew by directly processing sample values require samples free of offset, bandwidth and gain mismatch; this is the case for a wide variety of algorithms, e.g., [6], [7], [13], [14], [17], [18], [22], [23]. The algorithm proposed in this paper does not require this assumption.
The averaging used to estimate $\tilde\psi_m$ in (8) reduces the effect of static nonlinearity and removes the absolute offset of each sub-ADC. This makes any processing of $\tilde\psi_m$ independent of offset mismatch and largely insensitive to static nonlinearity.
Gain and time skew manifest themselves independently in the magnitude and the phase of $\tilde\psi_m$, respectively, as seen in (14); this allows both to be extracted simultaneously using (15) and (16).
Bandwidth mismatch calibration is not covered in this work; however, in many algorithms available in the literature, the presence of bandwidth mismatch may mislead the estimation of the gain and time skew mismatches, leading to further performance degradation. This happens because bandwidth mismatch introduces a frequency-dependent gain $g_{B,m}(f_{in})$ and a nonlinear phase shift $\theta_{B,m}(f_{in})$ as follows [28]:
where $f_{in}$ is the input frequency, B is the 3-dB bandwidth of the RC circuit, and $\Delta_m$ is the bandwidth mismatch. Using a Taylor series, $\theta_{B,m}(f_{in})$ can be approximated as
where the approximations leading to (32) and (33) are valid because $\Delta_m$ is considered small. The approximation leading to (34) is weaker, since $f_{in}/B$ is usually not small enough to make the term in square brackets in (33) approximately equal to 1; however, it guarantees that the error in the final approximation is smaller than the phase mismatch introduced by the bandwidth mismatch for $f_{in} < B$. It can be seen from (33) and (34) that the phase shift introduced by bandwidth mismatch can be approximated by a linear phase shift, which can be treated as a time skew mismatch. Compensating this linear phase with the delay lines reduces the phase error introduced by bandwidth mismatch for $f_{in} < B$, which offers a partial correction of the bandwidth mismatch.
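The exact forms of (30)-(34) are not reproduced here. As a hedged numerical check of the argument, the sketch below assumes a first-order RC response whose bandwidth in channel m is $B(1 + \Delta_m)$ (our parameterization of the mismatch). It compares the exact phase mismatch with the linear term $\Delta_m f_{in}/B$, which is the phase of a fixed delay of $\Delta_m/(2\pi B)$ seconds, and checks that removing this linear term leaves a smaller residual for $f_{in} < B$. The function names are hypothetical.

```python
import numpy as np

def phase_mismatch(x: np.ndarray, delta: float) -> np.ndarray:
    """Exact phase difference between a first-order RC channel with bandwidth B*(1 + delta)
    and one with bandwidth B; x = f_in / B."""
    return np.arctan(x) - np.arctan(x / (1.0 + delta))

def linear_part(x: np.ndarray, delta: float) -> np.ndarray:
    """Linear-in-frequency approximation delta * f_in / B (behaves like a time skew)."""
    return delta * x

x = np.linspace(0.01, 0.9, 90)               # f_in / B, kept below 1
for delta in (0.02, -0.02):
    exact = phase_mismatch(x, delta)
    resid = exact - linear_part(x, delta)      # left over after delay-line compensation
    print(delta, np.max(np.abs(resid) / np.abs(exact)))  # below 1 for f_in < B
```

For small $\Delta_m$ the ratio printed is roughly $(f_{in}/B)^2$, which is why the compensation helps below the bandwidth and stops helping as $f_{in}$ approaches B.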
In principle, the linear phase component introduced by bandwidth mismatch can be estimated by the normal time skew estimation process; however, many of the available techniques are sensitive to gain mismatch, e.g., [7], [13], [14], [17], [18], [22], [23]. These techniques are affected by the frequency-dependent gain in (30), which leads to an incorrect time skew estimate that worsens performance. Background gain calibration can mask this problem for a narrowband input, where $g_{B,m}(f_{in})$ can be compensated; however, it does not help for wideband signals.
Unlike those techniques, the proposed algorithm estimates the time skew from the phase of the input signal, which is unaffected by gain mismatch. This allows successful estimation of time skew values that include the linear component introduced by bandwidth mismatch.
VI. BUILT-IN SELF-TEST (BIST)
The estimation of $\tilde e[k]$ in (11) allows the proposed mechanism to be used as a built-in self-test after calibration without adding much complexity to the circuit: we can 1) measure the signal-to-noise-and-distortion ratio (SNDR) of the ADC output, and 2) detect comparator metastability. These two applications are described in the following subsections.
A. Measuring SNDR
To measure the SNDR, we take $\tilde v_{in}[k]$ as the signal and the estimated $\tilde e[k]$ as the noise. With these assumptions, the powers of both the noise and the signal can be measured as shown in Figure 8, and the SNDR can be computed as
$$\mathrm{SNDR} = 10\log_{10}\frac{\sum_{k=0}^{L-1}\tilde v_{in}^2[k]}{\sum_{k=0}^{L-1}\tilde e^2[k]}, \tag{35}$$
where L is the number of considered samples.
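Equation (35) amounts to a ratio of two accumulated energies. The sketch below checks it on synthetic data: a full-scale 10-bit tone (amplitude 512 LSB) stands in for $\tilde v_{in}$ and an ideal uniform quantization error stands in for $\tilde e$, for which the textbook value is 6.02·10 + 1.76 ≈ 62 dB. The 1/L normalization cancels in the ratio, so plain sums are used; the function name is hypothetical.

```python
import numpy as np

def sndr_db(v_pred: np.ndarray, err: np.ndarray) -> float:
    """SNDR from the predicted input (signal) and the LUT error (noise plus distortion)."""
    return float(10.0 * np.log10(np.sum(v_pred ** 2) / np.sum(err ** 2)))

L, A = 2 ** 14, 512.0
rng = np.random.default_rng(5)
k = np.arange(L)
v_pred = A * np.cos(2 * np.pi * 1433 * k / 2 ** 12)   # predicted input (full-scale tone)
err = rng.uniform(-0.5, 0.5, L)                        # ideal 1-LSB quantization error
print(round(sndr_db(v_pred, err), 1))                  # about 62 dB
```

In hardware, only two accumulators and one final division are needed, which is what makes this test cheap to run on chip.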
B. Comparator Metastability Detection
Minimizing the comparator metastability rate is another design criterion that must be fulfilled. Metastability occurs when a comparator fails to produce a valid binary decision [29], leading to a large error in the affected samples. This consequence is exploited in [29]-[32] to detect such events. In [30] and [31], a very low frequency input sinusoid is used, so that the expected difference between two successive digital outputs is less than 1 LSB; comparator metastability is then detected when the difference is greater than 1 LSB. A similar idea is used in [29] and [32], where a higher input frequency can be used; however, the output is heavily decimated so that the expected difference between two successive digital samples after decimation remains less than 1 LSB.

Fig. 8. Suggested in-circuit mechanism to estimate the SNDR and the comparator metastability.
In the proposed calibration algorithm, we estimate the error associated with each sample, $\tilde e[k]$, and a metastability event is detected when $|\tilde e[k]| > T$, where T is a selected threshold. Figure 8 depicts a simple mechanism to detect and handle this event. On detecting comparator metastability, both $\tilde v_{in}[k]$ and $\mathbf d^{(k)}$ are pushed into a first-in-first-out (FIFO) buffer, whose output is read by slow-running analysis software that can calculate the comparator metastability rate and identify systematic errors.
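A behavioural sketch of the detection path in Figure 8 follows. The threshold value, the FIFO depth and the drop-oldest overflow policy are assumptions made for illustration (a hardware FIFO might instead stall or discard new entries), and the raw output is represented by its index $\theta(\mathbf d^{(k)})$ rather than the full decision vector.

```python
from collections import deque
import numpy as np

def detect_metastability(err, v_pred, raw, threshold, fifo_depth=16):
    """Flag samples with |e~[k]| > T and queue (v~_in[k], raw code) for offline analysis.

    The FIFO keeps the most recent `fifo_depth` events (assumed overflow policy);
    the second return value is the fraction of flagged samples.
    """
    fifo = deque(maxlen=fifo_depth)
    events = 0
    for e, v, d in zip(err, v_pred, raw):
        if abs(e) > threshold:
            events += 1
            fifo.append((v, d))
    return fifo, events / len(err)

rng = np.random.default_rng(6)
n = 1000
raw = rng.integers(0, 1024, n)              # raw output codes theta(d^(k))
v_pred = raw + rng.uniform(-0.5, 0.5, n)    # predicted inputs
err = rng.uniform(-0.5, 0.5, n)             # normal LUT errors, in LSB
err[100], err[700] = 40.0, -64.0            # two injected metastability events
fifo, rate = detect_metastability(err, v_pred, raw, threshold=4.0)
print(len(fifo), rate)                       # 2 0.002
```

Keeping $\tilde v_{in}[k]$ next to the raw code lets the analysis software check whether events cluster around particular codes, i.e., around particular comparator decisions.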
VII. RESULTS
In this section, we verify the proposed algorithm's performance using Matlab simulations of a 10-bit differential radix-2 SAR ADC (note, however, that the proposed algorithm is not limited to this architecture). The ADC unit capacitance suffers from mismatch with a Gaussian distribution whose standard deviation equals 10% of its nominal value, which is large enough to produce missing codes, large differential nonlinearity (DNL) and a non-monotonic transfer function. To model a realistic ADC, comparator noise is added to limit the ENOB to around 9 bits. Unless otherwise specified, we use the following configuration: $\alpha = 2^{-3}$, $K = 2^{12}$ and F = 61440, so the total number of processed samples per sub-ADC is $K + F = 2^{16}$. For a single-channel ADC, a = 409 is used, and a = 1433 is used for the TIADC tests. The final ADC output is truncated to N = 10 bits, while the LUT values are stored with N + b = 15 bits.
To demonstrate that the calibration values are not tied to a specific input frequency, the frequency of the test signal used to evaluate performance after calibration is drawn from a uniform distribution up to the Nyquist frequency.
The following subsections present the results for a single-channel ADC and for a time-interleaved ADC. Section VII-C reports the results of synthesizing the hardware implementation of the proposed calibration engine.
A. Using a Single Channel ADC
A Monte Carlo simulation with 5000 different capacitor sets is used to verify the proposed algorithm's performance in compensating for capacitor mismatch. Figure 9 shows the ENOB distributions when 1) a linear combination with weights proportional to the capacitance values (ideal weights) is used to obtain the output, 2) no calibration is used, and 3) the proposed calibration algorithm is used. The measured average ENOBs are 8.97, 7.90 and 8.99 bits for these configurations, respectively. Note that the panels of Figure 9 use different ENOB axis ranges.

Fig. 9. ENOB distribution for SAR ADC with (a) linear combination using ideal weights, (b) without calibration, and (c) proposed calibration algorithm, modified from [24].
Fig. 10. Evolution of the error signal $|\tilde e|$ during the calibration process.
Fig. 11. Distribution of the error in the estimated SNDR using (35).
For one of the described tests, Figure 10 depicts the moving average of the error signal |ẽ| over 128 measurements. For demonstration purposes only, LUT entries that have not yet been updated are replaced by their nominal values when evaluating ẽ. In this simulation, the system converges within 20,000 samples.
The SNDR after calibration is evaluated both with (35) and with the Matlab sinad function. Figure 11 depicts the distribution of the difference between the two results, which shows an error below 0.3 dB.
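Equation (35) is not reproduced in this section. As an independent cross-check, the SNDR can also be estimated with a three-parameter least-squares sine fit at a known input frequency, the approach standardized in IEEE Std 1057 and IEEE Std 1241. The fitted sinusoid is treated as the signal, and everything left over counts as noise plus distortion. The sketch below assumes the input frequency is known exactly and uses only the Python standard library. The function names are hypothetical.

```python
import math

def _solve3(a, b):
    """Solve the 3x3 linear system a x = b (Gaussian elimination, partial pivoting)."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, 3):
            f = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= f * m[col][c]
    x = [0.0] * 3
    for r in reversed(range(3)):
        x[r] = (m[r][3] - sum(m[r][c] * x[c] for c in range(r + 1, 3))) / m[r][r]
    return x

def sine_fit_sndr(y, freq):
    """SNDR in dB of samples y, from a least-squares fit of
    a*cos(2*pi*freq*n) + b*sin(2*pi*freq*n) + c, with freq in cycles/sample."""
    basis = [(math.cos(2 * math.pi * freq * n), math.sin(2 * math.pi * freq * n), 1.0)
             for n in range(len(y))]
    # Normal equations A^T A p = A^T y for the parameters p = (a, b, c).
    ata = [[sum(p[i] * p[j] for p in basis) for j in range(3)] for i in range(3)]
    aty = [sum(p[i] * v for p, v in zip(basis, y)) for i in range(3)]
    a, b, c = _solve3(ata, aty)
    resid = [v - (a * p[0] + b * p[1] + c) for p, v in zip(basis, y)]
    p_signal = (a * a + b * b) / 2                     # power of the fitted sinusoid
    p_noise = sum(r * r for r in resid) / len(resid)   # noise plus distortion (DC removed)
    return 10 * math.log10(p_signal / p_noise)
```

For an ideal 10-bit quantizer driven by a near-full-scale sinusoid, the estimate should land close to the theoretical $6.02N + 1.76 \approx 62$ dB.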
In SAR ADCs, a settling time $\tau_s$ is allocated after each capacitor switch. This allows the voltage presented to the comparator to settle with an RC time constant $\tau_{rc}$. If $\tau_s$ is insufficient, incomplete settling causes conversion errors [3], and increasing the sampling frequency (i.e., reducing $\tau_s/\tau_{rc}$) exacerbates this issue. Since the foreground calibration is performed at the same sampling frequency as in normal operation, the LUT values $m_\theta(d)$ are tailored to the actual value of $\tau_s/\tau_{rc}$. This yields better performance than the linear combination based approach. However, the proposed algorithm cannot avoid the missing codes that occur for sufficiently small values of $\tau_s/\tau_{rc}$. Figure 12 compares the measured ENOB obtained using the proposed algorithm with that obtained using the linear combination with ideal weights. The ENOB degradation caused by reducing $\tau_s/\tau_{rc}$ is smaller when the proposed algorithm is used, which allows the sampling frequency to be increased with only a minor loss in performance. Figure 13 shows the measured ENOB as the S/H nonlinearity coefficients $c_2$ and $c_3$ in (1) are varied. After calibration, the performance is far less sensitive to the nonlinearity of the S/H.
[Fig. 13. Measured ENOB with varying S/H nonlinearity coefficients, modified from [24].]
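The link between settling time and conversion errors can be quantified with a first-order estimate. This estimate assumes single-pole settling and a worst-case step equal to the full-scale range, which is conservative and is not stated in the text above. Under these assumptions, the residual error after $\tau_s$ must stay below half an LSB of an $N$-bit converter:

```latex
e^{-\tau_s/\tau_{rc}} \le 2^{-(N+1)}
\quad\Longrightarrow\quad
\frac{\tau_s}{\tau_{rc}} \ge (N+1)\ln 2 \approx 7.62 \quad \text{for } N = 10 .
```

Below this ratio, the residual settling error can exceed half an LSB. Because the LUT is trained at the operating sampling frequency, the systematic part of this error is absorbed into the LUT values, while the missing codes it produces remain uncorrectable.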
B. Using Time-Interleaved ADC
Here, the same ADC configuration is used to form a TIADC system with $M = 8$ sub-ADCs and an aggregate sampling rate $F_s$. The system suffers from time skew mismatch having a Gaussian distribution with standard deviation $\sigma_\tau = 0.01$.
In the first test, we target time skew calibration only. Here, $K = 2^{10}$ is used, and $MK = 2^{13}$ samples are processed to evaluate all $\psi_m$ needed for the estimation. Figure 14 depicts the average measured SNDR and SFDR before and after calibration at different input test frequencies. Each point in the figure is the average of the performance evaluated over 25 tests. Without calibration, the performance degrades steadily as the input frequency increases. In contrast, the SNDR after calibration is maintained around 56 dB, which corresponds to an ENOB of 9 bits, and the measured SNDR around the Nyquist frequency is 55.7 dB. The observed SFDR degradation is due to the remaining uncompensated time skew, which can be reduced by increasing $K$ and improving the delay line resolution.
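Two standard relations help interpret these numbers. For a full-scale sinusoid, ENOB = (SNDR − 1.76)/6.02, so 56 dB corresponds to about 9.01 bits and 55.7 dB to about 8.96 bits. In addition, a residual time skew with RMS value σ (global delay removed) behaves like a sampling-time error. It limits the SNR at input frequency $f_{in}$ to roughly $-20\log_{10}(2\pi f_{in}\sigma)$, which is why the uncalibrated performance degrades as the input frequency increases. The sketch below makes several assumptions. Frequencies are normalized to $F_s$ and skews to $T_s = 1/F_s$. Error sources are uncorrelated, so their powers add. The function names are hypothetical.

```python
import math

def enob_from_sndr(sndr_db):
    """Effective number of bits for a full-scale sinusoid."""
    return (sndr_db - 1.76) / 6.02

def skew_limited_snr(f_in, sigma_skew):
    """SNR in dB limited by an RMS time skew sigma_skew (global delay removed).

    f_in and sigma_skew must use reciprocal units, e.g. f_in / F_s and skew / T_s.
    """
    return -20 * math.log10(2 * math.pi * f_in * sigma_skew)

def combine_snr(*snr_db):
    """Combine independent error sources by summing their noise powers."""
    return -10 * math.log10(sum(10 ** (-s / 10) for s in snr_db))
```

For example, `skew_limited_snr(0.5, 0.01)` gives about 30.1 dB for a hypothetical RMS skew of $0.01\,T_s$ near the Nyquist frequency. `combine_snr` shows how such a term drags down an otherwise noise-limited SNDR.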
The root mean square (RMS) value of the uncompensated time skew (or time skew residue) can be calculated as
where the term in the square brackets compensates the global delay. Figure 15 shows the evaluated distribution of $\varepsilon$ before and after calibration. On average, $\varepsilon$ is reduced from $9.3\times10^{-3}$ to $1.7\times10^{-4}$, indicating successful time skew estimation. Figure 16 compares the distribution of the maximum configured $\tau_m$ in each test of Figure 14 with the approximate distribution given by (28). In those tests, $\max_m |\tau_m| < D = 0.0236$ in 791 out of 800 tests (98.9%), which conforms with the target yield $\eta = 98\%$.
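The effect of the timing-reference choice on the required delay-line range can be checked with a short Monte Carlo experiment. The sketch below does not implement the analytical approximation in (28), and it uses the drawn skews directly rather than estimated ones. It draws $M = 8$ Gaussian skews with $\sigma_\tau = 0.01$ and references them in one of two ways. The first is their mid-range, the choice proposed in this paper. The second is sub-ADC 0, a hypothetical baseline that is not necessarily the conventional method the paper compares against. For each reference, the code reports the range $D$ that covers a 98% yield. The function names are hypothetical.

```python
import random

def required_range(skews, reference="midrange"):
    """Largest absolute correction one device needs for a given timing reference."""
    if reference == "midrange":
        ref = (max(skews) + min(skews)) / 2   # proposed: mid-range of the skews maps to zero
    elif reference == "channel0":
        ref = skews[0]                         # hypothetical baseline: sub-ADC 0 is the reference
    else:
        raise ValueError(f"unknown reference: {reference}")
    return max(abs(t - ref) for t in skews)

def correction_range_for_yield(m=8, sigma=0.01, yield_target=0.98,
                               trials=100_000, reference="midrange", seed=0):
    """Monte Carlo estimate of the delay-line range D covering yield_target of devices."""
    rng = random.Random(seed)
    needed = sorted(
        required_range([rng.gauss(0.0, sigma) for _ in range(m)], reference)
        for _ in range(trials)
    )
    return needed[round(yield_target * trials) - 1]   # empirical yield quantile
```

With the mid-range reference, the estimate should come out close to the $D = 0.0236$ quoted above. It can never exceed the sub-ADC 0 figure, because for every device the mid-range is the reference that minimizes the largest absolute correction.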
The previous test was then repeated with the TIADC system suffering from capacitor mismatch in addition to time skew mismatch. Offset and gain errors are also assigned to each sub-ADC, with standard deviations of 0.5% of the full input signal swing and 1% of the nominal gain, respectively. The full proposed calibration algorithm is carried out for both the TIADC mismatches and the sub-ADC nonlinearities, with $K = 2^{12}$. The total number of samples used for the full calibration is $M(K + F) = 2^{19}$, of which only the last $MK$ samples are used to estimate the gain and time skew. Figure 17 shows the evaluated ENOB distribution before and after calibration. On average, the ENOB improves from 5.47 to 8.98 bits.
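The offset and gain mismatches added in this test can be illustrated with a simple estimator. The paper's own gain estimation is not reproduced here. Instead, the sketch below uses a plainly different and simpler stand-in, assuming a sinusoidal input whose frequency does not alias to DC in any sub-ADC. It takes each sub-ADC's mean as its offset, and the ratio of its AC RMS to that of sub-ADC 0 as its relative gain. The function names are hypothetical.

```python
import math

def estimate_offset_gain(samples, m):
    """Per-channel offset and relative gain from an interleaved record.

    Sample n is assumed to come from sub-ADC n % m. Offsets are channel means;
    gains are AC RMS values normalized to sub-ADC 0.
    """
    offsets, rms = [], []
    for ch in range(m):
        x = samples[ch::m]
        mu = sum(x) / len(x)
        offsets.append(mu)
        rms.append(math.sqrt(sum((v - mu) ** 2 for v in x) / len(x)))
    return offsets, [r / rms[0] for r in rms]

def apply_offset_gain(samples, m, offsets, gains):
    """Remove each channel's offset, then divide by its gain relative to sub-ADC 0."""
    return [(v - offsets[n % m]) / gains[n % m] for n, v in enumerate(samples)]
```

Gains are only determined relative to a reference channel here. Any common gain error is indistinguishable from a change in input amplitude and does not create interleaving spurs.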
In the following tests, we study the effect of bandwidth mismatch on the proposed algorithm. The sample-and-hold circuit is modeled as in [28] with nominal bandwidth $B = F_s$, and the bandwidth suffers from mismatch with a standard deviation equal to 0.5% of its nominal value. The results of the following four test setups are presented in Figure 18, where each point in the figure is obtained by measuring the average ENOB over [...]
However, a minor performance degradation is noticeable at low frequencies, because the frequency-dependent gain and nonlinear phase mismatches introduced by the bandwidth mismatch bias the estimated values.
4) With bandwidth and other mismatches, after calibration:
In this test, the TIADC system suffers from offset, gain, time skew, bandwidth, and capacitor mismatches. After calibration, the performance remains similar to that obtained in the previous test setup.
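The frequency dependence described above can be illustrated with a first-order model. The S/H model of [28] is not reproduced here. Instead, the sketch assumes each channel behaves as a single-pole low-pass filter with a −3 dB bandwidth $B$, with frequencies normalized to $F_s$ so that $B = 1$. A channel with bandwidth $B(1+\delta)$ then shows gain and phase errors that both change with the input frequency, unlike a pure gain or time skew mismatch. The function names are hypothetical.

```python
import math

def sh_response(f_in, bandwidth):
    """Gain and phase (rad) of a first-order S/H with -3 dB bandwidth `bandwidth`."""
    x = f_in / bandwidth
    return 1 / math.sqrt(1 + x * x), -math.atan(x)

def bandwidth_mismatch_errors(f_in, bandwidth, rel_mismatch):
    """Relative gain error and phase error (rad) of a channel with bandwidth
    bandwidth * (1 + rel_mismatch), compared with the nominal channel."""
    g0, p0 = sh_response(f_in, bandwidth)
    g1, p1 = sh_response(f_in, bandwidth * (1 + rel_mismatch))
    return g1 / g0 - 1, p1 - p0
```

As the input frequency increases, the gain error grows. The phase error divided by $2\pi f_{in}$, which is its equivalent time skew, is not constant across frequency. A single estimated gain and time skew per channel can therefore only partially compensate bandwidth mismatch.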
C. Hardware Implementation and Comparison of Estimation Techniques
A Verilog model of the calibration engine was implemented using the specifications reported at the beginning of this section. To save area, the LUT is implemented as a single-port memory and is updated using only the even-indexed samples. The hardware design was verified to be bit-accurate against a fixed-point Matlab model using the Cadence Incisive simulator. The design was synthesized with the Cadence Genus synthesis tool, targeting a 250 MHz clock in a TSMC 28 nm HPM process. The design occupies an area of 13,388 μm², which is dominated by the single-port memory, as shown in the area utilization breakdown in Table II. Note that the required memory size increases exponentially with the ADC resolution, which makes the proposed calibration technique unsuitable for high resolution ADCs.
A gate-level simulation was run successfully, and all internal signal waveforms were dumped into a Value Change Dump (VCD) file. This file is used to collect the switching activity of each net in the design, which allows an accurate power estimate. During foreground calibration, the design consumes 2.8 mW with a 250 MHz clock, and Table II reports the power breakdown. In normal operation mode, only the LUT is active, consuming 0.59 mW. Table III provides a summary comparison of the proposed algorithm with other state-of-the-art algorithms [2], [7]–[9], [11], [15].
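The exponential growth of the memory can be made concrete with a quick count. This count assumes one LUT word per raw decision vector, that is, $2^N$ entries for an $N$-bit radix-2 SAR ADC. It also assumes a word width of $N + b$ bits with $b = 5$, as in the configuration above. The hypothetical helper below gives the storage per sub-ADC:

```python
def lut_bits(n_bits, extra_bits=5, channels=1):
    """LUT storage in bits: 2**n_bits words of (n_bits + extra_bits) bits per channel."""
    return channels * (2 ** n_bits) * (n_bits + extra_bits)

for n in (8, 10, 12, 14, 16):
    print(f"N = {n:2d}: {lut_bits(n):>9,} bits per sub-ADC")
```

For $N = 10$, this gives 15,360 bits. For $N = 16$, about 1.38 Mbit would be needed per sub-ADC, roughly 90 times as much, which illustrates why the technique targets low to medium resolution.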
VIII. CONCLUSION
In this paper, a generic foreground calibration technique has been presented for high speed ADCs with low to medium resolution. It calibrates various nonlinearity sources in both single channel ADCs and TIADCs. Using a sinusoidal input, the calibrated output corresponding to each raw output is evaluated and stored in a LUT. This technique obviates calibration post-processing, replacing it with a memory read access. Various design simplifications were introduced to facilitate a real-time hardware implementation. In addition to calibrating the nonlinearity of each sub-ADC in a TIADC system, the algorithm can be used to estimate the mismatches in this system independently, and it also offers a partial calibration for bandwidth mismatch. We proposed choosing the timing reference for the estimated time skews such that the mid-range of the estimated time skews is zero. This choice reduces the requirements on the correction side by 44% compared to conventional methods. Beyond calibration, the proposed mechanism can serve as a built-in self-test to detect comparator metastability and to evaluate the SNDR without transferring a large amount of data outside the ADC. A SAR ADC model was used to verify the algorithm's performance. Compared to the linear combining approach using ideal weights, the algorithm showed superior capacitor mismatch calibration, increased tolerance to reduced settling time, and significant improvements in the presence of high order nonlinear terms. The reported area utilization and power consumption of the calibration engine demonstrate the feasibility of implementing the proposed algorithm.
