This paper presents an all-digital background calibration technique for the time skew mismatch in time-interleaved ADCs (TIADCs). The technique jointly estimates all of the time skew values by processing the outputs of a bank of correlators. A low-complexity sampling sequence intervention technique, suitable for successive approximation register (SAR) ADC architectures, is proposed to overcome the limitations associated with blind estimation. A two-stage digital correction mechanism based on the Taylor series is proposed to satisfy the target high-precision correction. A quantitative study is performed regarding the requirements imposed on the digital correction circuit in order to satisfy the target performance and yield, and a corresponding filter design method is proposed, which is tailored to meet these requirements. Mitchell's logarithmic multiplier is adopted for the implementation of the principal multipliers in both the estimation and correction mechanisms, leading to a 25% area and power reduction in the estimation circuit. The proposed calibration is synthesized using a TSMC 28-nm HPL process targeting a 2.4-GHz sampling frequency for an eight-sub-ADC system. The calibration block occupies 0.03 mm 2 and consumes 11 mW. The algorithm maintains the SNDR above 65 dB for a sinusoidal input within the target bandwidth.
sub-ADCs. This creates the need to compensate the effect of these mismatches through calibration.
The main sources of the mismatches include offset, gain, time skew and bandwidth, which occur due to process, voltage and temperature variations [1] . In this work we focus on background digital calibration for time skew mismatch under the assumption that the input samples to this calibration process are free from other sources of mismatch.
Estimating the time skew via sensing the cross-correlation between the sub-ADCs output is introduced in [2] targeting a TIADC system with two sub-ADCs. Since then, many algorithms have been proposed to extend this idea to a larger number of sub-ADCs, e.g., [1] , [3] [4] [5] [6] [7] [8] . In [3] , [4] , [6] , the number of sub-ADCs is required to be a power of 2, and the calibration process is done over a number of stages, where the selected reference ADC(s) are changed during those stages. In all stages except the final one, the cross-correlation is measured between non-consecutive sub-ADCs, and the proportionality constant linking to the time skew can be either positive or negative depending on the input signal characteristics, i.e., the adaptation direction cannot be identified directly, which adds a burden on the estimation algorithm to determine it.
The time skew mismatch can be corrected in digital domain where finite impulse response (FIR) filters are usually used, as they can have a linear phase response. The filter coefficients can be time skew dependent, for example knowing the time skew values, the filter coefficients are obtained using Neville's algorithm in [9] , or using the coefficients of an all pass filter with a fractional delay in [2] , also, an adaptive filter system is used in [10] to obtain those coefficients. Since both the coefficients and the input are changing during run time for these approaches, the implementation of the filter multipliers is inefficient, which affects both area utilization and power consumption; also, obtaining these coefficients adds extra overhead to the circuit.
Differentiator FIR filters with constant coefficients are suggested in [5] , [8] , [11] , [12] to guarantee the efficiency of the filter implementation, where these approaches are based on the Taylor series. However, using the Taylor series requires the filter input to be uniformly sampled, which is not satisfied in the existence of time skew; this degrades the system performance especially at large input frequencies, as will be described later in Subsection V-A.
The time skew calibration can be either [8] feed-forward (open-loop) with digital correction, e.g., [5] , [7] , [8] , or feedbackward (closed-loop), e.g., [1] , [2] , [4] , [9] , [10] , [13] [14] [15] [16] . In the former, the estimation mechanism uses uncalibrated samples, which allows to avoid the possibility of system instability in the feed-backward approaches; however, it requires high-precision calculations including divisions and multiplications. Also, the estimated values are inaccurate in the existence of large time skew values due to ignoring the second and higher order effects of the time skew mismatch. The feed-forward algorithms can offer fast estimation for simple sinusoidal input signals; however, their convergence speed can be comparable with other feed-backward calibration algorithms when a practical band-limited random input is used, where the randomness of the input decreases the precision of the estimates as we illustrate in Section VIII.
In feed-backward calibration, the time skew estimation mechanism is fed by the calibrated samples, and the estimates are updated via measuring the uncompensated time skews residues. Unlike feed-forward calibration, feed-backward calibration is not sensitive to the second and higher order effects of the time skew, since the residues diminish on convergence; this allows introducing further approximation to the estimation mechanism without affecting the overall performance. An example for an approximation in feed-backward calibration is to replace each multiplier in the cross-correlation based estimation by a mean-absolute difference [4] .
In this paper, we propose an high precision blind calibration algorithm for time skew mismatch in TIADCs consisting of all digital estimation and correction blocks arranged in a closed-loop architecture. The estimation block performs joint estimation of all of the time skew values using the measured cross-correlations between the outputs of adjacent sub-ADCs. In contrast to other methods, our scheme explicitly estimates the slope of the autocorrelation function, facilitating fast convergence and making the feed-backward mechanism less susceptible to stability issues. Additionally, we outline some of the problems associated with blind estimation in general and propose a novel sampling sequence intervention method, suitable for SAR ADCs, that mitigates many of these problems. The proposed correction block is a two stage Taylor series approach composed of a low-complexity coarse correction followed by a fine correction stage. We develop a statistical approach to analyze the various error terms in the correction mechanism, and these are then used to constrain the parameters of the design in order to achieve the desired bandwidth and manufacturing yield whilst ensuring that any resulting distortions are below the quantization noise level. Incorporated into this methodology is a fixed point filter design approach that ensures the constraints are met. Simplified hardware implementations for the principal multipliers in both the estimation and correction sides are also presented.
This paper is organized as follows. Section II provides an overview of the proposed calibration algorithm. Section III describes in detail the blind time skew estimation mechanism. A sampling intervention mechanism is proposed in Section IV to relax the limitations associated with blind estimation. Section V highlights the proposed digital correction technique, and Section VI quantifies the constraints on the filters used in the digital correction to satisfy the target yield and performance. In Section VII, simplified hardware implementations for the main multipliers used in both the correction and estimation parts are proposed. Simulation results are presented in Section VIII, accompanied by area utilization and power consumption results for the hardware implementation. Finally, conclusions are drawn in Section IX. Figure 1 shows a block diagram for the proposed time skew calibration algorithm for a TIADC system that has an aggregated sampling rate f s , and consists of M slower sub-ADCs each with sampling rate f s /M . Let x(t) be the analogue input to the TIADC system, which is sampled every T s 1/f s [sec], and y (m) n denote the n th output of the m th sub-ADC (m = 0, 1, ..., M − 1). Assuming that the system suffers only from time skew, we have
II. TIME SKEW CALIBRATION
where τ m is the time skew associated with the m th sub-ADC normalized to T s , and y[.] is the aggregated discrete-time ADC output. Each time skew τ m can be modeled using a Gaussian distribution with mean zero and standard deviation σ τ based on the manufacturing process. The ADC output y (m) n is processed by the 'Time skew digital correction' block depicted in Figure 1 to compensate the estimated time skew mismatch, which is denoted byτ m for the m th sub-ADC. Assuming an ideal application of the estimatesτ m , the correction block output can be written as
where
is the time skew residue for the m th sub-ADC, and r is a timing reference. The timing reference r is imposed by the calibration mechanism. Its value can be selected arbitrarily; however, as described in Subsection III-C, a constraint is placed on its selection in order to relax the requirements on the correction mechanism. The corrected samplesx (m) n are processed by the 'Time skew residue estimation' block to produceΔ m , an estimate of Δ m . UsingΔ m , we update the estimated time skew viã
where allτ m are initialized to zero at the start of the calibration, and μ is the least-mean-squares (LMS) adaptation step size which is chosen to be less than 1 to avoid system instability.
The estimated time skew valuesτ m are fed back to the 'Time skew digital correction' block forming a closed-loop. Upon convergence, all Δ m become small, and theτ m become close to the valuesτ m , which, using (3), are given bẏ
While the exposition throughout this paper is general, we consider a particular TIADC system for illustration purposes in our discussions. This TIADC system consists of M = 8 SAR sub-ADCs with N = 12-bit resolution. The target ADC bandwidth is β = 88% of the Nyquist frequency. The system suffers from time skew mismatch having a Gaussian distribution with standard deviation σ τ = 0.01. The target yield is η = 98%, i.e., all performance constraints are required to be satisfied for at least η of the fabricated ADCs.
III. TIME SKEW ESTIMATION
For ease of notation, we definex
, where E(X) is the expectation of X. We compute M estimates of R xx (τ ) in the vicinity of τ = T s , each denoted by c m , as follows
where L is selected to be a large integer, and we have used (2) to formulate (7) . Comparing (7) to the definition of R xx (τ ), we can approximate c m to
This approximation is due to the discrete and finite nature of the summation. It is assumed for this work that the input signal, x(t), is such that the approximation holds. This is the case for overwhelmingly many input signals; however, as outlined in Section IV, there are some exceptions. Using the Taylor series expansion for the right-hand side of (8) around T s , a first-order approximation for c m is given by
where dRxx(Ts) dτ is the derivative of the autocorrelation function evaluated at τ = T s . Using (9) , the differences between adjacent c m are given by
Here, we can notice that e m does not depend only on Δ m , but also on Δ m−1 and Δ m+1 ; therefore, we should not attempt to use e m to adapt Δ m directly. A similar observation was made independently in [8] , [16] and [1] .
Writing (11) in matrix form, we obtain
where e {e m } and Δ {Δ m } are M ×1 vectors, and U is an M ×M circulant matrix given by
Given the observation vector e, we seek Δ satisfying (12), we elaborate on the process used to achieve this target in the following subsections.
A. Estimation of T s dRxx(Ts) dτ
Many calibration algorithms (e.g., those proposed in [4] , [15] ) ignore the estimation of the term dRxx(Ts) dτ , assuming only that this term is always negative, which is correct for a band-limited input; hence, this term does not affect the adaptation direction of the estimation process. However, it makes the convergence speed dependent on the input signal. Also, its estimation is required to guarantee the stability of the calibration mechanism [8] given that μ in (4) is less than 1. This term can be written as [1] 
where x'(.) is the first order time derivative of the input. For the applications that do not use digital correction, an approximated value for x'(t − T s ) can be simply calculated using a subtractor as in [1] . However, the signal derivative is calculated with high precision for time skew correction as illustrated in Section V, which enables us to estimate T s
where y' [.] is the output of the first order differentiator filter in the correction mechanism (depicted in Figure 5 ). Here, X 2 denotes the smallest power of two greater than or equal to X; this approximation replaces the division needed to evaluate Δ by a simple bit-shift operation. Note that this simplification is feasible since a feed-backward calibration is used. 
B. Overall Solution
Using (12), we obtain a solution forΔ, an estimate of Δ, according toΔ
where U † is the precomputed pseudo inverse of U that can be obtained using the computations in Appendix A. This estimate is then used to adapt eachτ m using (4). Figure 2 shows a block diagram of the proposed time skew estimation. It consists of M + 1 correlators; the first M of these are used to compute e using (6) and (10) , and the M th correlator is used in estimating T s dRxx(Ts) dτ according to (15) . All e are processed jointly to estimate the time skew residuesΔ via (16) .
C. Timing Reference
In Appendix A, we showed that the computedΔ via (16) has a zero mean, i.e.,
This fact, when considered in the context of the update equation (4), allows us to further conclude that the sum of theτ m remains constant throughout the adaptation process.
In particular, since allτ m are initialized to zero, it always holds during the adaptation process that
Summing (3) for all m ∈ {0, . . . , M − 1} while replacing Δ m with its estimateΔ m and using (17) and (18) , we find that the pseudo inverse procedure indirectly implies the constraint
Thus by (5), we havė
which has mean zero, and its variance can be expressed as
Many estimation algorithms, e.g., [2] [3] [4] , [11] , [14] , [15] , [17] [18] [19] [20] , choose the first sub-ADC as a timing reference, i.e., r = τ 0 , which, recalling (5) , results in the following variance
Compared to this work, these approaches require a larger dynamic range for the time skew correction circuits, and impact adversely the digital correction accuracy due to the reduction in the reliability of the used approximations for large time skew values.
IV. SAMPLING SEQUENCE INTERVENTION
In blind estimation techniques, it is generally assumed that the input signal is wide sense stationary, and the statistical characteristics for each sub-ADC output are the same in the absence of mismatches. This assumption is used in many algorithms available in the literature, e.g., [2] [3] [4] [5] , [8] , [14] [15] [16] , which is valid for a wide range of input signals; however, it is inaccurate for a number of 'pathological' input types, here, we list two [1]: 1) An input signal containing multiple frequency components, some of which appear in the frequency locations of the spurs of any of the input components. In this case, the blind estimator cannot differentiate between the input signal and the spurs generated by the time skew, and the statistical characteristics for each sub-ADC output are different even in the absence of any mismatches. Note that this situation can occur for any arbitrary frequency. 2) A special case for the previously discussed pathological input type can occur when the input signal contains one or more components at any of the frequencies k fs 2M for k ∈ {0, ..., M −1}, where the spurs of those components coincide with themselves. In [2] and [14] , a notch filter before the estimation mechanism is used to remove these components and avoid this problem. However, this solution is not suitable for the generalized pathological input type previously discussed. Blind estimation techniques can be misled when the input signal contains components that resemble these pathological signal types, which causes performance degradation. To demonstrate this limitation, we used an M = 2 TIADC system with an input signal consisting of two amplitude modulated (AM) channels whose carrier frequencies are located at 373 512 π and 307 512 π. Figure 3a depicts the output power spectral density (PSD) after using the proposed calibration method, where we can notice that the unwanted spurs are brought down to the noise floor. However, after adding another AM channel with carrier frequency 139 512 π (which is at the spur location of 373 512 π), the time skew calibration causes significant performance degradation as can be seen in Figure 3b . Note that, due to the incorrect estimation, there is an in-band interference in the AM channels in addition to the visible high spurs in Figure 3b . This problem decreases the robustness of the blind estimation techniques, thus making them unsuitable for applications requiring general-purpose ADCs. Figure 4a shows the timing diagram for the conventional sampling sequence with M = 4. In the presence of problematic components, the statistical characteristics for each sub-ADC's output can be different, which misleads the blind estimation. This limitation can be relaxed by letting each sub-ADC experiences input samples that would be sampled by other sub-ADCs if the conventional sampling sequence was used. To achieve this with only a minor penalty in area and power, we introduce a minor intervention to the TIADC's sampling sequence, whereby the 'analogue de-multiplexer' and the 'sampling controller' blocks shown in Figure 1 both skip a sub-ADC every L aggregated samples [1] . This change is particularly suitable for TIADC systems with a ring divider clocking architecture, e.g., [13] , [21] , where altering the ring divider circuit enables changing the sampling sequence without affecting the time skew mismatch. Figure 4b shows the corresponding timing diagram using the proposed sampling sequence intervention with L = 11, where a complete conversion cycle is required to be finished within (M − 1)T s , thus tightening the constraints on the analogue circuit. It can be noticed that the samples with indices greater than L − 1 and up to M L − 1 are sampled by different sub-ADCs when the sampling sequences depicted in Figures 4a and 4b are compared. Note that the total number of samples needed to estimate all c m using (6) is M L, and it contains exactly M intervention events.
However, in this work we target a SAR sub-ADC architecture, where the time of a complete conversion cycle is divided into smaller periods to resolve each decision bit. That allows forcing an early termination for the conversion cycle without losing the previously acquired decision bits. Through exploiting this early termination, we can design the sub-ADCs to perform a complete conversion within M T s , and the conversion time is shortened by T s when a sampling sequence intervention occurs. This period corresponds to 1/M of the complete conversion time, i.e., approximately N/M bits are lost in each early terminated sample. Figure 4c shows the proposed timing diagram when early termination is used. Each skipping event leads to forcing an early termination for M − 1 samples, which are partially shaded in Figure 4c .
Note that for a large M , only a small fraction of the ADC output word is missed when early termination is forced. In practice, L is selected to be a large integer, e.g., L = 2 13 is used in the results presented in Section VIII. This makes the overall performance degradation due to the early termination events negligible.
V. DIGITAL CORRECTION
For the purpose of the development and analysis of the digital correction algorithm, we assume that the time skew estimation has correctly converged, i.e.,τ m =τ m = τ m − r. In this case, we can reconstruct a time skew corrected version ofx from the ADC output y using a Q-term Taylor series approximation [11] , [12] x
where * is the convolution operator,τ [n] τ (n mod M) , 1 and h d,q is the impulse response of a non-causal 2 (zero delay) q th order differentiator filter whose ideal frequency response is
where w ∈ [0, π] is the normalized angular frequency. The approximation in (23) is due to the following reasons: 1) in the presence of time skew mismatch, the input is not uniformly sampled; 2) the truncation of the Taylor series to finitely Q terms; 3) truncation of the q th order differentiator filter to L q coefficients, which leads to deviations from (24) . For example, the authors of [11] used (for odd L 1 )
In the following subsections, we propose a two-stage digital correction architecture to reduce the effect of the non-uniform sampling, and we investigate the impact of the choice of Q.
A. Two-Stage Correction
Figure 5a depicts 3 the conventional digital correction mechanism. In Appendix B, we show that some distortion is introduced to the output of this mechanism due to the non-uniformly sampled input. To reduce this distortion, we propose a two-stage digital correction process, where first a coarse Taylor series based correction is applied to reduce the time skews seen by a second, more accurate, fine correction stage as shown in Figure 5b .
The coarse correction stage consists of a simplified first-order differentiator filter, h d,c , implemented as an anti-symmetric FIR structure guaranteeing a purely imaginary frequency response, H d,c (w), i.e., it has an ideal phase response, and it only deviates from the ideal in its magnitude response.
In Appendix B, we showed that the inclusion of the coarse correction block leads to scaling the distortion by a factor of
jw − H d,c (w) /j is the error 3 The figure does not show the matching delays needed to align the inputs of the addition and multiplication blocks to match the filters' group delay.
in the magnitude response of the filter h d,c . The distortion is scaled down when |ν d, c (w)| < w which is generally satisfied for almost all w as discussed in Section VIII. In this work, we use a 9 tap (each tap represented with 4 bits) coarse differentiator filter.
B. Higher Order Correction Terms
In the previous subsection, we only considered first order correction schemes, essentially we let Q = 1 in (23). However, it can be the case that the higher order error terms can also have a significant impact on the system performance. Motivated by this, we now perform a statistical analysis of the higher order error terms with a view to assessing their impact in meeting the target ADC performance metrics.
For a selected Q in (23), the error in the approximation is dominated by the (Q + 1) th term in the Taylor series. For a full-scale sinusoidal input with frequency w, the error in the m th sub-ADC calibrated output is also sinusoidal and its amplitude can be expressed as
We need to choose Q such that the power of the induced error is below the quantization noise power level for at least η of the sub-ADCs, i.e., P(ε 2 Q /2 < 1/12) ≥ η where P(X) is the probability of an event X. Using the maximum supported input frequency, w = βπ, this may be expressed as
erf(.) is the error function, and we have used the assumption thatτ m is normally distributed with variance given in (21) . Using Q = 1, P(ε 2 1 /2 < 1/12) = 0.56, i.e., only 56% of the sub-ADCs will satisfy the suggested design constraint, which of course lies below the target yield η. Similarly, setting Q = 2, we find that the target performance can be satisfied in 99.996% of the parts.
Based on this calculations, the need for a second order correction scheme is clear, and forms the basis of our proposed calibration scheme in Figure 5c . This scheme is based on the first order correction of Figure 5b with the addition of a second order differentiator, h d,2 , whose output is used to cancel the second order error term.
To reduce the system latency, we suggest to design h d,2 as a single filter instead of using two cascaded first order differentiators as done in [12] . Note that the design requirements on h d,2 are less stringent than on h d,1 , because its output is scaled byτ 2 m /2 compared toτ m on the first order correction branch. Furthermore, h d,2 does not have any phase discontinuities simplifying its design compared to h d,1 . In the next section, we will discuss the design methodology of these filters.
VI. FINE DIFFERENTIATOR FILTER DESIGN
In this section, we provide design methods for the two fine differentiator filters h d,1 and h d,2 such that the required yield η and the target performance can be achieved.
We define ν d,q (w) |H d,q (jw) − H d,q (jw)| to be the error in the magnitude response of the q th order differentiator filter where H d,q is the frequency response of h d,q , andH d,q (jw) is the ideal frequency response defined in (24) . We note that the actual FIR implementation for these filters can have ideal phase response, and only suffers from magnitude error. That is because of using symmetric and anti-symmetric filter coefficients for even and odd q respectively.
We define ξ d,q (w) as the upper bound for ν d,q (w) which guarantees that the power of the total distortion induced in the correction mechanism is less than the quantization noise power for at least a fraction η of the ADC instances, where a sinusoidal signal with frequency |w| ≤ βπ is used as an input.
In the following subsections, we evaluate ξ d,q (w), which will then be used in Subsection VI-C and Appendix C to complete the design of h d,1 and h d,2 . Finally, in Subsection VI-D, we demonstrate the complexity of the correction mechanism for different system specifications.
A. Limiting ν d,1 (w)
Using a full-scale sinusoidal input having frequency |w| ≤ βπ, the distortion at Node A in Figure 5c due to the magnitude response mismatch of h d,1 is a sinusoid with amplitude 2 N −1 ν d,1 (w). Accordingly at Node B, the distortion induced in the corrected output of the m th sub-ADC using (23) has amplitude 2 N −1 ν d,1 (w)τ m . To satisfy the target performance and yield η, we need to limit the average power of this distortion over the M sub-ADCs to be less than the quantization noise power for at least η of the ADCs, i.e.,
where M−1 m=0τ 2 m /σ 2 τ is a random variable that has a chi-squared distribution with M degrees of freedom. Using the inverse of the cumulative distribution function of a chi-squared random variable, we can write (29) as
where Z -1 M/2 (.) is the inverse of the regularized lower incomplete gamma function with shape parameter M/2.
Rearranging (30) and using (21) , we obtain the following design criterion for the differentiator filter
i.e., to satisfy the target yield and performance constraints, we need ν d,1 (w) ≤ ξ d,1 (w). Note that the upper bound ξ d,1 (w) is independent of w. The example specification in Section II requires ν d,1 (w) ≤ 0.0141. We further note that had we used a reference timing r = τ 0 , we would have to replace the 1 − 1 M factor in (31) with 2, as per (22) , tightening the constraint to ν d,1 ≤ 0.0094.
B. Limiting ν d,2 (w)
A similar approach can be used to constrain ν d,2 (w) to limit the total distortion power induced by h d,1 and h d,2 magnitude errors to be lower than the quantization noise power. We let the distortion power budget allocated for the filter h d,2 be
. Then, we can express the required constraint as
which can be reformulated as
where κ is chosen such that P( M−1
C. Filter Design
Trimming the number of taps in h d,1 produces a large error (i.e., ripples) in its magnitude response, violating the constraint in (31). To reduce these ripples, [8] suggests to multiply the coefficients obtained from (25) by a Hanning window. Also, [7] proposes to use a Blackman window for the same purpose.
The Parks-McClellan differentiator design algorithm [22] was used to design the differentiator filters in [12] and [23] . This method minimizes the relative error in the frequency response of the filter, i.e, ν d,1 /|H d,1 |. However, the Parks-McClellan method is suboptimal for the design problem at hand, since in this application we focus on minimizing ν d,1 . Also, the filter response will suffer from further distortion after the obtained coefficients are quantized to a limited number of bits for the hardware realization.
This creates a need for a design method that is tailored to satisfy this application's needs while directly providing a quantized version of the coefficients and satisfying the system level design constraints. This design method is described in Appendix C.
D. Correction Complexity Versus System Specifications
In the previous subsections, we defined the constraints on the proposed correction mechanism for an N -bit TIADC consisting of M sub-ADCs, each suffering from time skew mismatch having a Gaussian distribution with standard deviation σ τ . Those constraints are derived such that for at least a fraction η of the ADC instances, the induced distortion in the corrected output is below the quantization noise for a sinusoidal input within a bandwidth β.
The system specifications, i.e., N , M , η, β and σ τ , affect the order of correction, Q, the number of filter taps, L q , and the coefficient bit-width, W q , for each filter. The correction order Q is chosen such that (27) is satisfied. The first and second derivative filters are designed to satisfy the constraints in (31) and (33) respectively. These filters are designed using the method described in Appendix C.
For our example specification in Section II, we obtain a filter h d,1 with L 1 = 25 and W 1 = 10 (including the sign bit). Figure 6a depicts ν d,1 (w) for a filter h d,1 with L 1 = 25 10-bit taps designed by various methods, we note that the filter obtained using the proposed design method satisfies the design requirement in (31), which is marked as a light gray horizontal line. This is in contrast with the responses obtained using window functions and the Parks-McClellan algorithm, which would not satisfy the design constraint.
Similarly, h d,2 was designed using the proposed design method with L 2 = 5 and W 2 = 4. Figure 6b depicts ν d,2 for this filter.
Note that the constraints in (31) and (33) are tightened with increasing η, complicating the filters' design. Similarly, increasing the bandwidth, β, over which the constraint in (31) is to be applied causes an increase in the filter's complexity due to the discontinuity in its response at w = π, making a 100% bandwidth operation not possible for any reasonable constraint. These phenomena can be observed through the different specification examples in Table I where it is clear that the complexity increases with both η and β.
On the other hand, we note that both constraints in (31) and (33) are relaxed as M increases, thus reducing the complexity (for a fixed T s ). This can also be seen in Table I .
VII. LOW-COMPLEXITY MULTIPLIERS
Each correlator depicted in Figure 2 contains a multiplier, which is known to be both power and area hungry. The output of each of these multipliers is averaged over a large number of samples and is used in a feedback calibration loop, which allows simplifying the multiplier's hardware implementation at the expense of its accuracy [1] .
The multiplication operation of the two unsigned inputs a and b can be converted into
(34) 
where ζ l (.) is a correction function selected in order to give the required logarithmic approximation accuracy, . is the floor operator, and log 2 a can be easily evaluated by looking for the leading '1' in the binary representation of a. Similarly, the exponential operation can be approximated to
where ζ e (.) is a correction function. In Mitchell's logarithmic multiplier [24] , linear interpolation is used for both the exponential and logarithmic approximations, i.e., ζ l (x)= ζ e (x)= x. This is the approximation that our design uses for the multipliers in the correlators indexed 0 through M − 1 in Figure 2 . For the M th correlator we let ζ l (x)=ζ e (x)=0, because only an approximate value for T s dRxx(Ts) dτ is required. Accurate approximations are needed for the three multipliers depicted in Figure 5c , in these cases we use ζ l (x) = x+c l ( 32 x ) and ζ e (x)= x+c e ( 32 x ), where c l ( 32 x ) and c e ( 32 x ) are two hard-coded look-up tables each containing 32 entries. Table II reports the area and power result from synthesizing an unsigned 15bit×15bit multiplier using various approximation functions targeting TSMC 28nm HPL process and 300MHz clock. We can see that the area and power are reduced by 42% and 22% respectively when the linear correction functions is used compared to conventional implementation.
VIII. RESULTS
In this section, we report the simulation results obtained using a fixed-point Matlab model for the proposed calibration algorithm targeting the system specifications detailed in Section II, unless stated otherwise. For the estimation procedure, we use L = 2 13 , i.e., M L = 2 16 samples are processed for each calibration cycle. In the following subsections, three scenarios are considered in turn 1) an ADC system with infinite precision (N → ∞) and perfect time skew estimation. This creates a baseline set of results where the effects of both ADC quantization and time skew estimation errors are ignored. 2) an ADC system with N → ∞ where both the estimation and correction mechanisms are connected to form a closed-loop. 3) a 12-bit ADC with closed-loop calibration.
A. Scenario 1
In this scenario, the ADC has an infinite precision, and we assume that the time skew values are known (known as a "Genie-based" approach). Monte Carlo simulations were run for 10,000 ADC instances with randomly generated time skews and a full-scale sinusoidal input with frequency 7193 2 13 π. Figures 7a and 7b show the resulting histogram for the measured SNDR with a timing reference r =τ and r = τ 0 respectively. For a 12-bit ADC, we target 74dB SNDR performance. 4 We see that when r =τ , the target performance is achieved in 98.7% of the ADC instances; this is in line with the target specification of η = 98%. For r = τ 0 , the target performance is satisfied in only 81.4% of the instances.
To assess any frequency dependency, we repeated the same SNDR measurements for sinusoidal inputs at different frequencies. Additionally, to quantify the significance of each part of the proposed correction mechanism depicted in Figure 5c we simulate the system for the three correction mechanisms in Figure 5 and using the proposed calibration method without the coarse correction stage; the results are presented together in Figure 8a . The frequency range is divided into four regions:
The SNDR satisfies the required level of 74dB within the frequency band of interest (regions R 1 − R 3 ) when the full proposed correction mechanism depicted in Figure 5c is used. There is a correlation between the achieved SNDR and the error of the magnitude response for h d,1 shown in Figure 6a . The local SNDR maximas occur when the error hits the zero level; the corresponding frequencies are marked by light gray triangles in both figures. This is an indication that the error in h d,1 dominates the performance, and both the second order and the non-uniform sampled input distortions are successfully suppressed by h d,2 and h d,c respectively. A minor performance degradation compared to other correction configurations can be observed in the low frequency region marked with a circle in Figure 8a . This degradation occurs because |ν d, c (w)| > w in this frequency range, a possibility that was noted in Subsection V-A.
Looking at the other traces in Figure 8a , we see that in the absence of h d,c and/or h d,2 , a significant performance degradation occurs, and this becomes more pronounced with increasing the frequency. In the range R 1 , the error in h d,1 dominates the performance in all traces. In frequency range R 2 , the correction mechanisms without coarse stage suffer more SNDR degradation, suggesting that the impact of the non-uniform sampled input distortion is larger than all other distortion sources. For the range R 3 , we can notice that the second order distortion correction stage dominates the performance for the correction mechanisms that do not have the filter h d,2 . We may conclude that the proposed architecture depicted in Figure 5c is suitable for ADC designs intended to utilize most of the available Nyquist range.
B. Scenario 2
We compared the results obtained in the previous test depicted in Figure 8a with the results shown in Figure 8b , which are obtained when the estimation-and-correction loop is closed. With the proposed correction mechanism, the SNDR experiences a minor degradation in some of the cases; however, it still meets the target performance. In some other cases, a better SNDR is obtained due to adjusting the estimated time skew to compensate the error in the magnitude response of h d,1 corresponding to the input frequency; however, we consider this to be a false (i.e., non-representative) improvement, because it occurs only for simple input signals with narrow bandwidth. Because of this phenomenon, we present results for a more complex input signal in the next subsection.
We note that there is a large drop in performance when the second order correction stage is removed in this scenario compared to the results obtained with perfect estimation in Figure 8a . From this we conclude that the correction of the second order terms has a significant impact on the accuracy of the time skew estimates.
C. Scenario 3
In this subsection, we verify the proposed estimation and correction closed-loop with a 12-bit ADC where the early terminated samples have 10-bit resolution. To model other types of ADC impairments, Gaussian distributed noise is added to the ADC's input; the additive noise level is selected to limit the ENOB to 11 bits, i.e., the maximum possible SNDR for a sinusoidal input is 68dB. All outputs after calibration are rounded to 12 bits, causing a minor SNDR degradation.
To verify the accuracy of the proposed estimation algorithm, Figure 9 depicts the estimated time skew residueΔ 1 obtained from (16) using a sinusoidal input against the actual value Δ 1 predicted in (3). The estimated values are plotted with and without the proposed HW simplifications in Section VII and the . 2 approximation in (15) . Without these simplifications, Δ 1 is estimated correctly for small values of Δ 1 ; however, it suffers from visible non-linearity at large Δ 1 , this effect is neutralized on convergence as Δ 1 → 0. Using the proposed simplifications,Δ 1 is under-estimated mainly due to the use of . 2 ; however, all estimates are scaled down by the same Fig. 10 .
Measured SNDR using a 12-bit ADC model and a sinusoidal input (a) without calibration, (b) with calibration and normal sampling sequence, (c) with calibration and sampling sequence intervention.
factor, an effect that can be absorbed in the adaptation step size μ in (4). Figure 10 shows the measured average SNDR using a simple sinusoidal input at different frequencies. Figure 10a shows the results before calibration where over 30dB SNDR degradation is noticed at high input frequency. Figure 10b shows the results when the sampling sequence intervention is absent; a dramatic degradation can be noticed at certain frequencies due to the limitations of blind estimation algorithms discussed in Section IV. On the other hand, these limitations are relaxed with the proposed sampling sequence intervention whose results are shown in Figure 10c . The average measured SNDR is approximately 65.5dB over the target bandwidth. The average measured SNDR and SFDR for input frequency around w = βπ are 65.3dB and 82.6 respectively. Figure 11a and 11b depict the average measured SNDR and SFDR, respectively, against the input frequency using different correction mechanisms under the same conditions of the previous test, these measurements being taken after 100 calibration cycles. From these results, we can see that the proposed correction mechanism does not suffer from significant performance degradation on increasing the input frequency, in contrast to conventional digital correction mechanisms based on the Taylor series depicted in Figure 5a .
A more complex input consisting of 64 sinusoids was also used for testing. Figure 12 shows the measured PSD for the TIADC output before and after 50 calibration cycles, where we note that the mismatch spurs are successfully reduced by 32.1dB.
A Monte Carlo simulation with a band-limited random input signal is used; the input consists of independent and identically distributed (i.i.d.) samples, and a low-pass filter is used to limit the signal bandwidth to 0.88π. The test is repeated 1,000 times with different input and time skew parameters; a histogram of the measured SNDR before and after 640 calibration cycles is shown in Figure 13 . On average, the SNDR is improved from 37.2dB to 58.8dB.
To investigate the convergence behavior, we use a sinusoidal input having frequency w = 7193 2 13 π. Figure 14 depicts the evolution of the RMS of the time skew residues and the SNDR during the calibration process where the algorithm converges after 30 calibration cycles. The figure also depicts the performance evolution for a band-limited random input where the calibration algorithm converges after approximately 300 calibration cycles, which is ten times slower compared to the results obtained for a simple sinusoidal input. The slow convergence is due to the fact that it takes many more samples to produce accurate correlation estimates, c m , in the case of a random signal compared to a sinusoidal input. Also, the figure compares the convergence behavior when all multipliers and dividers are implemented with and without the proposed HW simplifications. For a sinusoidal input, the measured SNDR hits 65.2dB after 25 and 20 calibration cycles with and without those simplifications respectively; this difference is mainly due to using the approximation . 2 in (15).
D. Hardware Implementation
A VHDL implementation for the proposed calibration algorithm was carried out, which was verified to be a bit-accurate representation for the Matlab model via simulation. The implementation supports the estimation and the correction for time skew values within ±6στ . The design was synthesized using the Synopsys Design Compiler tool in TSMC 28nm HPL process targeting a 300MHz clock to provide 2.4GS/s aggregated sampling rate. Successful gate level simulation was carried out, allowing to measure the switching activity for each internal signal in the design across two complete calibration cycles. Table III shows the area utilization and power breakdown. The design occupies an area of 0.03mm 2 and consumed 11mW. Without the use of the HW simplifications proposed in Section VII, the estimation circuit occupies 9, 852μm 2 and consumes 3.2mW, i.e., the proposed simplifications enable 25% area and power reduction.
IX. CONCLUSION
In this paper, a digital time skew calibration technique was presented which can be used for a TIADC system with an arbitrary number of sub-ADCs. A novel hardware modification suitable for SAR ADCs is suggested to relax the limitations that face blind estimation techniques. The resulting increased robustness of the estimator comes at the cost of a very minor reduction in the ADC output precision. A two-stage correction mechanism was proposed to satisfy the target high precision correction. A quantitative study was conducted on the requirements imposed on the digital correction to achieve the target performance and yield, and a filter design method was proposed to enforce these requirements. We proposed tailored hardware implementations for the main multipliers in both the correction and estimation sides, which leads to a 25% area and power reduction in the estimation circuit. The proposed calibration method was verified via Matlab using different input signal types. The calibration algorithm maintains the SNDR above 65dB for a sinusoidal input within the target bandwidth, which cannot be achieved via conventional digital correction mechanisms based on the Taylor series. A VHDL model was implemented and synthesized using 28nm HPL TSMC process, targeting a 2.4GHz sampling frequency for an 8 sub-ADC system. The calibration block occupies 0.03mm 2 and consumes 11mW.
ACKNOWLEDGMENT
The authors would like to thank Benjamin Tardivel, Gavin Lacy and João Marques for the fruitful discussion during the algorithm design and implementation.
APPENDIX A INVERSION OF U
Using the unitary M -point discrete Fourier transform (DFT) matrix F having (x, y) entry F x,y = e −j 2πxy M / √ M , the circulant matrix U described in (13) can be diagonalized as
where (.) H is the Hermitian transpose, and Λ diag (λ 0 , λ 1 , . . . , λ M−1 ) is a diagonal matrix of eigenvalues of U, which, recalling (13) , can be computed to be (c.f. [25] )
We note that U is a rank-deficient matrix since λ 0 = 0, and thus U is non-invertible. This non-inevitability was also noted in [1] , [7] , [8] , [16] ; the solution proposed in [7] , [8] involved the removal of a row of U (and of the corresponding value in e) and then applying the Moore-Penrose pseudo inverse formula. In [1] and [16] , it was suggested to force Δ 0 = 0 in order to be able to find a solution. In this work, we apply a more general pseudo inverse methodology that does not involve the removal of any measurement data, nor the application of an unnecessary constraint on any particular τ m . Specifically, we compute U † , the generalized pseudo inverse using the singular-value decomposition (SVD) technique [26] , as
The (x, y)−entry of U † is given by
Since λ k = λ M−k for every k, all elements in U † are real, and from (40) we can note that U † is a symmetric circulant matrix; this also implies that U † can be described using at most M+1 2 distinct values. These features can be exploited to simplify the hardware implementation of the calculations needed in (16) . Note that these features are not present in other algorithms that employ matrix manipulation for time skew estimation, e.g., [1] , [7] , [8] , [16] .
We note that F H acts as an inverse DFT operator, and the inclusion of a zero in the upper-leftmost position of the diagonal matrix in (39) constrains the solution to (16) to have zero mean as concluded in (17) .
Using (40), it can be shown that the elements of U † can be represented as constant rational numbers whose common denominator is at most 12M . As an example, for M = 6 we have 
For the proposed feed-backward calibration algorithm, the denominator in (41) can be approximated to a power of two to simplify the calculations in (16) .
APPENDIX B NON-UNIFORM SAMPLING ANALYSIS
The frequency-dependent nature of the contribution of the non-uniform sampling to the approximation in (23) can be seen by computing the error, compared to the uniform sampling, induced at the output of an ideal first order differentiator filter in response to a sinusoidal input with frequency w and amplitude A,
For simplicity and without loss of generality, we let r = 0, in which case the n th input to the filter can be written as 
where the {w a } and {β a } are the frequencies and phases of the M spurs resulting from the time skew.
1) Using the Conventional One-Stage Correction Mechanism in Figure 5a : The filter h d,1 output at Node A can be expressed as y'[n] = (h d,1 * y) [n] . Using an ideal h d,1 and the approximated input in (43), the signal at Node B can be written as 
Accordingly, the output of the correction mechanism can be written asx
where we observe that the first order error term, e[n], in (43) has been fully removed; however, another error term appears. Figure 5b : The output of h d,c filter for the input described in (42) can be approximated to A (H d,c (w)/j) cos(n w) where H d,c (w) is the filter h d,c purely imaginary frequency response. Thus the input to h d,1 after the coarse correction is approximated bỹ
2) Using the Two-Stage Correction Mechanism in
where ν d, c (w) jw − H d,c (w) /j is the error in the magnitude response of the filter h d,c at w. Usingỹ[n] as an input to the fine correction stage and doing a similar analysis, we can write the corrected output as 
Comparing (48) with (50), we can conclude that the error magnitude is scaled by a factor of |ν d, c (w)|/w.
APPENDIX C FILTER DESIGN
We target the design of a linear phase q th order differentiator filter h d,q with an integer number of samples group delay and zero DC response, necessitating an odd length L q . The result of applying a DFT with size F on the filter h d,q can be written as For an even q, a similar approach can be used to designĥ d,q , except that it is necessary to have an even symmetry around the center tap to enforce the required phase response properties, i.e., we can write the MILP optimization problem as minimizê 
This approach allows us to obtain directly a quantized version of the coefficients using a MILP solver, e.g., the intlinprog tool in Matlab, thus obviating the need to perform a distinct quantization operation on the coefficients.
