In this paper, we propose a novel image calibration algorithm for a twofold time-interleaved DAC (TIDAC). The algorithm is based on simulated annealing, which is often used in the field of machine learning to solve derivative-free optimization (DFO) problems. The digital-to-analog converter (DAC) under consideration is part of a digital transceiver core that contains a high speed analog-to-digital converter (ADC), microcontroller, and digital control via a Serial Peripheral Interface (SPI). These are used as tools for designing an algorithm which suppresses the interleave image to the noise floor. The algorithm is supported with experimental results in silicon on a 10-bit twofold TIDAC operating at a sample rate of 50 GS/s in 14nm CMOS technology.
I. INTRODUCTION
Conventional radio-frequency (RF) front-ends are typically composed of several mixers, local oscillators, and analog filters. These components are a sizeable expense in terms of cost, area, and power, especially when implemented in phased array systems with several radiating antenna elements [1]. Fortunately, integrated circuit technology has advanced to such a degree that conventional RF front-end solutions are being replaced with high speed ADCs, DACs, and digital signal processing (DSP), which perform frequency conversion and filtering operations in the digital domain [2]. This allows data converters to be placed closer to the antenna, thereby significantly reducing system cost and power consumption. In addition, high speed converters have their thermal and quantization noise power spread across a wide Nyquist zone, which enhances dynamic range after processing gain. In order for data converters to achieve multi-GS/s rates, it is common to time-interleave several low speed converters [3], [4]. The high speed of the TIDAC coupled with the area efficiency inherent in 14nm CMOS presents an ideal use case for phased array systems such as next generation radar and 5G. However, the inevitable timing errors and mismatch among the low speed converter slices result in images, or spectral replicas, which corrupt the converter output spectrum. Therefore, image calibration schemes are often necessary in order to avoid considerable loss of dynamic range.
The authors in [4] consider a 20 GS/s 6-bit DAC with no calibration scheme in place. As a result, the spurious-free dynamic range (SFDR) is limited to 40 dB at output frequencies near 9 GHz. The authors in [5] consider a twofold delta sigma TIDAC operating at an aggregate sample rate of 10 GS/s. The clock duty cycle error is understood to be the limiting impairment regarding dynamic range, and calibration schemes are proposed. However, the recommended solution involves digital pre-filtering, which is essentially equivalent to increasing the DAC resolution and tightening matching requirements. Although an analog post-correction scheme is proposed, an accurate measurement of clock duty cycle is required, and this proves to be increasingly challenging at higher sample rates.
In [6], the issue of the interleave image is recognized as a limiting factor in high speed TIDAC performance. A self-calibration circuit is proposed, but it is only functional for sample rates below 200 MS/s. Calibration schemes above this rate are left as an opportunity for future research.
The authors in [7] provide a duty-cycle calibration algorithm for a twofold TIDAC, but assume that the sub-DAC slices are balanced in terms of gain. Practically, this is not a valid assumption for an RF DAC in deep sub-micron processes. In fact, even minor mismatch in sub-DAC gain can exacerbate the interleave image, leading to major loss of dynamic range. This is shown in Section II.
In this paper, we consider a 10-bit twofold TIDAC with current steering architecture operating at an aggregate rate of 50 GS/s using two 25 GS/s sub-DAC slices in 14nm CMOS technology. The DAC is part of a digital transceiver core from Jariet Technologies that contains an on-chip high speed ADC, microcontroller, and digital control via an SPI interface. For the DAC under consideration, there is an image which appears at half of the aggregate sample rate. As far as we know, calibration schemes for DACs at sample rates this high have not been reported. As shown in Section II, the impairments which exacerbate this image are clock duty cycle error, mismatch in sub-DAC analog gain, and clock and data misalignment.
We use the closed-loop configuration shown in Fig. 1 to design an algorithm which suppresses the interleave image to the noise floor. This ensures that dynamic range does not suffer due to interleaving effects. Note that although the authors in [7] use a similar configuration to Fig. 1, the algorithm proposed herein does not assume the sub-DACs are balanced in terms of gain. In addition, the configuration in Fig. 1 does not rely on any bandwidth limited circuitry as in [6], and does not tighten matching requirements as in [5].
In Section II, we provide some background information on twofold TIDACs. Using Fourier analysis, we explicitly show how specific impairments can cause an undesired image at half of the aggregate sample rate. In Section III, we concretely define the problem at hand in an integer programming framework, and a novel solution is proposed based on simulated annealing. In Section IV, we apply this solution to a 50 GS/s DAC in 14nm CMOS and provide experimental results which highlight its efficacy in terms of image suppression. We conclude in Section V by summarizing the key results and providing some direction for future research.
II. TWOFOLD TIME-INTERLEAVED DAC
The block diagram for the general $M$-bit TIDAC operating at a sample rate of $f_s$ is illustrated in Fig. 2. A phase locked loop (PLL) generates a clock at frequency $f_s/2$ which is distributed to the blocks denoted by serializer, sub-DAC A, sub-DAC B, and AMUX. The serializer contains a clock tree with several 2-to-1 multiplexers that serialize the $N$ low speed parallel lanes into two high speed ones at the $f_s/2$ rate. The sub-DAC slices employ current drivers for each bit to convert the $M$-bit code presented at the input to an analog output current. The drivers are composed of binary weighted current sources and clock driven switches. When the switches are active, current is driven to the output, and when they are inactive, current is dumped to a dummy node which is not shown in the diagram. This is controlled by the analog multiplexer (AMUX).
Ideally, in this ping-pong like configuration, each sub-DAC drives current to the output for 50% of the half-rate clock period. However, this is generally not the case due to unavoidable clock duty cycle error. In Fig. 2, we include a fractional timing offset factor $\alpha \in [-1, 1]$ in order to account for this. Note that $\alpha = 0$ corresponds to the ideal case of 50% duty cycle. In this section, we show that this impairment causes an image in the frequency domain which is located at $f_s/2$. Also shown in Fig. 2 are the sub-DAC analog gains, $g_A$ and $g_B$. Note that in general, $g_A \neq g_B$, mainly due to current source imbalance between the sub-DACs, and this also causes an image at $f_s/2$. We refer to the architecture illustrated in Fig. 2 as a current steering twofold TIDAC. We proceed by computing $Y(f)$, which is the Fourier transform of the DAC output $y(t)$. Throughout the paper, we denote the Fourier transform of a time-domain signal $y(t)$ by

$$Y(f) = \int_{-\infty}^{\infty} y(t)\, e^{-j 2 \pi f t}\, dt. \tag{1}$$
Note that

$$y(t) = y_A(t) + y_B(t), \tag{2}$$

so we can compute the Fourier transform of the individual sub-DAC outputs and then simply add the results to obtain $Y(f)$ by linearity of the Fourier transform. Without loss of generality, assume that sub-DAC A is driving current to the output at time $t = 0$. Note that $y_A(t)$ can be modeled as a sum of phase shifted return-to-zero (RZ) pulses whose amplitude is determined by the discrete-time sequence $x_A(n) = x(2nT_s)$, where $x(t)$ is the continuous-time representation of the input and $T_s = 1/f_s$ is the aggregate sample period.
In particular, we have
where
δ(t) is the Dirac delta function, and * denotes the convolution operator. It is clear that (3) is a sum of phase shifted RZ pulses, as it is the convolution of a rectangular function with an impulse train. Taking the Fourier transform of (3), we have
where $\mathrm{sinc}(x) := \sin(\pi x)/(\pi x)$, and we use the fact that convolution in the time domain becomes multiplication in the frequency domain and vice versa. The Fourier transform of $y_B(t)$ is obtained similarly, and is given by
Note the additional complex exponential factor in the sum of (6) compared to (5) due to the assumption that sub-DAC A is aligned at $t = 0$. Using (2), the Fourier transform of the DAC output $y(t)$ is

$$Y(f) = Y_A(f) + Y_B(f), \tag{7}$$

where $Y_A(f)$ and $Y_B(f)$ are given by (5) and (6) respectively. Note that if $\alpha = 0$, the complex exponential in the sum of (6) is $-1$ for $k$ odd and will cancel the corresponding term in (5) if and only if $g_A = g_B$.
As mentioned in the introduction, clock and data misalignment also exacerbates the $f_s/2$ image. When sub-DAC A undergoes a data transition, there is a settling window of $\tau_{settle}$ as shown in Fig. 3. During this time, sub-DAC A is dumping current to the dummy node while sub-DAC B is driving current to the output. The ideal scenario corresponds to the case where the clock edges are equidistant from the data transitions as illustrated in Fig. 3. In any other scenario, one sub-DAC has a longer (or shorter) $\tau_{drive}$ than the other. It is this timing imbalance which exacerbates the image at $f_s/2$ in a manner similar to that of clock duty cycle error. For the chip under consideration in Section IV, there is an algorithm that performs coarse clock and data alignment, but that is beyond the scope of this paper.
Consider a twofold TIDAC for the case in which the ideal output is a sinusoid at frequency $f_{out}$. From inspection of (7), there is an interleave spur which appears at $f_s/2 - f_{out}$. The contour plots in Fig. 4 illustrate the -50 dBc level curves of the interleave spur magnitude for various values of $f_{out}$. These are obtained using (7). If the gain and duty cycle errors are contained within these contours on the lower left region of Fig. 4, then the image spur is guaranteed to be below -50 dBc, which is reasonable from an SFDR perspective for a wideband RF DAC. From Fig. 4, it is clear that extremely small gain and duty cycle errors are required for reasonable DAC SFDR performance.
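As an informal cross-check of the sensitivity described above, the interleave image can also be estimated numerically from a time-domain model instead of the closed-form spectrum. The following Python sketch is illustrative only and is not the code used to produce Fig. 4. It assumes ideal rectangular pulses, a coherently sampled sinusoid, and that sub-DAC A drives the output for $(1+\alpha)T_s$ and sub-DAC B for the remaining $(1-\alpha)T_s$ of each half-rate clock period. The function name and the parameters `k0`, `n`, and `r` are hypothetical.

```python
import numpy as np

def interleave_image_dbc(g_a=1.0, g_b=1.0, alpha=0.0, k0=20, n=256, r=32):
    """Image level at f_s/2 - f_out relative to the fundamental, in dBc.

    n  : number of DAC samples (even), one per T_s
    r  : simulation grid points per T_s
    k0 : fundamental bin, so f_out = k0 * f_s / n (avoid k0 = n/4)
    """
    x = np.sin(2 * np.pi * k0 * np.arange(n) / n)  # ideal samples x(n T_s)
    # Assumption: alpha shifts the hand-over instant inside each 2*T_s period.
    n_a = int(round((1 + alpha) * r))              # grid points driven by sub-DAC A
    y = np.empty(n * r)
    for m in range(n // 2):
        start = 2 * m * r
        y[start:start + n_a] = g_a * x[2 * m]                  # sub-DAC A drives
        y[start + n_a:start + 2 * r] = g_b * x[2 * m + 1]      # sub-DAC B drives
    spectrum = np.abs(np.fft.rfft(y))
    fundamental = spectrum[k0]            # bin of f_out
    image = spectrum[n // 2 - k0]         # bin of f_s/2 - f_out
    return 20 * np.log10((image + 1e-300) / fundamental)

ideal = interleave_image_dbc()
gain_only = interleave_image_dbc(g_a=1.01, g_b=0.99)
duty_only = interleave_image_dbc(alpha=1 / 32)
```

Under these assumptions, the matched case leaves only numerical noise in the image bin, whereas either a small gain mismatch or a small nonzero $\alpha$ on its own produces a clearly measurable image.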
In Section III, we propose a machine learning based algorithm which uses digital control to suppress the interleave spur to the noise floor.

Fig. 4: -50 dBc level curves of interleave image magnitude when the ideal output is a sinusoid at frequency $f_{out}$.
III. SIMULATED ANNEALING ALGORITHM
As mentioned in the introduction, the DAC under consideration is part of a digital transceiver core that contains a high speed ADC and digital control via a microcontroller and SPI interface. There are several controls which remedy the impairments discussed in Section II. Table I outlines these controls along with their corresponding objectives. Note that the chip under consideration in Section IV has these controls split into six different control registers, each of which has a wide range of discrete settings. Therefore, we begin by defining a state vector $s \in S \subset \mathbb{R}^6$ whose entries are composed of the digital control settings. In order to find the optimal control settings, we require the ability to measure the interleave spur power. Consider a TIDAC with sample rate $f_s$ and sinusoidal output with frequency $f_{out}$. Again, by inspection of (7), we observe that an interleave spur appears at $f_s/2 - f_{out}$. Using the on-chip ADC, we then sample the DAC output, compute the fast Fourier transform (FFT), and monitor the bin corresponding to $f_s/2 - f_{out}$. The energy in this FFT bin then defines a cost function $C : S \to \mathbb{R}$. The objective is to then choose a vector $s^* \in S$ such that

$$s^* = \arg\min_{s \in S} C(s). \tag{8}$$
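The measurement that defines the cost function can be sketched as follows. This is a minimal illustration rather than the on-chip implementation: the ADC capture rate is not stated here, so the sketch takes it as a parameter (`capture_rate`) and folds the spur frequency into the first Nyquist zone of the capture. The Hann window and the function name `spur_bin_power` are likewise assumptions.

```python
import numpy as np

def spur_bin_power(capture, capture_rate, f_s, f_out):
    """Energy in the FFT bin nearest the interleave spur at f_s/2 - f_out.

    capture      : real-valued samples of the DAC output
    capture_rate : sample rate of the capture (assumed known)
    """
    n = len(capture)
    f_spur = f_s / 2 - f_out
    # Fold the spur frequency into the first Nyquist zone of the capture.
    f_alias = abs(f_spur - capture_rate * round(f_spur / capture_rate))
    k = int(round(f_alias * n / capture_rate))
    # A window limits leakage from the much stronger fundamental.
    spectrum = np.fft.rfft(np.asarray(capture) * np.hanning(n))
    return float(np.abs(spectrum[k]) ** 2)

# Synthetic check in normalized units (f_s = capture_rate = 1).
n = 1024
t = np.arange(n)
f_out = 100 / n
clean = np.sin(2 * np.pi * f_out * t)
dirty = clean + 0.01 * np.sin(2 * np.pi * (0.5 - f_out) * t)
p_clean = spur_bin_power(clean, 1.0, 1.0, f_out)
p_dirty = spur_bin_power(dirty, 1.0, 1.0, f_out)
```

In a calibration loop, each evaluation of $C(s)$ would write the six control settings, take a fresh capture, and return this bin energy.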
The objective defined by (8) is an integer programming problem. There are a couple of key items worth mentioning. First, note that we do not have an expression for the cost function $C(s)$, so optimization via relaxation and differentiation is not an option. In addition, the solution space is large, as the state vector lies in six-dimensional space and each entry has a wide range of discrete values. A suitable algorithm which promotes global optimum convergence in this scenario is known as simulated annealing [8]. The pseudocode for simulated annealing is outlined in Algorithm 1.
Algorithm 1 has a temperature parameter $T$ which starts high at $T_{max}$ and gradually reduces to $T_{min}$ exponentially with factor $\gamma$. At each value of $T$, we perform $K$ iterations which involve a cost comparison of the current state $s$ with a neighboring state $s' = n(s)$, where $\Delta E = C(s') - C(s)$. Note that states $s'$ whose cost is less than or equal to that of the current state $s$ are always accepted (i.e. $\Delta E \leq 0$). If a neighbor is accepted under the criterion $\Delta E \leq 0$, then we check whether or not it has a lower cost than the optimal state $s^*$. However, states with higher cost (i.e. $\Delta E > 0$) are not necessarily rejected. In fact, the acceptance of higher cost states is controlled by the temperature $T$ in a probabilistic manner. Note that the term $\exp(-\beta \Delta E / T) \to 1$ as $T \to \infty$, where $\beta > 0$ is a hyperparameter. This implies that the state space is explored aggressively when $T$ is large since the acceptance of higher cost states becomes more probable.
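The procedure described above can be sketched in Python as follows. This is a reconstruction from the prose description and is not a transcription of Algorithm 1. The cost and neighbor functions are passed in as callables, and the toy cost, toy neighbor, and the values of `t_max` and `t_min` in the usage example are placeholders chosen for illustration only. The sketch also caches the cost of the current state rather than re-measuring it, which is a design choice that may not suit noisy measurements.

```python
import math
import random

def simulated_annealing(cost, neighbor, s0, t_max, t_min, gamma, k_iters,
                        beta, rng=None):
    """Return (best_state, best_cost) found by simulated annealing."""
    rng = rng or random.Random()
    s, c = s0, cost(s0)
    best, best_c = s, c
    t = t_max
    while t > t_min:
        for _ in range(k_iters):
            cand = neighbor(s, rng)
            cand_c = cost(cand)
            delta = cand_c - c
            if delta <= 0:
                # Equal or lower cost: always accept, then compare with best.
                s, c = cand, cand_c
                if c < best_c:
                    best, best_c = s, c
            elif rng.random() < math.exp(-beta * delta / t):
                # Higher cost: accept with probability exp(-beta * dE / T).
                s, c = cand, cand_c
        t *= gamma  # exponential cooling
    return best, best_c

# Toy usage on a synthetic cost (not a hardware measurement).
target = (3, 7, 1, 12, 5, 9)

def toy_cost(s):
    return float(sum((a - b) ** 2 for a, b in zip(s, target)))

def toy_neighbor(s, rng):
    i = rng.randrange(6)
    return s[:i] + (rng.randrange(16),) + s[i + 1:]

start = (0, 0, 0, 0, 0, 0)
best, best_cost = simulated_annealing(
    toy_cost, toy_neighbor, start, t_max=100.0, t_min=1.0,
    gamma=0.8, k_iters=30, beta=50.0, rng=random.Random(0))
```

Because the best state is tracked separately from the current state, an uphill move accepted at high temperature never discards the lowest cost state seen so far.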
A key component of Algorithm 1 involves constructing the neighboring state function $n(s)$. In our case, this process first involves choosing a number from the discrete uniform distribution $\mathcal{U}\{1, 6\}$ which corresponds to one of the six digital controls. We then choose another number uniformly at random over a range which covers the selected control setting.
The neighbor state is found by simply substituting the new control setting into a copy of the previous state.
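A neighbor function of this form might look like the sketch below. The register sizes in `REGISTER_SIZES` are hypothetical, since the actual ranges of the six controls are not listed here, and the sketch assumes the new value is drawn over the full range of the selected register. Indices are 0-based, so the draw from $\mathcal{U}\{1, 6\}$ becomes a draw from 0 to 5.

```python
import random

# Hypothetical number of discrete settings in each of the six registers.
REGISTER_SIZES = (64, 64, 32, 32, 16, 16)

def neighbor(state, rng, sizes=REGISTER_SIZES):
    """Return a copy of `state` with one randomly chosen control re-drawn."""
    idx = rng.randrange(len(sizes))              # pick one of the six controls
    new_state = list(state)                      # copy of the previous state
    new_state[idx] = rng.randrange(sizes[idx])   # uniform over that control's range
    return tuple(new_state)

rng = random.Random(0)
s = (0, 0, 0, 0, 0, 0)
s_next = neighbor(s, rng)
```

Since only one entry changes per call, consecutive states differ in at most one control register.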
IV. EXPERIMENTAL RESULTS
In this section, we use the Agilent N9030A spectrum analyzer to apply Algorithm 1 to a 10-bit twofold 50 GS/s TIDAC in 14nm CMOS. Note that the spectrum analyzer samples the DAC output, which effectively emulates the on-chip ADC. The plot in Fig. 5 demonstrates the efficacy of Algorithm 1 across the Nyquist zone and compares it to a simple grid search over the state space. Note that Algorithm 1 keeps the interleave spur well below -50 dBc. After starting Algorithm 1 with control registers in their initial states, convergence occurs after an average of 160 interleave spur measurements, whereas the grid search was performed with 280 measurements. Note that at high frequency, simulated annealing has a 15 dB improvement over grid search while requiring roughly 43% fewer measurements (160 versus 280). The parameters used as input to Algorithm 1 were $\gamma = 0.8$, $K = 30$, and $\beta = 50$. These experiments were conducted using an Altera FPGA which serves as a bridge between the PC and the SPI interface. The test board and chip are shown in Fig. 6.
V. CONCLUSION
In this paper, a novel image calibration algorithm for a twofold TIDAC is proposed and verified in silicon on a 10-bit 50 GS/s DAC in 14nm CMOS. The algorithm does not tighten matching requirements as in [5], and does not assume the sub-DAC gains are balanced as in [7]. Furthermore, bandwidth limited calibration circuitry is not required as in [6]. Although an on-chip high speed ADC is assumed, this is becoming much more practical with the use of low power deep sub-micron processes like 14nm CMOS. Future work involves repeating the measurements in Section IV using the on-chip DAC-to-ADC loopback path. Beyond interleave impairments, high speed data converters also exhibit harmonic distortion. Using machine learning for harmonic suppression would be another interesting and fruitful research opportunity.
