Abstract: This paper presents a 3.5 GS/s 6-bit current-steering digital-to-analog converter (DAC) with auxiliary circuitry to assist testing in a 1 V digital 28-nm CMOS process. The DAC uses only thin-oxide transistors and occupies 0.035 mm², making it suitable for embedding in VLSI systems, e.g., field-programmable gate arrays (FPGAs). To cope with the IC process variability, a unit-element approach is generally employed. The three most significant bits (MSBs) are implemented as seven unary D/A cells and the three least significant bits (LSBs) as three binary D/A cells, using an appropriately reduced number of unit elements. Furthermore, all digital gates make use of only two basic unit blocks: a buffer and a multiplexer. For testing, a memory block of 5 kb is placed on-chip, which is loaded externally in a serial way but read internally in an 8× time-interleaved way. The memory is organized around 48 clocked 104-bit shift registers. It keeps the resulting switching disturbances signal-independent and hence avoids inducing output nonlinearity errors, even when a common power supply is shared with the DAC. This novelty allows reliable testing of the DAC core, while avoiding the performance-limiting risks of handling high-speed off-chip data streams. The DAC spurious-free dynamic range (SFDR) >40 dB bandwidth is 0.8 GHz, while the IM3 <−40 dB bandwidth exceeds 1.3 GHz. The DAC consumes 53 mW of power and the design-for-test scheme consumes 80 mW.
I. INTRODUCTION

EMBEDDING data converters in modern VLSI system-on-chip (SoC), e.g., field-programmable gate arrays (FPGAs), allows the digital signal processing to directly interact with the analog world in a very efficient way. The digital data transfers are kept on-chip, which is particularly power-efficient for sampling rates exceeding giga-samples-per-second (GS/s), where input-output buffer (BUF) power starts to dominate. Low-resolution data converters with a few gigahertz-range signal bandwidths and |IM3| > 35-40 dB are needed for emerging applications, such as WPAN and WIHD, LMDS p2p, and UWB [1]-[7]. However, embedded data converters need to accommodate a range of design requirements that differ from those of their conventional stand-alone counterparts. These include small silicon footprint, very high reliability, and implementation on a common digital VLSI CMOS SoC with limited available voltage headroom and increased process variability. For embedded data converters in VLSI SoCs, reliability requirements are set very high, since a single functional failure can mean failure of the whole chip. To control output quality, special emphasis is put on testing. The cost of test is included in the total production cost. Therefore, embedded test blocks, which do not deteriorate the intrinsic analog performance via switching delta-I noise, can greatly reduce the final product costs, through, e.g., wafer sorting before packaging and self-testing of redundant on-chip structures.
This paper presents both a 28-nm CMOS 1 V 3.5 GS/s 6-bit current-steering DAC and a novel digital front-end design-for-test (DfT) block, as conceptually shown in Fig. 1. The 6-bit digital-to-analog converter (DAC) uses a 1 V headroom design with a general unit-element approach. The chosen segmentation level divides the 6 input bits into three MSB unary and three LSB binary sections. All digital gates, i.e., latches, three-to-seven binary-to-unary decoder, and data drivers, are based on only two fundamental blocks: the BUF and the multiplexer (MUX). Such a unit-element approach may reduce the efficiency but it effectively counters the process variability and guarantees speed. All digital gates are current-mode logic (CML) [8]-[10], except for the memory, which is CMOS but features signal-independent switching. The proposed architecture of the digital DfT block decouples the generated digital CMOS disturbance from the signal. Thus, it avoids inducing distortion errors into the DAC core via, e.g., the shared power supply and substrate. Instead of conventional RAM, which produces signal-dependent switching, a shift-register loop is proposed. The large amount of constant switching activity is reduced by employing an 8× time-interleaved (TI) scheme, which also guarantees the speed. The DAC performance shows a spurious-free dynamic range (SFDR) >40 dB bandwidth up to input frequencies f sig of 800 MHz, while the IM3 <−40 dB bandwidth is maintained up to 1300 MHz.
The major contributions of this paper include: one of the first 28-nm DAC implementations and the first high-speed 1 V only design (in Section II); the use of shift registers as a core memory block in the test-assisting structures (in Section III); the highest measured SFDR performance up to 800 MHz among the published 6-bit CMOS DACs (in Section IV); a scalable test-assistance block; a detailed description of an on-chip test-assistance apparatus; an 8:1 high-speed single-gate CML MUX implementation for TI-data reconstruction; and a high-speed CML gate design approach based on only two gate types. Section II discusses the DAC design and strategies for advanced CMOS processes. Section III discusses the novel test-assisting blocks. Section IV presents the measurement results. Finally, Section V draws the conclusions.
II. 3.5 GS/S 6-bit CURRENT-STEERING DAC IN 28-nm CMOS
Embedded data converters require implementation in the CMOS technology process chosen for the digital platform. In the case of the most recent digitally advantageous IC processes, the data converters may require operation with a single 1 V power supply. Conventional DAC design approaches are not directly applicable with such low voltage headroom. This section reviews the traditional DAC design approaches and proposes strategies for designing in advanced CMOS processes, e.g., 28 nm, with reduced power supply.
A. Output D/A Cell Architecture
For high-speed DAC design, the current-steering architecture is the preferred choice mainly due to the excellent settling of its analog output for a flash-given digital code [1]-[7], [11]-[16]. The digital part is usually implemented with CMOS-like logic [1]-[5] mainly due to its power efficiency for generating sharp transition edges. The consumed power is dynamic since it is used only when needed. The main error sources are signal-related disturbances along the power supply rails and the substrate, imperfect differential signaling, and glitches at the common source node of the D/A switches (shown in Fig. 2). To solve these problems, several solutions are available. Extensive power supply decoupling, with an emphasis on local decoupling, usually reduces disturbances on the power rails. Sharp transition edges ensure sufficient quality of differential signaling. Finally, increasing the output impedance of the current sources can mitigate the effects of the signal-dependent glitches at the common-source node of the D/A switches. However, these solutions cannot be optimally implemented in low-voltage CMOS processes without thick-oxide transistors. Thick-oxide transistors allow increased voltage headroom, see [12], when supplies greater than the SoC core voltages are available. However, if a 1-V supply is all that is available, then using transistor cascodes becomes a subject of compromise. Finally, the allowed silicon area is limited and hence cannot accommodate extensive power supply decoupling or different power domains.
The design choice here is to avoid cascode transistors, so allowing more drain-source voltage headroom for the current source transistors, such as in [11] and [12]. The matching is hence improved, since a high overdrive voltage can be used. This is further combined with CML for the digital circuits [6], [11]. CML does not require massive power-supply decoupling, since it draws continuous current from the supply. CML provides natural differential signaling derived from its differential pair operation [8]-[10]. Digital signal levels can be designed in such a way that the glitches at the common source node of the D/A switches are minimized by properly choosing the signal crossing point of the data drivers. Fig. 2 compares CMOS and CML-based architectures of D/A cells. CMOS logic gates consume their power exclusively in the signal transition, while CML gates consume power continuously. CMOS allows faster data transition edges than CML for a given power consumption and capacitive load. Note that the threshold mismatch of the switch transistors M sw is transferred to timing mismatch errors via the slope of the switching signals. Thus, sharp edges are required for small timing errors among the D/A cells [13]. This advantage of the CMOS drivers diminishes when the capacitive self-loading effects dominate, e.g., in the case of reduced power supply. Furthermore, a disadvantage of CMOS data switching is the delta-I noise [17] and the large rail-to-rail signal swing. These disturb the analog behavior of the circuits. For example, a large voltage glitch at the common source node of the D/A switches M sw,n and M sw,p is created. This is because of the large-signal operation of the switches: M sw,n starts switching off immediately after the initial DATA switch-off transition, while M sw,p only starts switching on when the DATA+ signal reaches a voltage level that is sufficiently above the common source node to provide the required V gs.
Such a glitch disturbs the DAC current but its effects can be reduced by using the cascode transistor M cas. However, M cas consumes voltage headroom, which in the case of a 1-V design can become critical. In our approach, the common source node glitch is reduced by reducing the voltage swing of the data signal through CML signaling, as shown in Fig. 2(b). Thus, an advantage of the CML data signaling is that its swing is easy to regulate, and hence the glitch can be minimized. Note that CMOS gates can be made to switch with reduced voltage swings but at the cost of increased design complexity and slower transition edges. Other advantages of the CML-based DAC architectures include: natural differential signaling, small disturbances on the power supply rails (less delta-I noise [17]), small capacitive self-loading of the data tracks, better power-supply noise rejection, and insensitivity to temperature and process variation [6]-[11]. In the case of a 1 V available power supply, a CML implementation is preferred, since the required strength of a CMOS driver needs large transistors, which increase both the driving requirements for the preceding stages (the synchronization latches) and the capacitive self-loading effects on the data tracks. However, a disadvantage of the CML solution, compared with CMOS, is the reduced slope of the data transition edges for a given power consumption, especially the high-to-low transition, which discharges the data track nodes through two nMOS transistors. This may lead to higher timing errors and hence deterioration of the DAC linearity at high f sig.
B. DAC Segmentation Architecture
To mitigate the drawbacks of the slow switching slopes of the data drivers, this paper attempts to match the transition edges for the MSB DAC bits and to relax their negative impact on the performance through unary segmentation. The common 50%-level segmentation is applied as a tradeoff between linearity and area [14]. Thus, the main dynamic linearity concerns are already addressed at system level by introducing segmentation into three MSB unary and three LSB binary sections. The three MSBs are implemented as seven nominally identical unary current cells and the three LSBs are binary, i.e., 10 D/A chains in total. Note that the area increase in comparison with two unary-segmented MSBs is minimal, since in that case seven D/A chains in total would be needed. However, if four MSBs are unary segmented, then the area increase becomes significant, as 17 D/A chains in total would be needed. Finally, for more than 3 bits of unary segmentation, the complexity of the binary-to-unary decoder increases, which may become critical in terms of power consumption at high sampling rates. A more in-depth discussion about the segmentation strategies can be found in [13]. Thus, three unary-segmented MSBs are considered a good tradeoff between area, decoder complexity, and MSB data transition matching. Fig. 3 shows the output stage of the 6-bit DAC with seven unary cells and three binary cells. The switch transistors M sw use 128 parallel units, each with two folded fingers of width 225 nm and minimum transistor length L. Note the high number of units, which is dictated by the required signal current and the reduced current density per transistor (max W per unit). Small W per finger also reduces the resistivity of the poly gate structure while the length L is kept minimum. In layout, the units are spatially averaged to attain good matching of the differential transistors.
The switch transistors, M sw, are chosen to be the lowest available V th devices, to maximize their overdrive voltage, V gs − V th, and hence minimize parasitic capacitances and mismatch. The unary current source transistor (standard V th) uses 128 units with an equivalent W/L of 1.8/1.8 µm. Since the maximum allowable transistor width and length may be limited by the process to about 0.9 µm, a unit current source is implemented as two parallel structures of two stacked transistors. Such sizes with conventional layout techniques are expected to easily guarantee 6-bit static intrinsic linearity yield, according to [13], [18]-[20]. Moreover, as no cascodes are used, more voltage headroom is allocated for the current source transistor, allowing a high overdrive voltage and hence reduced V th mismatch. The three LSB binary slices use an appropriately reduced number of units. Finally, the output impedance requirements for 6-bit resolution, see [7], are easy to meet, even when no cascodes are used. In this case, more voltage headroom is allocated for the switches and these are small, switching a small output capacitance.
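The D/A chain counts quoted above for the different segmentation options follow from simple counting: u unary-coded MSBs need 2^u − 1 identical cells, and each remaining LSB needs one binary cell. The short Python sketch below reproduces that arithmetic. The function names are illustrative only and are not taken from the design.

```python
def da_chains(total_bits: int, unary_msbs: int) -> int:
    """Number of D/A chains when the top `unary_msbs` bits are unary coded."""
    if not 0 <= unary_msbs <= total_bits:
        raise ValueError("unary_msbs must lie between 0 and total_bits")
    unary_cells = (1 << unary_msbs) - 1      # thermometer-coded MSB cells
    binary_cells = total_bits - unary_msbs   # one cell per remaining LSB
    return unary_cells + binary_cells


def cell_weights_lsb(total_bits: int, unary_msbs: int) -> list[int]:
    """Weight of every D/A cell, expressed in LSBs (unary cells first)."""
    lsb_bits = total_bits - unary_msbs
    unary = [1 << lsb_bits] * ((1 << unary_msbs) - 1)
    binary = [1 << b for b in range(lsb_bits - 1, -1, -1)]
    return unary + binary


if __name__ == "__main__":
    for u in (2, 3, 4):
        print(f"{u} unary MSBs -> {da_chains(6, u)} D/A chains")
    print("cell weights for 3+3 segmentation:", cell_weights_lsb(6, 3))
```

For the 3+3 segmentation used here, the seven unary cells each weigh 8 LSBs and the binary cells weigh 4, 2, and 1 LSB, which together cover the full 63-LSB output range.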
C. Transistor Level Design
D. Digital CML Gate Design
A unit-element approach is also used in the design of the CML digital circuits. All gates in the digital part of the DAC are based on two basic CML cells: the BUF and the MUX. Fig. 4 shows their designs. The nominal tail current is 0.15 mA but the power per gate is adjustable via V bn1 and V bp1. All the logic functions are derived from these two fundamental blocks. Such a unit-element approach reduces design uncertainty in the most recent but not yet mature CMOS processes. More powerful drivers are easily constructed by connecting standard BUF cells in parallel. For example, Fig. 5 shows the implementation of the data drivers. To match the driving strength to the load, the binary drivers are scaled down proportionally. That is to say, the binary data drivers use fewer unit BUF gates than the unary data drivers. In this way, the data switching transitions of the unary and binary D/A cells can be matched. Note that the load seen by the data driver does not ideally scale down by a factor of two for the binary bits due to the RC parasitics in the physical wiring. This is the reason why the drivers for binary bits two and one may scale down by a ratio of less than two in practice, which combined with careful layout ensures matched scaling for the binary bits. Thus, the DAC architecture first addresses the linearity concerns at system level by adopting 50%-level segmentation, and then the responses of the three LSB binary cells are further matched via the unit-element approach and matched layout of the wiring.
III. EMBEDDED TEST ASSISTANCE
The proposed DAC targets embedded VLSI SoC applications. Thus, the input digital data stream will remain on-chip. To emulate the VLSI environment and facilitate characterization, a test-assistance block is designed. In general, DAC test assistance is greatly beneficial, particularly at high data rates, as the large amounts of transferred data are kept on-chip. For the proposed DAC, 6 bits are transferred at 3.5 GS/s, which amounts to a 21 Gb/s data transfer. To avoid unnecessary risks associated with the traditional high-speed data I/O and propagation of a 3.5 GS/s signal on a PCB (e.g., disturbances, power-integrity problems, and data signals and substrate bouncing), the test assistance is designed to provide this data on-chip. Finally, DfT is required in a VLSI SoC to guarantee fabrication quality at reduced cost. Note that the cost of test remains constant, while the cost of silicon per function reduces with the continual development of the CMOS IC processes. Thus, on-chip test-assisting circuits are needed to reduce the cost of test while guaranteeing high VLSI SoC yields. For such DfT applications, a single test-assistance block may be shared among multiple data converters. Its application can be extended to wafer sorting and fab tests, again in a shared manner between all embedded DACs. This section introduces the DAC DfT digital block, discusses its problems due to signal-dependent delta-I noise, and presents its design, which shares a common 1 V power supply with the DAC core.
A. DfT Assistance for DAC Characterization
In this paper, the DfT scheme is used for characterization, avoiding the risks of handling high-speed off-chip data. Indeed, DAC test assistance has been suggested in several recent publications [6], [11]. However, these have not discussed it in detail, even though it deserves attention as the measured DAC performance depends on the test-assisting circuits. The need for clean test-assisting circuits is further emphasized when a common power supply needs to be shared. The proposed DfT DAC architecture can easily share a common 1 V power supply with the DAC, because it is designed for minimal signal-dependent delta-I noise. It mainly uses CML digital gates, except for the memory, which must be implemented in CMOS for practical reasons.
The main requirements for the DfT architecture are meeting the DAC speed specifications and keeping the generated disturbances below the DAC's own error sources. The two main operation modes for the DfT scheme are data upload and data read out. The data are uploaded at low speed through a serial interface and are read at high speed in an infinite loop. The DAC is characterized with the help of an on-chip 5-kb memory that can be configured into either a single 5-kb shift register for data upload or eight TI shift-register ring loops, each 6 bits deep and 104 taps long, for data read out. That is to say, a 6-bit memory that is 832 words deep. When read at 3.5 GS/s, a frequency step of about 4.2 MHz is possible for the test signals. For data upload, these loops are reorganized into a single long shift register of first-in, first-out (FIFO) type, by breaking the loops at a given tap and creating a link to the adjacent ring, as shown in Fig. 6. All registers share a common clock. The reconfiguration options are implemented by simple CMOS MUX gates.
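The memory figures quoted above can be cross-checked with a few lines of arithmetic. The Python sketch below is only a bookkeeping aid: the constant and function names are illustrative, and it assumes that a test tone is stored without discontinuity when it completes an integer number of cycles within the 832-word loop.

```python
SLICES = 8             # time-interleaved ring loops
TAPS_PER_SLICE = 104   # taps (words) per ring loop
BITS_PER_WORD = 6      # DAC resolution
FS_HZ = 3.5e9          # read-out sampling rate


def memory_words() -> int:
    """Total number of 6-bit words held in the ring loops."""
    return SLICES * TAPS_PER_SLICE


def memory_bits() -> int:
    """Total number of stored bits (roughly 5 kb)."""
    return memory_words() * BITS_PER_WORD


def shift_register_count() -> int:
    """One 104-bit shift register per bit of every slice."""
    return SLICES * BITS_PER_WORD


def frequency_step_hz(fs_hz: float = FS_HZ) -> float:
    """Smallest test-tone spacing: one cycle per full memory loop."""
    return fs_hz / memory_words()


def coherent_tone_hz(cycles: int, fs_hz: float = FS_HZ) -> float:
    """Tone frequency that fits `cycles` whole periods into the loop."""
    return cycles * frequency_step_hz(fs_hz)


if __name__ == "__main__":
    print(memory_words(), "words,", memory_bits(), "bits,",
          shift_register_count(), "shift registers")
    print(f"frequency step: {frequency_step_hz() / 1e6:.3f} MHz")
    print(f"6-bit data rate: {BITS_PER_WORD * FS_HZ / 1e9:.1f} Gb/s")
```

The result is 832 words, 4992 bits, 48 shift registers, and a step of roughly 4.2 MHz at 3.5 GS/s, consistent with the numbers stated in the text.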
B. Signal-Dependency of Delta-I Noise
For data read out, a shift register-based memory avoids signal-dependent switching noise (delta-I noise [15]), while CMOS logic-based memories, e.g., SRAM, cannot avoid it. The conventional CMOS-based circuits generate signal-dependent delta-I noise, especially when read in an infinite loop. These blocks switch a unique combination of digital cells for a given code or address. Without loss of generality, Fig. 7 shows the concentration of delta-I noise power in specific frequencies related to the data signal in a 6-bit CMOS logic-type memory that is 832 words deep and read in an infinite loop. A sweep is simulated through all possible signal frequencies, F in, which can be stored without discontinuities. Then, each 1-0 bit transition is associated with a unit switching power disturbance. The joint contributions of all 6-bit signals are simulated and analyzed in the frequency domain. The ratio between the sum of the powers of the nine strongest spurs [usually the first nine harmonics (P 1−9)] and the sum of the powers of the rest of the frequency spectrum [i.e., the rest of the delta-I noise power (P all)] is plotted for all signal frequencies F in /F s. As F in /F s increases, the correlation of the switching activity increases, too. Thus, the delta-I noise concentrates in specific frequencies, viz., the first several harmonic spurs. For example, P 1−9 /P all approaches 10 dB close to F in /F s = 0.5. These disturbances can propagate to the DAC and intermodulate with its own error sources, resulting in performance deterioration. For example, signal-dependent disturbances of the power supply rails and bouncing of the substrate can induce nonlinear errors in the DAC, and hence harmonic distortion spurs in its analog output. This is indeed the main concern for the DfT memory, since the DAC performance can be particularly sensitive when the power supply domain is required to be shared.
However, the delta-I noise remains constant and hence signal independent, when a shift-ring register is used.
Thus, the 5-kb registers are reorganized into eight TI shift-register ring loops, each 6 bits deep and 104 taps long, to form the core of the DfT, as shown in Fig. 8. The TI scheme guarantees the speed and reduces the amount of switching noise. Thus, CML circuits create eight subclocks at a rate of 437.5 MS/s. The subclocks are uniformly spaced, one master clock period of 285.7 ps apart. These signals clock eight memory slices, which produce TI data that is converted to CML levels and reconstructed to form a 6-bit 3.5 GS/s stream. The sequence of the subclock signals is guaranteed by design. The circuits for the generation of the 8× TI subclocks are shown in Fig. 8 and the respective signals are shown in Fig. 9. First, the Master clock is divided by eight. Then, the result is fed as input data into an eight-tap shift register clocked by the Master clock. The outputs of the shift register stages are the intrinsic subclocks (sub_clock_orig [1:8] from Fig. 9). For illustration purposes, the signals are shown as single-ended, but they are differential CML types to keep the generated disturbances low.
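The subclock generation described above (divide the Master clock by eight, then pass the result through an eight-tap shift register clocked by the Master clock) can be mimicked with a small behavioral model. The Python sketch below is an idealized, cycle-accurate illustration only: it assumes a 50% duty-cycle divider and steady-state operation, and it ignores the differential CML signaling and the ENABLE gating. All names are hypothetical.

```python
TI_FACTOR = 8
FS_HZ = 3.5e9  # Master clock rate


def divided_clock(n: int) -> int:
    """Divide-by-8 clock sampled at Master clock cycle n (assumed 50% duty)."""
    return 1 if (n % TI_FACTOR) < TI_FACTOR // 2 else 0


def subclock(tap: int, n: int) -> int:
    """Output of shift-register tap `tap` (1..8) at Master clock cycle n.

    Each tap delays the divided clock by one further Master clock period.
    """
    if not 1 <= tap <= TI_FACTOR:
        raise ValueError("tap must be in 1..8")
    return divided_clock(n - tap)


def rising_edges(tap: int, cycles: int) -> list[int]:
    """Master clock cycles at which the given subclock rises."""
    return [n for n in range(1, cycles)
            if subclock(tap, n) == 1 and subclock(tap, n - 1) == 0]


if __name__ == "__main__":
    period_ps = 1e12 / FS_HZ
    print(f"subclock rate: {FS_HZ / TI_FACTOR / 1e6:.1f} MHz")
    print(f"spacing between adjacent subclocks: {period_ps:.1f} ps")
    for tap in range(1, TI_FACTOR + 1):
        print(f"tap {tap}: rising edges at cycles {rising_edges(tap, 24)}")
```

In this model every tap rises once per eight Master clock cycles, and adjacent taps rise exactly one Master clock period apart, which corresponds to the 437.5 MS/s rate and 285.7 ps spacing stated in the text.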
Thus, the power of the switching noise is made independent of the data to avoid inducing signal-dependent errors into the DAC. At every switching, the same switching profile occurs because all data words (addresses) are switched. In this way, the generated power-supply disturbance is the same for all codes. However, the switching activity is usually increased in comparison with the traditional RAM-based memory approaches. To reduce the switching activity, in general, TI memory organization can be applied.
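The contrast between RAM-type and ring-type switching can be illustrated with a toy model. The Python sketch below is not the simulation behind Fig. 7. It assumes a single ring instead of eight TI slices, uses the number of toggling bits per clock as a crude proxy for delta-I noise, and stores a hypothetical 6-bit quantized sine wave. All function names are illustrative.

```python
import math


def quantized_sine(words: int, cycles: int, bits: int = 6) -> list[int]:
    """Unsigned `bits`-bit sine wave with `cycles` whole periods in `words` samples."""
    full_scale = (1 << bits) - 1
    return [round((math.sin(2 * math.pi * cycles * n / words) + 1) / 2 * full_scale)
            for n in range(words)]


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two codes differ."""
    return bin(a ^ b).count("1")


def ram_toggles(data: list[int]) -> list[int]:
    """Toggles per clock when only the addressed word reaches the output (looped read)."""
    return [hamming(data[i], data[i - 1]) for i in range(len(data))]


def ring_toggles(data: list[int], clocks: int) -> list[int]:
    """Toggles per clock when every word shifts one tap around a ring."""
    ring = list(data)
    counts = []
    for _ in range(clocks):
        shifted = [ring[-1]] + ring[:-1]
        counts.append(sum(hamming(old, new) for old, new in zip(ring, shifted)))
        ring = shifted
    return counts


if __name__ == "__main__":
    data = quantized_sine(words=832, cycles=7)
    ram = ram_toggles(data)
    ring = ring_toggles(data, clocks=16)
    print("RAM-type toggles per clock range:", min(ram), "to", max(ram))
    print("ring toggles per clock (first 16 clocks):", ring)
```

In this model the ring toggles the same number of bits at every clock, so its switching carries no information about the instantaneous code, while the RAM-type read-out toggles a code-dependent number of bits. The per-clock ring activity equals the RAM-type activity summed over a whole loop, which reflects the increased switching activity noted above.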
C. TI DfT Scheme
The CMOS data memory is shown in Fig. 8 as eight ring slices of shift registers. The TI factor of eight is chosen as a balance between reducing the design requirements, optimizing the power consumption, and managing the complexity. Each of these slices is 6 bits deep and 104 taps long, to account for the resolution of the DAC and the memory depth of 832 words. The data is read out by probing an arbitrary tap. There is no absolute address space. The memory addressing is relative. Thus, this is an easily stretchable solution, since any memory depth can be realized by just changing the length of the ring-shift registers. The relative addressing is also why the clocking of the memory in the read out op-mode should always begin from slice one, since eight data streams are interleaved and these need to be aligned to each other. If clocking begins with another slice, the integrity of the uploaded data will be disrupted. Therefore, the subclocks are gated and enabled by a control signal ENABLE when the DfT scheme goes from data upload to data readout, as shown in Figs. 9 and 10. The gating of the subclocks guarantees that the subclocks are aligned and that slice one is clocked first.
Finally, eight TI 6-bit streams, at a rate of 437.5 MS/s, appear at the CML data reconstruction block. The reconstruction block is based on six eight-to-one MUXs to form the 6-bit 3.5-GS/s data stream. The MUX circuit for a single bit is based on the basic MUX circuit with a dummy output and a shared load, as shown in Fig. 11. Data [1:8] is the 1-b 8× TI data (shown as 6b_sub_data [1:8] signals in Fig. 8 before the conversion to differential signaling). S p [1:8] are reconstruction pulses derived from the eight TI subclocks, based for instance on a simple logical AND operation as shown in (1).
Fig. 12. Layout, micrograph, and zoomed-in view of IR photo of the work.
IV. MEASUREMENTS
A. Test Chip Overview and Static Performance
The presented DAC and test-assisting circuits have been implemented in a 1 V digital 28-nm CMOS process and measured. The power consumption of the test-assisting circuits is about 80 mW. Fig. 12 shows the layout (top-left), the micrograph (top-right), and an infra-red (IR) zoom-in micrograph (bottom) of the fabricated chip. The current-steering DAC core is indicated with the middle (yellow) square on the IR micrograph. The occupied area of the DAC is about 0.035 mm² and the occupied area of the test-assisting circuits is about 0.048 mm². The array of current sources is indicated in the bottom of the core (M cs). It occupies about half of the core area. The 10 cells with the D/A switches (M sw) occupy about 20% of the core area. The rest of the DAC core area is for the data buffers, synchronization latches, decoder and binary delays, clock network, and input latches. The exact corresponding measures are given in the floor plan of Fig. 1. The DfT digital front-end scheme is indicated with the top rectangular (purple) shape. Its area is mostly dominated by the 5-kb memory. The input serial interface is indicated with one. The generation of the TI subclocks and control signals is indicated with two. The reconstruction circuits, positioned close to the input latches of the DAC core, are indicated with three. All empty areas are filled in with decoupling capacitors. The power consumption of the DAC is 53 mW with an output signal current of 4.5 mA. The measured static accuracy of the
B. Dynamic Performance
The measurement setup for the dynamic characterization is shown in Fig. 13 . Fig. 14 shows the measurement results for the SFDR against a sweep of input signal frequency f sig , compared with selected state-of-the-art works at similar sampling rates F s . The low-frequency performance is at 50 dB level, which is maintained up to about f sig = 180 MHz. Beyond these frequencies, the harmonic distortion rises and SFDR declines. An SFDR >40 dB is maintained up to about 800 MHz. The dominant spurs, limiting the SFDR, are usually HD2 and HD3. Fig. 15 (top) shows an exemplary spectrum of the whole 1.75-GHz Nyquist signal band for f sig = 610 MHz with SFDR = 46.6 dB. The SFDR limiting spur is HD2. Beyond this f sig , the folded HD3 spur becomes the dominant spur in the spectrum, mainly due to two factors: nonlinear output impedance distortion at the DAC output and RC low-pass filtering of the output measurement network. As f sig approaches the Nyquist band edge, the frequency of the folded HD3 further reduces. The output low-pass filtering attenuates f sig relative to HD3 and so misleadingly increases the level of the measured HD3 relative to f sig . Fig. 15 (bottom) shows the full Nyquist-band utilization of the DAC, demonstrated by the output spectrum for f sig = 1743 MHz with SFDR = 30.6 dB. However, HD3 = −38 dB is a better metric for the DAC linearity performance than HD2 = −30.6 dB in this case, since HD3 is closer to the fundamental signal than HD2, which is situated at very low frequencies. Such practical SFDR measurement limitations, due to, e.g., low-pass filtering effects of the DAC output and measurement networks, are not significant when considering IM3. For these measurements, the test signals are close to each other in the frequency domain and hence also close to the IM3 spurs. Therefore, the DAC output and measurement network attenuation is about the same. 
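The positions of the folded harmonics discussed above follow from aliasing around multiples of the sampling rate. The Python sketch below computes where the k-th harmonic of a tone lands in the first Nyquist band. It assumes F s = 3.5 GS/s for the quoted spectra, and the function names are illustrative.

```python
def fold_to_nyquist(freq_mhz: float, fs_mhz: float) -> float:
    """Alias a frequency into the first Nyquist band [0, fs/2]."""
    f = freq_mhz % fs_mhz
    return fs_mhz - f if f > fs_mhz / 2 else f


def harmonic_location(fsig_mhz: float, order: int, fs_mhz: float) -> float:
    """In-band location of the `order`-th harmonic of a tone at `fsig_mhz`."""
    return fold_to_nyquist(order * fsig_mhz, fs_mhz)


if __name__ == "__main__":
    fs = 3500  # MHz, assumed sampling rate for the spectra of Fig. 15
    for fsig in (610, 1743):
        hd2 = harmonic_location(fsig, 2, fs)
        hd3 = harmonic_location(fsig, 3, fs)
        print(f"f_sig = {fsig} MHz: HD2 at {hd2} MHz, HD3 at {hd3} MHz")
```

For f sig = 1743 MHz this places HD2 at 14 MHz and the folded HD3 at 1729 MHz, which matches the observation that HD2 sits at very low frequencies while HD3 lies close to the fundamental. For f sig = 610 MHz, HD2 is unfolded at 1220 MHz and HD3 folds to 1670 MHz.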
In addition, the IM3 metric is very important for communication applications, since the IM3 spurs cannot be filtered out. Fig. 16 shows the measured IM3 plot for different F s and f center. The IM3 is measured with two tone signals around f center that are spaced within 10 MHz of each other (depending on whether the periods of both signals can fit in the 832 words of memory). Fig. 17 shows an exemplary 40-MHz wide zoom-in spectrum around f center = 1.28 GHz for a two-tone input signal, while the sampling rate is F s = 3 GS/s. Table I compares this paper with selected state-of-the-art low-resolution high-speed DACs. A classical CMOS logic-based design is found in [2]. Exceptionally low power consumption of about 8.3 mW is reported in [3]. The work of Le Tual et al. [6] has been selected for its 9-b resolution and integrated digital sine generator. The only example of self-calibration at these sample rates can be found in [7]. Finally, the work of Greshishchev et al. [12] has been selected for its high F s = 56 GS/s and as a representative of the ultrahigh-speed high-power class of DACs [11]-[16].
C. Comparison With State-of-the-Art
At these performance levels, the presented work uses the most advanced CMOS node and it is indeed the smallest design and the only one to operate with a voltage supply as low as 1 V. Furthermore, it demonstrates a 50-dB SFDR bandwidth stretching to 180 MHz. The closest to this performance level is the 9-b DAC of [6], which shows an almost flat SFDR performance of 48 dB up to 1500 MHz but benefits from three more bits. These extra bits are generally expected to contribute to higher power consumption of the preceding digital signal processing and the DAC. The 56-GS/s design of [12] shows a 9 GHz 40-dB SFDR bandwidth, which is about 11× larger than the one in this paper, but this advantage is achieved at the price of 14× more power consumption, using both thin- and thick-oxide transistors, and interfacing to a 2.5 V voltage supply. Finally, this paper reports the largest on-chip test-assisting memory of 5 kb, allowing a frequency step of the test signals of about 4.2 MHz. If a shorter frequency step is required, the memory can easily be extended, since its architecture is stretchable and, as argued in this paper, does not induce signal-dependent switching disturbances into the DAC.
V. CONCLUSION
High-speed linear current-steering DAC performance with reduced voltage headroom is feasible in modern CMOS processes using a CML-based unit-element approach and embedded test assistance. This paper presents a 3.5 GS/s 6-bit DAC with an embedded, signal-independently switching DfT scheme in a 28-nm CMOS process. The DAC and the DfT scheme use a single 1-V power supply and only thin-oxide transistors. The digital circuits predominantly use CML gates to provide a quiet environment for the DAC operation, which helps to preserve its dynamic linearity. To match the responses of the D/A cells and hence maintain dynamic linearity, a segmentation of 50% is applied. To assist testing, a 5-kb CMOS memory, in the form of a shift-ring-register loop, is implemented on-chip. The errors due to its switching disturbances are minimized by implementing an 8× TI scheme and decoupling the switching activity from the data. The measured DAC performance shows an SFDR >40-dB bandwidth of 800 MHz, while the IM3 <−40-dB bandwidth exceeds 1300 MHz. The DAC power consumption is 53 mW, making it a good candidate for embedding in VLSI SoC systems, such as an FPGA.
