Abstract-A new Vernier time-to-digital converter (TDC) architecture using a delay line and a chain of delay latches is proposed. The delay latches replace the functionality of one delay chain and the sample register commonly found in Vernier converters, thereby enabling power and hardware efficiency improvements. The delay latches can be implemented using either standard or full custom cells, allowing the architecture to be implemented in field-programmable gate arrays, digital synthesized application-specific integrated circuits, or in full custom design flows. To demonstrate the proposed concept, a 7-bit Vernier TDC has been implemented in a standard 65-nm CMOS process with an active core size of 33 μm × 120 μm. The time resolution is 5.7 ps with a power consumption of 1.75 mW measured at a conversion rate of 100 MS/s.
I. INTRODUCTION

IN RECENT years, time-domain signal processing has become a promising alternative to signal processing implemented in the voltage or current domains. The reason is that the intrinsic gain of the CMOS transistors, which is transconductance over channel conductance $g_m/g_{ds}$, is reduced for each new CMOS process node [1]. At the same time, the cutoff frequency, i.e., $f_t$, for the same CMOS transistors increases. As a result, the resolution in the voltage domain decreases, whereas the resolution in the time domain increases.
Time-to-digital converters (TDCs) are, for example, used in analog-to-digital converters (ADCs) [2], [3] and in digital phase-locked loops (PLLs) as a replacement for the phase comparator [4]. By replacing the phase comparator with a TDC, the charge pump and the analog loop filters can be replaced with digital filters and a digital control loop. In the PLL case, the inputs to the TDC are two clock signals, which is the notation we will use in this brief.
In this brief, we propose a new Vernier TDC architecture enabling both power and area improvements. These savings are made possible by replacing the second delay chain and the sampling register commonly found in Vernier converters with a chain of delay latches. The proposed architecture can be implemented using either digital standard cells or full custom cells. Although the proposed architecture can be implemented as a single delay-line TDC, we will use the delay lines in a Vernier configuration in this brief. The Vernier configuration is used since a single delay-line TDC is limited by the gate delay, whereas the Vernier TDC can achieve sub-gate-delay resolution [5], [6]. All transistors in the design are used as digital switches; hence, the proposed architecture is well suited for implementation in CMOS processes with reduced feature size.
A hardware-efficient reset and edge detect circuit is proposed. The circuit generates a reset before each conversion cycle and also filters out the correct edges of the input signals. The circuit can also handle input signals with different signal frequencies.
The brief is organized as follows. Section II describes the delay latch chain TDC architecture and the reset and edge detect circuit. Section III presents measurement results and comparisons with recently published TDCs. The brief is concluded in Section IV.
II. PROPOSED TDC ARCHITECTURE
A TDC converts the time difference between two input signals to a digital output word. In a single delay-line TDC, the time resolution is limited by the gate delay in the delay chain, whereas a Vernier TDC can achieve sub-gate-delay resolution [5], [6]. A Vernier TDC compares the delays of two delay lines by sampling the state of one delay line with a signal that has propagated through a second delay line with a shorter unit delay, as illustrated in Fig. 1. Assuming that the unit delays of the start and stop delay lines are $\tau_1$ and $\tau_2$, respectively, the resolution of a Vernier TDC is given by the delay difference $\tau_1 - \tau_2$.
The proposed Vernier architecture is described in Section II-A, and the reset and edge filtering circuit is described in Section II-B.
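As a worked illustration of the Vernier principle, the following derivation estimates at which stage the stop edge catches up with the start edge. It assumes, purely for illustration, that the start edge enters its line at $t = 0$, that the stop edge enters at $t = \Delta T$, and that both edges have passed $k$ unit delays when they reach stage $k$; the symbols $t_{\mathrm{start}}$, $t_{\mathrm{stop}}$, and $N$ are introduced here and are not taken from the figures.

```latex
% Arrival times at stage k (illustrative timing convention)
t_{\mathrm{start}}(k) = k\,\tau_1 , \qquad
t_{\mathrm{stop}}(k)  = \Delta T + k\,\tau_2

% The stop edge has caught up when t_stop(k) <= t_start(k)
k\,(\tau_1 - \tau_2) \ge \Delta T
\quad\Longrightarrow\quad
N = \left\lceil \frac{\Delta T}{\tau_1 - \tau_2} \right\rceil

% Example with a 5.7-ps delay difference and a 100-ps input interval
N = \left\lceil \frac{100\ \mathrm{ps}}{5.7\ \mathrm{ps}} \right\rceil
  = \lceil 17.5 \rceil = 18
```

Whether the boundary case rounds up or down depends on how a tie between the two edges is resolved in the sampling element, which this sketch does not model.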
A. Delay Latch Chain
The proposed TDC architecture, consisting of a chain of delay latches with unit delays $\tau_1$ and a delay line with unit delays $\tau_2$, is illustrated in Fig. 2. The delay latches are transparent if the control input is low, and they hold their output values if the control input is high. The delay latches are modeled using buffers and multiplexers with zero delay connected in feedback.
1549-7747 © 2014 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
A complete conversion cycle for the proposed architecture in Fig. 2 consists of the following steps, where it is assumed that $\tau_1 > \tau_2$.
1) The TDC is prepared for conversion in the reset phase, where the start and stop inputs are low. All delay latches are now transparent.
2) At the next rising edge of the start input, a pulse propagates through the delay latch chain, gradually increasing the thermometer code at the $t_x$ outputs.
3) At the next rising edge of the stop input, a second pulse propagates through the delay line, successively setting the delay latches in hold state.
4) When the stop pulse catches up with the start pulse, the $N$th delay latch is nontransparent, thereby stopping the propagation of the start pulse.
5) The thermometer code, i.e., $t_x$, at the output of the delay latches is now linearly dependent on the time difference, i.e., $\Delta T$, between the two inputs.
The delay latches in the proposed architecture can be implemented in a variety of ways using either standard cells or a full custom solution. A hardware-efficient circuit is illustrated in Fig. 3, where the delay latch chain is implemented using dynamic inverters with alternating nMOS and pMOS enable transistors and works as follows.
When the gate voltage is set high on an nMOS enable transistor, the delay latch works as an inverting delay element and when the gate voltage is low, the output of the delay latch becomes a floating node, hence holding the current voltage value. The pMOS enable transistors work in the same way as the nMOS transistors but with complementary gate voltages.
Since the delay latch outputs, i.e., $t_x$, are floating when the enable transistors are turned off, that is, no path exists to supply or ground, pull-up/down circuitry is added, as illustrated in Fig. 3. The pull-up/down circuitry has two additional purposes: it acts as an extra load to ensure that $\tau_1 > \tau_2$, and it acts as a buffer driving the inputs of the thermometer-to-binary encoder.
Matching transistors are added to the delay-line inverters to match process, voltage, and temperature variations. The matching transistors are always enabled by connecting the nMOS and pMOS transistor gates to supply and ground potentials, respectively. Note that all delay latches and delay elements are inverting in the detailed implementation; hence, every second thermometer code bit is also inverted. This can however easily be corrected for in the succeeding thermometer-to-binary encoder.
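The conversion cycle of the delay latch chain can be sketched with a small behavioral model. This is a minimal sketch under stated assumptions, not the implemented circuit: the function name `delay_latch_chain` and its parameters are hypothetical, latch k is assumed to enter hold at time delta_t + k * tau2, all outputs are treated as non-inverted (the alternating inversion noted above is ignored), and the delay values in the usage example are illustrative integers in picoseconds.

```python
def delay_latch_chain(delta_t, tau1, tau2, n_stages):
    """Behavioral model of one conversion; returns thermometer bits t_1..t_N.

    Assumptions (not from the paper's figures): the start pulse reaches the
    output of latch k at k * tau1, and the stop pulse sets latch k in hold
    at delta_t + k * tau2. Requires tau1 > tau2.
    """
    if tau1 <= tau2:
        raise ValueError("the model assumes tau1 > tau2")
    outputs = [0] * n_stages  # reset phase: outputs low, latches transparent
    for k in range(1, n_stages + 1):
        start_arrival = k * tau1        # start pulse at the output of latch k
        hold_time = delta_t + k * tau2  # latch k becomes nontransparent
        if start_arrival >= hold_time:
            break  # stop pulse has caught up; the start pulse propagates no further
        outputs[k - 1] = 1
    return outputs


# Illustrative values: tau1 = 30 ps, tau2 = 25 ps, so the resolution is 5 ps.
bits = delay_latch_chain(delta_t=23, tau1=30, tau2=25, n_stages=127)
print(sum(bits))  # number of latches the start pulse passed
```

In this model the number of set bits grows by one for every tau1 - tau2 of additional input time difference, which is the linear dependence stated in step 5.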
Each delay stage requires nine transistors, including the pull-up/down circuitry. This can be compared with the standard Vernier TDC architecture in Fig. 1 that requires 28 transistors per delay stage in an implementation assuming that one D flip-flop uses 24 transistors. Hence, the proposed solution reduces the transistor count by 68%.
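The quoted reduction can be checked directly from the two per-stage transistor counts given above:

```latex
\frac{28 - 9}{28} = \frac{19}{28} \approx 0.679 \approx 68\%
```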
Monte Carlo simulations have been performed on an extracted layout made in a 65-nm CMOS process to predict how the delay difference $\tau_1 - \tau_2$ is affected by the process variations. The supply voltage is 1.2 V, and the temperature is 70 °C. From the histogram in Fig. 4, it can be concluded that the TDC has an expected time resolution of 5.4 ps with a variance of 1.0 ps due to process variations and transistor mismatch. If a smaller variance is required, calibration can be applied as, for example, suggested in [7].
Fig. 5(a)-(c) shows simulated integral nonlinearity (INL) results for three process corners, that is, the typical, the fast nMOS/slow pMOS, and the slow nMOS/fast pMOS corners. The differential nonlinearity (DNL) for the same corners is shown in Fig. 6(a)-(c). The simulations show that the linearity of the TDC is stable over process corners, but there is a spread in time resolution, as shown in Fig. 4. In Fig. 5(a)-(c), we find a large drop in INL for lower end codes. This is caused by an insufficiently sized inverter [INV1 in Fig. 7(b)], as is further discussed in Section III-A. A simulation with a correctly sized inverter is shown in Fig. 5(d).
B. Reset and Edge Detection Circuit
The TDC requires a reset before each conversion cycle and should also measure the time difference between the rising edge of clkA and the next rising edge of clkB, as shown in the timing diagram in Fig. 7(c) .
A high-level description of a circuit generating a reset before each conversion and also performing the edge detection is illustrated in Fig. 7(a) , where clkA and clkB are the inputs to the circuit, and the start and stop signals are the inputs to the succeeding Vernier TDC.
An efficient implementation of the circuit in Fig. 7(a) is shown in Fig. 7(b). The circuit uses less hardware than the D flip-flop implementation in Fig. 7(a), which makes it easier to maintain a constant delay between the clkA and clkB inputs.
The circuit in Fig. 7(b) works as follows. When clkA is low, both delay lines are reset by discharging the start and stop nodes. The en_start node is now charged, allowing a pulse to ripple through the delay latch chain at the next rising edge of the start node. At the rising edge of clkA, the start node is charged high, and the delay latch chain starts to ripple. At the same time, the nstop node is discharged through transistors M2 and M3, thus charging the stop input. However, since the en_stop node is still low, the stop delay line will not start to ripple until the next rising edge of clkB.
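The edge selection performed by this circuit can be illustrated at a purely functional level. The sketch below is not a model of the transistor-level circuit in Fig. 7(b); it only pairs each rising edge of clkA with the next rising edge of clkB, which is the interval the TDC is meant to measure. The function name `start_stop_intervals` and the edge times are hypothetical, and a clkB edge coinciding exactly with a clkA edge is counted as the "next" edge here by assumption.

```python
from bisect import bisect_left


def start_stop_intervals(clk_a_rising, clk_b_rising):
    """Pair each clkA rising edge with the next clkB rising edge.

    Both arguments are sorted lists of rising-edge times (e.g., in ps).
    Returns the time differences that would be passed on to the TDC core.
    """
    intervals = []
    for t_a in clk_a_rising:
        i = bisect_left(clk_b_rising, t_a)  # first clkB edge at or after t_a
        if i == len(clk_b_rising):
            break  # no further clkB edge available
        intervals.append(clk_b_rising[i] - t_a)
    return intervals


# clkB runs at twice the frequency of clkA in this illustrative example;
# the extra clkB edges between conversions are ignored.
clk_a = [0, 10000, 20000]
clk_b = [120, 5120, 10120, 15120, 20120]
print(start_stop_intervals(clk_a, clk_b))
```

The example uses inputs with different frequencies to mirror the statement that the circuit can handle such signals.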
C. Thermometer-to-Binary Encoder
The thermometer-to-binary encoder is a crucial building block since it accounts for approximately 60% of the total dynamic power consumption of this power-efficient TDC. A power split for the implemented TDC is given in Section III-B. To minimize the power consumption, an encoder based on multiplexers was chosen. Previous investigations show that such an encoder [8], [9] requires less hardware, has a shorter critical path, and also has lower power consumption, as compared with commonly used ones-counter solutions.
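The general idea behind a multiplexer-based encoder can be sketched in software. This is a minimal sketch of the commonly described scheme, not a reproduction of the exact structure in [8], [9]: the middle thermometer bit gives the most significant output bit, which then selects (as a bank of 2:1 multiplexers would) the upper or lower half of the code for the next level. The function name `mux_encode` is hypothetical, and the input is assumed to be a non-inverted thermometer code of length 2^n - 1.

```python
def mux_encode(thermo):
    """Encode a thermometer code (list of 0/1, thermo[0] = t_1) to an integer.

    Assumes len(thermo) == 2**n - 1 and that any alternating inversion of the
    raw latch outputs has already been corrected.
    """
    n_levels = len(thermo)
    if n_levels == 0:
        return 0
    if (n_levels + 1) & n_levels != 0:
        raise ValueError("length must be 2**n - 1")
    mid = n_levels // 2              # index of the middle bit
    msb = thermo[mid]                # most significant bit at this level
    # 2:1 multiplexers: forward the upper half if msb is set, else the lower half
    selected = thermo[mid + 1:] if msb else thermo[:mid]
    return msb * (mid + 1) + mux_encode(selected)


# A 7-bit converter has 127 thermometer bits; code 18 as an example.
print(mux_encode([1] * 18 + [0] * (127 - 18)))
```

Each recursion level corresponds to one multiplexer stage, so a 127-bit code is resolved in seven levels with a halving number of multiplexers per level.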
III. MEASUREMENTS
The TDC was implemented in a standard 65-nm CMOS process with a core size of 33 μm × 120 μm. A chip photo is shown in Fig. 8 , and the measurement results from four chip samples are presented in the following.
A. Time Resolution and Single-Shot Precision
The time resolution was measured using a Rohde & Schwarz SMBV100A vector signal generator, where the I and Q outputs from the RF baseband generator were used as inputs to the TDC. The time difference between the input signals was swept in 1-ps steps, and 10 k samples were collected for each of the settings. The average of these samples was derived, and the resulting INL and DNL curves are shown in Fig. 9. The DNL and INL curves are normalized to the average time resolution of the TDC, which was measured to be 5.7 ps.
From the DNL curve in Fig. 9(a) , it is shown that the TDC is monotonic. Monotonicity is important if the converter is used in closed-loop applications such as, for example, digital PLLs [5] .
In Fig. 9(b) and (c), the INL was derived using the best fit line and the end-to-endpoint definitions, respectively. In Fig. 9(b), we find the INL to be −5 LSBs for lower end codes. This comparatively high nonlinearity is caused by an insufficiently sized inverter, i.e., INV1, in Fig. 7(b). The relatively long rise time of the inverter unfortunately sets the latch (i.e., the path through transistors M1, M2, and M3) in a metastable state for low input codes, which is when the rising edges of clkA and clkB are close in time. The metastability increases the delay through the latch, resulting in the nonlinear INL curve. Careful simulations verify this hypothesis, and an INL simulation with a correctly sized inverter is shown in Fig. 5(d). For codes higher than 6, INL is within ±2.5 LSBs. If the end-to-endpoint INL definition is used, the worst case INL increases to 9 LSBs, as shown in Fig. 9(c).
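The difference between the two INL definitions can be made concrete with a short calculation. The sketch below is a generic textbook-style computation, not the authors' measurement script: it assumes the input is a list of code transition times (the input time difference at which the output code increments), and the function name `dnl_inl` and the numbers in the usage example are hypothetical.

```python
def dnl_inl(transitions):
    """Return (dnl, inl_endpoint, inl_best_fit), all in LSBs.

    transitions[k] is the input time at which the output reaches code k + 1.
    The LSB is taken as the average code width between the end points.
    """
    m = len(transitions)
    if m < 3:
        raise ValueError("need at least three transition times")
    lsb = (transitions[-1] - transitions[0]) / (m - 1)

    # DNL: deviation of each code width from the average width
    dnl = [(transitions[k + 1] - transitions[k]) / lsb - 1.0 for k in range(m - 1)]

    # Endpoint INL: deviation from the line through the first and last points
    inl_endpoint = [(transitions[k] - transitions[0]) / lsb - k for k in range(m)]

    # Best-fit INL: deviation from the least-squares line through all points
    k_mean = (m - 1) / 2
    t_mean = sum(transitions) / m
    slope = (sum((k - k_mean) * (t - t_mean) for k, t in enumerate(transitions))
             / sum((k - k_mean) ** 2 for k in range(m)))
    offset = t_mean - slope * k_mean
    inl_best_fit = [(t - (slope * k + offset)) / slope
                    for k, t in enumerate(transitions)]
    return dnl, inl_endpoint, inl_best_fit


# Hypothetical transition times in ps with one wide and one narrow code
dnl, inl_ep, inl_bf = dnl_inl([0, 6, 14, 18, 24])
print(max(abs(x) for x in inl_ep), max(abs(x) for x in inl_bf))
```

By construction the endpoint INL is zero at both ends of the range, whereas the best-fit line spreads the error over all codes, which is why the two definitions can report different worst-case values for the same data. In this formulation, a DNL above −1 LSB for every code corresponds to strictly increasing transition times.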
The single-shot precision measures the output of the converter for a constant input signal. This measurement captures noise and other nonideal behavior from on-chip, as well as off-chip, sources. The time difference of the input signals was swept in 1-ps steps, and 10 k samples were collected for each setting. The standard deviation σ was derived for each input code and is plotted in Fig. 10. The variation in standard deviation probably originates from a nonuniform layout of the TDC. Histograms for three selected input codes are shown in Fig. 11.
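The per-setting statistic described here amounts to a mean and a standard deviation over repeated conversions of the same input. A minimal sketch, assuming the raw data for one setting is a list of output codes, is given below; the function name `single_shot_precision`, the choice of the population standard deviation, and the sample values are illustrative assumptions and not measured data.

```python
from statistics import mean, pstdev


def single_shot_precision(codes, lsb_ps):
    """Return (mean code, sigma in LSBs, sigma in ps) for one input setting."""
    sigma_lsb = pstdev(codes)  # population standard deviation of the codes
    return mean(codes), sigma_lsb, sigma_lsb * lsb_ps


# Synthetic example: an input sitting on a code boundary toggles between codes
avg, sigma_lsb, sigma_ps = single_shot_precision([10, 10, 11, 11], lsb_ps=5.7)
print(avg, sigma_lsb, sigma_ps)
```

Repeating this for every step of the input sweep yields one σ value per setting, which is the kind of curve the text refers to.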
B. Power Consumption and Conversion Rate
The maximal conversion rate was measured to be 100 MS/s. The total power consumption at this conversion rate was 1.75 mW, of which 20% is consumed in the Vernier chain, 60% in the thermometer-to-binary encoder, 10% in the digital support block, and 10% in the output buffers driving the digital Fig. 12.
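For orientation, the percentages above translate into the following absolute figures. The energy per conversion is not stated in the text; it is derived here simply as total power divided by conversion rate.

```latex
% Power split at 100 MS/s, from the stated percentages of 1.75 mW
P_{\mathrm{Vernier}} = 0.20 \times 1.75\ \mathrm{mW} = 0.35\ \mathrm{mW}
P_{\mathrm{encoder}} = 0.60 \times 1.75\ \mathrm{mW} = 1.05\ \mathrm{mW}
P_{\mathrm{support}} = 0.10 \times 1.75\ \mathrm{mW} = 0.175\ \mathrm{mW}
P_{\mathrm{buffers}} = 0.10 \times 1.75\ \mathrm{mW} = 0.175\ \mathrm{mW}

% Derived energy per conversion
E_{\mathrm{conv}} = \frac{1.75\ \mathrm{mW}}{100\ \mathrm{MS/s}} = 17.5\ \mathrm{pJ}
```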
C. Comparison With Recently Published TDCs
In Table I, the implemented TDC is compared with recently published TDCs with a resolution in the range 4-6 ps. The TDCs in Table I are selected with respect to small area and low power consumption. Note that there are converters with subpicosecond resolution [10], [11]. The finer time resolution does, however, come with a significantly larger chip area and power consumption.
From Table I, it can be concluded that the proposed TDC offers competitive performance in terms of area and power consumption. The delay-line TDC has a shorter conversion range than a looped architecture [12]. Intended application areas for the proposed TDC are counter-assisted digital PLLs [5] and all-digital ADCs [2], [13].
The limited linearity that was measured will be addressed in future designs, mainly by resizing the inverter in the edge detect circuit, as described in Section III-A. Note that the prototype chip still demonstrates the high potential of the proposed architecture.
IV. CONCLUSION
A new Vernier TDC architecture using a delay line and a chain of delay latches has been presented. It has been demonstrated that a full custom implementation of the proposed architecture reduces the transistor count by 68%, as compared with a conventional solution, leading to both power and area savings. A 7-bit Vernier TDC has been implemented in a standard 65-nm CMOS process with an active core size of 33 μm × 120 μm. The time resolution was measured to be 5.7 ps with a power consumption of 1.75 mW at a conversion rate of 100 MS/s.
