A time to charge converter IC with an analog memory unit (TCCAMU) has been designed and fabricated in HP's CMOS 1.2-µm n-well process. The TCCAMU is an event driven system designed for front end data acquisition in high energy physics experiments. The chip includes a time to charge converter, analog Level 1 and Level 2 associative memories for input pipelining and data filtering, and an A/D converter. The intervals measured and digitized range from 8-24 ns. Testing of the fabricated chip resulted in an LSB width of 107 ps, a typical differential nonlinearity of < 35 ps, and a typical integral nonlinearity of < 200 ps. The average power dissipation is 8.28 mW per channel. By counting the reference clock, a time resolution of 107 ps over ~ 1 s range could be realized. This material is posted here with permission of the IEEE. Such permission of the IEEE does not in any way imply IEEE endorsement of any of the University of Pennsylvania's products or services. Internal or personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution must be obtained from the IEEE by writing to pubs-permissions@ieee.org. By choosing to view this document, you agree to all provisions of the copyright laws protecting it. 
TIME to digital converters (TDC's) have a variety of industrial and research applications. They are used in time of flight (TOF) measurement systems as well as test equipment for electronic circuit characterization. TOF systems include such examples as laser range finders, positron emission tomographs, and various high energy physics particle detectors. To satisfy the increasingly stringent requirements of data acquisition in colliding beam physics experiments, a VLSI chip called the TCCAMU (time to charge converter with an analog memory unit) has been designed.
In large scale particle detectors, the particle tracks resulting from the beam collisions must be reconstructed as precisely as possible. Straw tube drift chambers, for example, accomplish track reconstruction by measuring electron drift times. The electrons are created when the charged particles being detected ionize the gas in the straw tubes [1]. It is therefore necessary to measure the electron arrival times at each of the drift chamber's many thousands of sense wires. A typical front end readout channel associated with a sense wire is shown in Fig. 1.
The charge signal on a sense wire first must undergo amplification and shaping.
Although the digital vernier TDC can achieve 1 ps resolution using bipolar technology, it would not be practical in systems with an asynchronous single pulse input.
A third approach is based on an analog technique, such as that discussed by Stevens [5]. By integrating a current "I" on a capacitor during the time interval being measured, we have a time to voltage converter. The resulting voltage can then be digitized. Although this technique can be more complicated to implement than the stabilized delay line architecture, it does achieve sub-nanosecond resolution in addition to low power dissipation. This analog approach can be subdivided into two groups: those analog TDC's using a time-to-voltage converter (TVC) and those using a time-to-charge converter (TCC).
The TCC, which is part of a dual slope converter, is well suited for massively parallel data acquisition systems. It has the advantages over the TVC of insensitivity to capacitor and integration current variations, and a calibrationless gain. The TCCAMU chip discussed in Section II utilizes a CMOS TCC, giving it a very low power dissipation and a subnanosecond resolution.
II. TCCAMU ARCHITECTURE

A. Overview of TCCAMU
The TCCAMU measures time intervals, stores and filters the measured data in analog memory, and finally performs a digitization. The interval to be measured occurs between the leading edge of an asynchronous Discr pulse and an edge of the system clock. The system clock frequency for the TCCAMU is 62.5 MHz (16-ns period). By counting the clock cycles, it should be possible to accurately determine the arrival time of a Discr pulse over a wide dynamic range.
A time to charge conversion starts with a Discr pulse activating one of two width generators (Fig. 2). The even and odd generators take turns accepting Discr inputs, resulting in time measurements with improved double pulse resolution. The selected width generator generates an output pulse whose width equals the interval being measured. For the duration of this output pulse, the TCCAMU steers a current "I" onto an integration capacitor in its analog memory. The resulting charge is proportional to the measured time interval.
The analog associative memory which stores and filters the charge data consists of Level 1 and Level 2 buffers. The read/write locations in both buffers are chosen by the Level 1 Read/Write counters and the Level 2 Read/Write counters. The outputs of the counters are decoded by control logic to select the proper capacitors in the associative memory.
The signal from a time to charge conversion is stored initially in Level 1 for 1 µs. Because only a fraction of the events found by a particle detector will be useful, the system external to the chip will usually decide to discard the signal during that 1 µs. Signals that are kept are transferred to Level 2 for longer term storage. Additional filtering in Level 2, decided by the external system, occurs before the actual digitization. The Level 1 delay generator shown in Fig. 2 sets the 1 µs Level 1 latency. Note the absence of a Level 2 delay generator: the latency in the Level 2 buffer is not predetermined.
Any signal that passes through both levels of the analog memory will then be digitized by the on chip Wilkinson A/D converter. A current "I/150" discharges the appropriate storage capacitor while the number of clock cycles is counted, resulting in an output digital word.
An important issue here is that the analog variable being stored and digitized is a charge, not a voltage. The TCCAMU gain therefore depends on the ratio of the charge and discharge currents, "I" and "I/150" respectively. This ratio can be made relatively process insensitive. Mismatches between channels in the integrating current "I" have little effect on the TDC gain matching. Mismatches in the integrating capacitors within and between channels also have little effect. Gain calibrations for the TCCAMU in a multichannel system would therefore be unnecessary.
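The ratiometric gain can be illustrated with a small idealized model. The sketch below is not the chip's circuit: the charging current and capacitor values are hypothetical, and pedestal offsets are ignored. It only shows that the count depends on the 150:1 current ratio and the 16 ns clock, so that one LSB corresponds to 16 ns / 150, or about 107 ps.

```python
T_CLK_NS = 16.0   # system clock period from the text (62.5 MHz)
RATIO = 150       # ratio of charge current I to discharge current I/150


def wilkinson_count(dt_ns, i_charge_ua=100.0, cap_pf=1.0):
    """Clock cycles counted while an ideal capacitor is discharged.

    i_charge_ua and cap_pf are hypothetical values chosen for illustration;
    both cancel out of the result.
    """
    charge_fc = i_charge_ua * dt_ns              # Q = I * dt   (uA * ns = fC)
    voltage_mv = charge_fc / cap_pf              # V = Q / C    (fC / pF = mV)
    i_discharge_ua = i_charge_ua / RATIO         # discharge current I/150
    t_discharge_ns = voltage_mv * cap_pf / i_discharge_ua   # equals RATIO * dt
    return int(t_discharge_ns // T_CLK_NS)


# Input time interval represented by one output count, in picoseconds.
lsb_ps = T_CLK_NS / RATIO * 1000.0
```

Calling `wilkinson_count(10.0)` and `wilkinson_count(10.0, 37.0, 2.2)` gives the same count, which is the gain matching property described above.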
For TDC's using a time-to-voltage converter (TVC), on the other hand, it is the capacitor voltage that is digitized. The gain would depend on the absolute value of the integrating current and the integrating capacitors. For multichannel systems, it can be cumbersome dealing with gain mismatches and gain drifts.
B. Width Generator
The time interval Δt to be measured and digitized by the TCCAMU occurs between a Discr pulse's leading edge and a rising edge of the system clock. The width generator (Fig. 3) produces an output pulse of width Δt, leading to a charge IΔt being placed on the appropriate storage capacitor.
The capacitors in the analog memory are reset to a reference level by reset switches prior to the time to charge conversion. When the width generator detects a Discr input pulse, it STARTS the integration as the current "I" is steered to the selected capacitor. The width generator waits for a falling clock edge, and then uses the rising clock edge after that to STOP the integration (see Fig. 4(b)). This waiting feature gives us a pulse width range of 8 ns < Δt < 24 ns, thereby avoiding any nonlinearities due to zero integration times.
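The START/STOP rule can be checked with a short timing model. This is a sketch under stated assumptions, not the circuit of Fig. 4: rising clock edges are placed at multiples of 16 ns and a 50% duty cycle is assumed, so falling edges sit 8 ns later. The function name and the time origin are hypothetical.

```python
import math

T_CLK = 16.0         # ns, clock period from the text
T_FALL = T_CLK / 2   # ns, falling edge position; the 50% duty cycle is an assumption


def width_generator(t_discr):
    """Return (t_stop, delta_t) in ns for a Discr leading edge at t_discr.

    Rising clock edges are assumed at k * T_CLK and falling edges at
    k * T_CLK + T_FALL.
    """
    # Wait for the first falling clock edge at or after the Discr edge.
    k = math.ceil((t_discr - T_FALL) / T_CLK)
    t_fall = k * T_CLK + T_FALL
    # STOP on the rising clock edge that follows that falling edge.
    t_stop = t_fall + (T_CLK - T_FALL)
    return t_stop, t_stop - t_discr
```

In this model every arrival time maps to a width between 8 ns and 24 ns, and the arrival time is recovered as `t_stop - delta_t`, which is how counting clock cycles would extend the range. A Discr edge landing exactly on a falling clock edge is the boundary case where the real circuit faces the metastability question.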
A simplified schematic for a width generator and the relevant waveforms are shown in Fig. 4. Much of the circuitry in the generator is devoted to determining which rising clock edge will stop the integration. The Discr enters the one shot, which guarantees a minimum pulse width at the source of M1.
When the clock goes high, node A also goes high, causing the comparator output V_o to switch high. When the clock then goes low, node B goes high. The next rising clock edge then generates the STOP signal, which resets the one-shot.
The important issue of metastability arises here when the falling clock edge occurs at the same time as the rising Discr edge. Transistor M1 will start turning off just as the output of the one-shot begins to raise M1's drain voltage. This case would produce an analog voltage at node A instead of a well defined logic level. The width generator could then become metastable, resulting in an arbitrarily wide pulse output. When the CLK goes low, however, M2 closes the positive feedback loop. This forces the comparator to resolve the analog voltage to a valid logic level usually within 1/2 clock cycle. A positive feedback loop therefore greatly reduces the range of Discr arrival times that cause metastable behavior.
In order to reduce the comparator's area and eliminate its dc power dissipation, the "comparator" was made to consist merely of 2 inverters in series. The feedback switch M2 connects the output V_o of the second inverter to the input A of the first. The voltage V_M shown in Fig. 4(a) is actually the metastable voltage for these 2 inverters when A = V_o. When M2 closes, the comparator produces a logic "1" if V_o > V_M and a logic "0" if V_o < V_M.
C. Analog Input Pipeline with Data Filtering
In asynchronous systems such as particle detectors, the Discr input to a TDC arrives at random points in time. Given the average and the maximum input rates to a TDC, we can make one of several choices. Let CT denote the conversion time of the TDC and DT the time between successive Discr inputs, with minimum DT_MIN and average DT_AVE. Assuming we wanted to digitize all the incoming data, we could use a flash TDC where CT < DT_MIN. This condition would guarantee that data never arrives faster than the converter can handle. The disadvantages of the flash converter, however, are its complexity, a large power dissipation, and a large chip area. In massively parallel data acquisition systems, a very high speed TDC would not necessarily be the most efficient approach.
An alternative would be to use an analog memory which stores the incoming data before digitization. This is analogous to a pipelined sample and hold in a voltage sampling A/D converter. We would thus have the more relaxed requirement of CT < DT_AVE. On those occasions when CT > DT, the data would simply pile up in the analog buffer until serviced by the A/D converter. The TCCAMU uses this approach.
The speed requirements for an input pipelined TDC are easier to satisfy. This approach requires a much simpler A/D with less power dissipation and less chip area. The reduced complexity of the circuitry would allow a number of TDC channels to be placed on a single die substrate for use in a high density front end readout system.
Additional advantages in using analog input buffers have to do with the data filtering. Obviously, it would be much more power efficient to decimate the information in an analog pipeline before digitization. Using the rejection factor S, we also see that the converter now would need to satisfy the condition CT < S × DT_AVE, where S >> 1.
Note that there will be occasions when the pipeline overflows due to its finite size and the finite CT. One can determine the minimum number of storage locations needed in the pipeline using queuing theory or Monte Carlo simulations [6]. The minimum size of the analog buffer will be determined by a number of factors: DT_MIN, DT_AVE, CT, S, the storage time in the analog pipeline (or queue), the maximum acceptable probability that the pipeline will be full when a Discr input arrives, and of course the probability distribution for the Discr arrival times.
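A Monte Carlo estimate of this kind can be sketched in a few lines. The model below is deliberately simplified and is not the simulation of [6]: it treats only a single buffer with a fixed storage time, assumes exponentially distributed Discr spacings (Poisson arrivals), and ignores DT_MIN, the conversion time, and the rejection factor. The function name and the rates used in any call are hypothetical.

```python
import random


def overflow_probability(n_slots, dt_ave_ns, latency_ns, n_events=20000, seed=1):
    """Fraction of Discr inputs that find every storage location occupied.

    Assumes Poisson arrivals with mean spacing dt_ave_ns and a fixed
    storage time latency_ns per stored datum (both are modeling assumptions).
    """
    rng = random.Random(seed)
    t = 0.0
    release_times = []   # times at which occupied locations become free again
    lost = 0
    for _ in range(n_events):
        t += rng.expovariate(1.0 / dt_ave_ns)                 # next arrival time
        release_times = [r for r in release_times if r > t]   # free expired slots
        if len(release_times) >= n_slots:
            lost += 1                                         # pipeline full
        else:
            release_times.append(t + latency_ns)
    return lost / n_events
```

Sweeping `n_slots` for a given arrival rate and storage time shows how quickly the overflow probability falls as locations are added, which is the kind of curve one would use to choose the buffer size.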
D. Two Level Analog Content Addressable Memory
The analog memory in the TCCAMU consists of 12 shielded storage capacitors. The capacitors use the gates of PMOS transistors placed in an isolated n-well. These 12 storage devices are part of a 2-level analog content addressable memory (CAM). The CAM is used to implement the analog input pipeline.
In general, an analog CAM has write, read, and match functions. A write operation stores analog information in unused word locations. A match operation involves a parallel search through all the stored words for a match with the input word. The read operation returns the analog information associated with the matched word [7].
In the TCCAMU chip, the analog CAM is divided into 2 levels: Level 1 and Level 2. The Level 1 buffer accesses 8 storage capacitors and the Level 2 buffer accesses 4. A data register assigned to each of the analog storage capacitors holds a unique address and ID tag. A capacitor, therefore, is chosen for read/write operations according to the contents of its data register. The ID tag contained by register i identifies the level (1 or 2) that capacitor i is associated with, and the address contained by register i is the address within that level.
Because the contents of data register i can be changed, it is possible to "move" the data on capacitor i from one CAM level to another without actually disturbing the analog information on that capacitor. This is the reason for choosing an analog CAM over an analog RAM.
The two-level analog CAM outlined in Fig. 5 has simultaneous Read/Write capabilities for both levels. During the CAM's Level 1 write operation, the charge generated by a time measurement is placed on the capacitor whose Level 1 address matches the address chosen by the Level 1 Write counter (Fig. 2). Let this be referred to as address j. After a 1 µs latency, a Level 1 Read operation occurs whereby address j for that capacitor now matches the address given by the Level 1 Read counter. An L1 readout trigger will then determine whether to discard the L1 datum being read (L1 reject) or to accept the datum for further processing (L1 accept). An L1 accept results in the data register contents being changed from a Level 1 address j to a Level 2 address k. The Level 2 Write counter determines the address k that is loaded into the register. This operation of transferring analog data from address j in Level 1 to address k in Level 2 can simply be called a virtual Level 1 to Level 2 transfer. After an undetermined latency, the Level 2 Read counter will choose address k for a Level 2 read operation. A Level 2 readout trigger can then accept or reject that datum. If it is accepted, the analog data at address k will be digitized. The TCCAMU thus has 2 levels of filtering prior to digitizing a measured time interval.
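The bookkeeping behind a virtual transfer can be modeled with a toy data structure. This is only a behavioral sketch: the class and method names are hypothetical, the hardware's parallel match is replaced by a linear search, and the swap of register contents follows the description of the virtual transfer in the text.

```python
class TwoLevelCam:
    """Toy model of the 2-level analog CAM: 12 capacitors, one tag register each."""

    def __init__(self, n_level1=8, n_level2=4):
        # tags[i] = (level, address) held by the data register of capacitor i
        self.tags = [(1, a) for a in range(n_level1)] + [(2, a) for a in range(n_level2)]
        # Analog contents of each capacitor, in arbitrary units
        self.charge = [0.0] * (n_level1 + n_level2)

    def match(self, level, address):
        """Index of the capacitor whose register holds (level, address)."""
        return self.tags.index((level, address))

    def write_l1(self, j, q):
        """Level 1 write: place charge q on the capacitor tagged (1, j)."""
        self.charge[self.match(1, j)] = q

    def accept_l1(self, j, k):
        """L1 accept: swap the two registers; the stored charge is not touched."""
        a, b = self.match(1, j), self.match(2, k)
        self.tags[a], self.tags[b] = self.tags[b], self.tags[a]

    def read_l2(self, k):
        """Level 2 read: return the charge on the capacitor tagged (2, k)."""
        return self.charge[self.match(2, k)]
```

After `write_l1(3, q)` followed by `accept_l1(3, 0)`, the same physical capacitor answers a Level 2 read at address 0, while the capacitor that previously held Level 2 address 0 becomes available as Level 1 address 3.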
Note that there are several requirements for a Level 1 to Level 2 transfer. First of all, the transfer must be accomplished quickly so that the Level 1 address j can be made available for writing again. Secondly, we do not want the analog information being transferred to be corrupted by the transfer process itself. Thirdly, we should dissipate as little power as possible to minimize the total power of the multichannel system.
A virtual Level 1 to Level 2 transfer satisfies the previous requirements. Instead of transferring the charge itself from one capacitor to another (as in an analog RAM), we have the data registers for the L1 Read capacitor and the L2 Write capacitor swap their contents. As a result, the analog information on a capacitor is never disturbed during an L1 Read or L2 Write operation. Hence there is no need for any high
In order to find the analog input versus digital output curve for the TCCAMU, a known input time interval was supplied to the chip, and the outputs were recorded. A Gaussian jitter was intentionally added to the input Discr delay. For each particular input delay, 250 corresponding output samples were taken and averaged over this jitter to get a precision within a fraction of an LSB. With interpolation, the input delay corresponding to the center of an output LSB was found, resulting in the analog input versus digital output plot in Fig. 9. Note that the timing nonlinearities of the calibrated test system supplying the Discr signals had to be much smaller than the timing nonlinearities of the TDC being measured.
The output shown in Fig. 9 is a sawtooth waveform with a periodicity equal to that of the 16 ns system clock. The input range to the TCCAMU therefore is also 16 ns. By counting the clock cycles, however, it will be possible to greatly extend the dynamic range.
Note that at the sudden low to high output transition the width generators must decide whether to generate a maximum or a minimum width pulse. A metastable state at this transition exists where the width generators cannot decide which pulse width to create. This state of indecision will occur when a Discr input lands in a very narrow time window centered at that transition. Measurements of the TCCAMU output did not reveal any metastability near this region, however, meaning that the width of the metastable time window must be << 1 LSB. The narrow width of this window results from the use of a positive feedback loop in the width generators. According to SPICE simulations, the width of the metastable window should be < 10 ps.
The slope of the output curve in Fig. 9 is ~1 LSB/107 ps. Because of the independence from capacitor and integrating current mismatches, the channel to channel (i.e., chip-to-chip) variations in the slope are < 0.2%.
The integral nonlinearity (INL) can be defined as the difference between the measured output data and the best fit straight line through that data. The INL was calculated from the curve in Fig. 9. The typical INL for any given analog storage location is < 200 ps, with a typical rms INL of 90 ps (see Fig. 10). The overall pattern of the INL curve is very similar from capacitor to capacitor, and even chip-to-chip. It is believed that this pattern noise stems from on-chip coupling effects between sensitive signal paths.
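The INL definition above amounts to a least-squares line fit followed by a residual calculation. The following sketch assumes the data are given as output codes `x` and the corresponding input delays `y` at the code centers (in ps); that pairing and the function names are illustrative choices, not taken from the measurement setup.

```python
def integral_nonlinearity(x, y):
    """Residuals of y from its least-squares straight line in x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [b - (slope * a + intercept) for a, b in zip(x, y)]


def rms(values):
    """Root mean square of a list of residuals."""
    return (sum(v * v for v in values) / len(values)) ** 0.5
```

With delays in ps, `max(abs(r) for r in integral_nonlinearity(x, y))` and `rms(integral_nonlinearity(x, y))` correspond to the peak and rms INL figures quoted above.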
The random noise referred back to the input Discr consists of quantization noise and jitter. The quantization noise of the TCCAMU is calculated to be ~31 ps rms. The jitter of our time to digital converter, due to thermal noise sources, was measured to be ~25 ps rms. Combining both factors gives a total TCCAMU input referred random noise of 40 ps rms.
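These figures can be checked with a short calculation. The sketch assumes the standard uniform quantization model, in which the rms quantization noise is the LSB width divided by the square root of 12, and assumes that the two noise sources are independent so that they add in quadrature.

```python
import math

LSB_PS = 107.0      # measured LSB width from the text, in ps
JITTER_PS = 25.0    # measured thermal jitter from the text, in ps rms

# Uniform quantization model (assumption): rms noise = LSB / sqrt(12)
quantization_ps = LSB_PS / math.sqrt(12)

# Independent sources (assumption) combine as a root sum of squares
total_ps = math.hypot(quantization_ps, JITTER_PS)
```

Under these assumptions `quantization_ps` comes out near 31 ps and `total_ps` near 40 ps, in agreement with the values quoted above.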
The maximum pedestal offset between any 2 analog storage locations in a given chip is < 8 LSB's. The offsets, however, are not due to any mismatches in the capacitors themselves. Rather, the offsets result from a combination of mismatches in the input and output switches of the storage capacitors, and the large parasitic capacitance of the common bus connecting those switches. Efforts are currently underway to reduce these offsets to an expected value of ~1 LSB.
IV. CONCLUSION
A time to charge converter with an analog memory unit (TCCAMU) and an on-chip A/D converter has been successfully designed, fabricated, and characterized. This IC measures the delay between the leading edge of an asynchronous Discr signal and a following edge of a 62.5 MHz system clock. The analog information from the time to charge conversion is pipelined in a two level analog CAM (content addressable memory). The data in Levels 1 and 2 of the CAM are filtered by externally generated Level 1 and Level 2 readout triggers. After a Level 2 accept, the analog information is digitized.
The TCCAMU was fabricated in HP's 1.2-µm n-well, double metal process. This 2.0 mm × 2.2 mm circuit was verified to be capable of measuring and digitizing its entire 16 ns input range with a ~107 ps/LSB resolution. Despite its 4-µs conversion time, this chip achieves a double pulse resolution of 16-32 ns by using an analog input pipeline.
In summary, the TCCAMU's high performance and low power dissipation make it suitable for massively parallel data acquisition systems.
The width of an output LSB can be found with a code density test. By randomizing the TDC analog input, a histogram of the output LSBs can be produced, and from this one can statistically determine the width of each code. The probability distribution for the random inputs is often chosen to be a constant over the entire input range. In our case, however, a Gaussian distribution was used for the code density test. This was easily accomplished by adding a finite amount of jitter to the Discr arrival time in the test setup.
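For reference, the more common constant-distribution form of the code density test can be sketched as follows. This is not the Gaussian procedure used for the TCCAMU; it assumes a uniformly distributed input and normalizes each code's hit count by the mean count, so that widths are expressed relative to the average code width. The transition points in the demonstration are hypothetical.

```python
import bisect
import random
from collections import Counter


def code_density_dnl(codes, first, last):
    """DNL in LSB for codes first..last, from samples of a uniform input."""
    hist = Counter(c for c in codes if first <= c <= last)
    n_codes = last - first + 1
    mean_count = sum(hist.values()) / n_codes
    # A code that collects more hits than average is wider than 1 LSB.
    return {c: hist.get(c, 0) / mean_count - 1.0 for c in range(first, last + 1)}


# Demonstration with a hypothetical converter whose code 2 is 1.5 LSB wide
# and whose code 3 is 0.5 LSB wide (transition points in ideal LSBs).
edges = [1.0, 2.0, 3.5, 4.0]
rng = random.Random(0)
codes = [bisect.bisect_right(edges, rng.uniform(0.0, 5.0)) for _ in range(50000)]
dnl = code_density_dnl(codes, 1, 3)
```

The end codes 0 and 4 are excluded because their widths are set by the range of the random input rather than by the converter.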
The total jitter $\sigma_J$ occurring at the TCCAMU output can be written as the sum, in quadrature, of the jitters in the Discr, the system clock, and the TCCAMU itself:
$$\sigma_J^2 = \sigma_{Discr}^2 + \sigma_{Clock}^2 + \sigma_{TCCAMU}^2 \qquad (1)$$
The jitter that was added to the Discr was made to dominate the contributions from the Clock and the TCCAMU.
For each particular output LSB whose width was being measured, 1000 input samples were generated. From this, we were able to calculate $Pr_1$ and $Pr_2$. Note that $Pr_1$ in (3a) is the measured probability in the tail of the Gaussian occupied by the LSB's $Y-1, Y-2, \ldots$ and so forth. $Pr_2$ is the measured probability in the tail corresponding to the LSB's $Y+1, Y+2, \ldots$ and so forth. We can express these probabilities as:
$$Pr_1 = \frac{1}{\sqrt{2\pi}} \int_{\gamma_1}^{\infty} e^{-t^2/2}\,dt \qquad (3a)$$
$$Pr_2 = \frac{1}{\sqrt{2\pi}} \int_{\gamma_2}^{\infty} e^{-t^2/2}\,dt \qquad (3b)$$
where $\mu$ is the center of the Gaussian shown in Fig. 11, and $X_2 - X_1$ corresponds to the range of input Discr arrival times that produces the output code $Y$. The $\gamma_1$ and $\gamma_2$ are given by:
$$\gamma_1 = \frac{\mu - X_1}{\sigma_J} \qquad (4a)$$
$$\gamma_2 = \frac{X_2 - \mu}{\sigma_J} \qquad (4b)$$
Using a lookup table for the erfc function, we get the numbers $\gamma_1$ and $\gamma_2$ which correspond to the measured $Pr_1$ and $Pr_2$. Combining (4a) and (4b) we get the measured width of code $Y$:
$$Width_Y = X_2 - X_1 = \sigma_J(\gamma_1 + \gamma_2) \qquad (5)$$
Thus, by measuring $Pr_1$ and $Pr_2$ we can find $\gamma_1$ and $\gamma_2$, which in turn give us the width of a particular code. Note that these equations assume that the code $Y$ being evaluated has the Gaussian center $\mu$ located between $X_1$ and $X_2$. By extending this theory it should be possible to measure several codes at a time. Finally, the measured differential nonlinearity of code $Y$ is given by:
$$DNL_Y = \sigma_J(\gamma_1 + \gamma_2)_Y - 1\ \text{LSB} \qquad (6)$$
where $\sigma_J$ is in units of LSB's.
B. Width and Shape of Output Gaussian
In order to make use of (6), we must know the value of $\sigma_J$. The value of $\sigma_J$ can be written as:
$$\sigma_J = \frac{\#\ \text{TCCAMU Output Codes}}{\sum_Y (\gamma_1 + \gamma_2)_Y} \qquad (7)$$
where the summation is over all possible output codes. Note here that the % error in determining $\sigma_J$ is much smaller than the % error in the measurement of a single code width.
The other aspect when making DNL and INL measurements with a Gaussian input signal is to verify that the jitter due to $\sigma_J$ has a true Gaussian shape (or nearly so). If we refer to (5) and take the derivative $d/d\gamma_2$ of both sides, we find that:
$$\frac{d\gamma_1}{d\gamma_2} = -1 \qquad (8)$$
Equation (8) tells us that if the total output jitter $\sigma_J$ is Gaussian, and if we vary the input Discr delay over a small range to get a number of $(\gamma_1, \gamma_2)$ pairs for a given output code, then these points should all lie on a straight line with a slope of $-1$.
[Figure: plot of $\gamma_1$ versus $\gamma_2$ for a given output code. There were 1000 output samples taken for each data point shown.]
C. Measurement Uncertainty in DNL
In order to find the measurement uncertainty in the DNL, we must find the uncertainties in $\gamma$ and $Pr$. Let:
$N$ = number of input samples generating the Gaussian,
$Pr$ = measured probability in tail of Gaussian,
$\sigma_P$ = std dev of $Pr$ from the actual probability,
$\sigma_\gamma$ = std dev of $\gamma$ from its actual value.
For $N$ independent samples, $\sigma_P$ is given by:
$$\sigma_P = \sqrt{\frac{Pr(1 - Pr)}{N}} \qquad (9)$$
If we now take the derivative $d/d\gamma$ of both sides of (3) using the Leibnitz Rule, assuming that $\sigma_P$ is sufficiently small (i.e., $N$ large), then we can write:
$$\sigma_\gamma \approx \sqrt{2\pi}\,\sigma_P \exp\left(\frac{\gamma^2}{2}\right) \qquad (10)$$
Finally, from (5) and (6) we see that:
$$\sigma_{Width} = \sigma_{DNL} = \sigma_J\,\sigma_{\gamma_1 + \gamma_2} < 2\sigma_J\sigma_\gamma \qquad (11)$$
For $N = 1000$ and $0.5 > Pr > 0.05$, we use (3), (9), and (10) to find that $\sigma_\gamma < 0.066$. In our measurements, we added a jitter to the Discr such that $\sigma_J \approx 0.6$ LSB, to which (11) was then applied. Increasing the value of $N$ will decrease the measurement uncertainties in both the DNL and the code widths.
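The Gaussian code density procedure of this appendix can be sketched numerically. The code below assumes that $\gamma$ is a standard normal deviate, so that each tail probability is the upper-tail area beyond $\gamma$, and that the probability estimate follows binomial statistics; the inverse normal CDF from the standard library stands in for the erfc lookup table. Function names are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()   # zero mean, unit standard deviation


def code_width_from_tails(pr1, pr2, sigma_j):
    """Width of code Y, in the units of sigma_j, from the two tail probabilities."""
    gamma1 = STD_NORMAL.inv_cdf(1.0 - pr1)   # tail below the code -> gamma1
    gamma2 = STD_NORMAL.inv_cdf(1.0 - pr2)   # tail above the code -> gamma2
    return sigma_j * (gamma1 + gamma2)


def sigma_gamma(pr, n):
    """Approximate std dev of gamma for a tail probability pr measured with n samples."""
    sigma_p = (pr * (1.0 - pr) / n) ** 0.5   # binomial spread of the estimate (assumption)
    gamma = STD_NORMAL.inv_cdf(1.0 - pr)
    # First-order propagation: divide by the Gaussian density at gamma
    return sigma_p / STD_NORMAL.pdf(gamma)


def dnl_of_code(pr1, pr2, sigma_j_lsb):
    """DNL of code Y in LSB, with sigma_j given in LSB."""
    return code_width_from_tails(pr1, pr2, sigma_j_lsb) - 1.0
```

With these conventions, `sigma_gamma(0.05, 1000)` evaluates to roughly 0.067, close to the 0.066 bound quoted for N = 1000, and the uncertainty is smaller for tail probabilities nearer 0.5.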
