SAMPIC0 is a multichannel Time and Waveform to Digital Converter (TWDC) chip. Each of its 16 channels combines a DLL-based TDC, which provides a raw time, with an ultra-fast analogue memory that allows fine timing extraction as well as the measurement of other pulse parameters. Each channel also integrates a discriminator that can trigger it independently or participate in a more complex trigger. After triggering, the analogue data are digitized by an on-chip ADC and only those corresponding to a region of interest are sent serially to the acquisition. The paper describes the architecture of SAMPIC0 and reports its main measured performance. Measurements on this chip have shown a timing resolution in the range of 15 ps RMS without correction, improving to better than 5 ps RMS after a simple calibration.
Introduction
Time stamping with picosecond accuracy is an emerging technique opening new fields for particle physics instrumentation. For example, it permits the localization of vertices with a precision of a few mm, can help associate particles coming from a common primary interaction even in a high background, and can be used for particle identification using Time-of-Flight techniques. Most of the high-precision systems used for particle detection are based on high-performance discriminators followed by TDCs (which can now be implemented in modern high-end FPGAs), with the drawbacks of a high power consumption and a contribution of these two blocks to the jitter. A second approach consists in digitizing the analogue signal using fast ADCs and then extracting the timing by digital processing. This solution suffers from the high power consumption of fast ADCs and from the large amount of data they produce, which makes high-end, costly and power-hungry FPGAs mandatory to acquire them. An alternative solution exists: it has been demonstrated that ps timing accuracy can be reached by sampling the detector signal in ultrafast analogue memories based on Switched Capacitor Arrays (SCA) [1, 2] for reasonable power, space and money budgets. Moreover, the knowledge of the signal waveform permits extracting other useful parameters such as charge, pulse width or risetime, and optimizing the timing extraction algorithm during or even after data taking. In contrast with the existing fast sampler chips, which are usually designed for general-purpose applications and require external electronics to be used for accurate timing, the SAMPIC0 chip presented here has been designed specifically for this type of application to demonstrate the concept of TWDC (Time and Waveform to Digital Converter). Figure 1 shows the typical block diagram of a TWDC channel.
As in standard DLL-based TDC designs, it associates a counter with a Delay Line Loop (DLL), whose input is the clock used for the counter and whose total delay is servo-controlled to the clock period (T_CK). In this arrangement, the N-step DLL provides a multiphase clock delayed by steps of T_CK/N. In a basic TDC, when a discriminator is triggered by an event, its output freezes both the counter output, providing a coarse time, and the state of the DLL, giving a fine time. The full time is built from the combination of these two data. In the TWDC architecture, the outputs of the DLL are also used to sample the input analogue signal at a rate of N/T_CK and store it in a Switched Capacitor Array (SCA) analogue memory. This sampling is stopped on a trigger. Various parameters can be extracted from this sampled waveform, like amplitude, charge and a very fine timing, which can be obtained by several possible methods (including interpolation of the threshold crossing shown in Fig. 1), as described in [2]. This last piece of information is combined with the two times provided by the counter and the DLL to obtain a global timing with improved precision. This architecture has many advantages: the timing precision is set only by that of the sampling; the discriminator is only used for triggering, so that its jitter does not contribute to the timing resolution; and finally, the timing precision can be improved by algorithms using several samples of the signal.
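The combination of coarse, medium and fine times described above can be illustrated by a minimal Python sketch. It is not the authors' implementation: the function names are hypothetical, and it assumes that the samples are already ordered in time and that sample 0 was taken at DLL tap `first_tap` of the clock period identified by the coarse counter. The fine time is obtained here by linear interpolation of a threshold crossing, one of the possible methods mentioned in the text.

```python
def interpolate_crossing(samples, threshold):
    """Fractional sample index of the first rising crossing of `threshold`.

    Uses linear interpolation between the two samples that bracket it.
    """
    for i in range(1, len(samples)):
        lo, hi = samples[i - 1], samples[i]
        if lo < threshold <= hi:
            return (i - 1) + (threshold - lo) / (hi - lo)
    raise ValueError("no rising crossing found")


def full_time_ps(coarse_count, first_tap, crossing_index, t_ck_ps, n_steps):
    """Global time in ps from the three pieces of timing information.

    Assumption (not from the paper): sample 0 sits at DLL tap `first_tap`
    of the clock period numbered `coarse_count`.
    """
    step_ps = t_ck_ps / n_steps            # sampling period T_CK / N
    coarse_ps = coarse_count * t_ck_ps     # counter: whole clock periods
    fine_ps = (first_tap + crossing_index) * step_ps
    return coarse_ps + fine_ps


# Example: 160 MHz clock (T_CK = 6250 ps), 64-step DLL.
waveform = [0.0, 0.0, 0.25, 0.75, 1.0]
crossing = interpolate_crossing(waveform, threshold=0.5)
t_ps = full_time_ps(coarse_count=10, first_tap=3,
                    crossing_index=crossing, t_ck_ps=6250.0, n_steps=64)
print(crossing, t_ps)
```

Because the interpolated index is fractional, the resulting time has a granularity finer than the sampling step T_CK/N.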
SAMPIC0 architecture

The TWDC concept
Description of the SAMPIC0 chip
The 16-channel SAMPIC0 chip has been designed to demonstrate the suitability of the TWDC concept for timing, but also to be used for measurements on small detector setups. Its inputs directly receive the analogue signals coming from the detector, via an AC coupling located on the board.
As depicted in Fig. 2 , SAMPIC0's main building blocks are:
-one common 12-bit Gray counter, clocked between 16 and 160 MHz, used for coarse time stamping,
-one common 64-step DLL, servo-controlled to the clock period of the aforementioned Gray counter, thus with steps from 100 ps to 1 ns, used for medium precision timing and providing the commands required for analogue sampling,
-one common 11-bit Gray counter, driven by the on-chip clock generator running up to 1. 
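The figures quoted for the timestamp counter and the DLL follow from simple arithmetic, sketched below in Python. The helper names are hypothetical, and the Gray decoding shown is the standard conversion, not necessarily the one implemented in the acquisition firmware.

```python
def binary_to_gray(n: int) -> int:
    """Standard reflected Gray code: adjacent values differ by one bit."""
    return n ^ (n >> 1)


def gray_to_binary(gray: int) -> int:
    """Invert the Gray code by XOR-ing all right shifts of the word."""
    binary = 0
    while gray:
        binary ^= gray
        gray >>= 1
    return binary


def dll_step_ps(f_clk_hz: float, n_steps: int = 64) -> float:
    """DLL step (sampling period) in ps for a given clock frequency."""
    return 1e12 / (f_clk_hz * n_steps)


def coarse_wrap_s(f_clk_hz: float, n_bits: int = 12) -> float:
    """Time after which an n-bit timestamp counter wraps around."""
    return (1 << n_bits) / f_clk_hz


for f_clk in (16e6, 160e6):
    print(f"{f_clk / 1e6:5.0f} MHz: step = {dll_step_ps(f_clk):8.2f} ps, "
          f"counter wraps every {coarse_wrap_s(f_clk) * 1e6:6.1f} us")
```

With the 16 to 160 MHz clock range, the step goes from about 977 ps down to about 98 ps, which matches the rounded values of 1 ns and 100 ps.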
Triggering options
As shown in Fig. 3, several trigger modes are programmable individually for each channel: local, external, or central trigger (which is only an OR in the first version of the chip). For each of these modes, the triggering edge can be selected and each channel can be disabled. The POSTRIG can be set to 0, 1 or 2 ns. An optional common-deadtime mode using a Fast Global Enable input is also available.
Sequence of operation
While waiting for a trigger, the signal of each channel is continuously sampled in its SCA, used as a circular buffer. When a trigger, generated as described previously, occurs in a channel, the output of the timestamp counter is latched in its coarse time register, the state of the DLL is captured in its DLL register, sampling in its SCA is stopped after a delay defined by the POSTRIG, and a signal is sent to the FPGA, which can then initiate the A/D conversion. The content of the DLL register is also encoded to get the medium precision time and the position of the trigger in the DLL. Only triggered channels are then in deadtime; the others can still capture new events.
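Because the SCA is used as a circular buffer, the cells must be reordered in time before any waveform analysis. A minimal Python sketch of this step is given below. It assumes, as a convention chosen for illustration only, that the encoded trigger position designates the last cell written before sampling stopped; the actual convention of the chip may differ.

```python
def unroll_circular_buffer(cells, last_written):
    """Return the cell values in chronological order (oldest first).

    Assumption (illustrative): `last_written` is the index of the most
    recent sample, so the oldest sample is in the next cell, modulo the
    buffer depth.
    """
    depth = len(cells)
    oldest = (last_written + 1) % depth
    return cells[oldest:] + cells[:oldest]


# Example with an 8-cell buffer for readability (SAMPIC0 has 64 cells).
raw = [10, 11, 12, 13, 14, 15, 16, 17]   # values indexed by physical cell
print(unroll_circular_buffer(raw, last_written=2))
```

In this example the physical cells 3 to 7 hold the oldest samples and cells 0 to 2 the most recent ones.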
The A/D conversion is performed in parallel on all the cells of the triggered channels using the Wilkinson technique (summarized in Fig. 4). First, the high-speed clock generator and the ADC counter are started simultaneously with the ramp generator of each channel to convert. In each channel, the ramp signal is provided to the 64 comparators, whose other input is connected to the storage cells. For each cell, when the comparator detects the crossing of the ramp with the stored voltage, the output of the ADC counter is captured by an 11-bit register, thus performing the A/D conversion. For a fixed ADC clock frequency, the ADC precision and its conversion time are inversely proportional to the ramp slope, so that these ADC parameters can be easily programmed: an 11-bit conversion lasts 1.6 µs, while a 9-bit conversion lasts only 400 ns.
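The Wilkinson principle and the trade-off between resolution and conversion time can be sketched in a few lines of Python. This is an idealized model with hypothetical names (linear ramp, no noise, no comparator delay). The ADC clock frequency is not taken from the chip specification: it is inferred here from the 400 ns quoted for a 9-bit conversion.

```python
def wilkinson_convert(v_cell, v_min, v_max, n_bits):
    """Counter value latched when an ideal linear ramp crosses `v_cell`.

    The ramp goes from `v_min` to `v_max` in 2**n_bits clock ticks, so a
    slower ramp (more ticks for the same span) gives a finer resolution.
    """
    full_scale = 1 << n_bits
    volts_per_tick = (v_max - v_min) / full_scale
    for count in range(full_scale):
        if v_min + count * volts_per_tick >= v_cell:
            return count
    return full_scale - 1  # input above the ramp range: saturate


def conversion_time_s(n_bits, f_adc_hz):
    """Worst-case conversion time: the ramp spans 2**n_bits clock ticks."""
    return (1 << n_bits) / f_adc_hz


# ADC clock inferred from the text: 512 ticks in 400 ns (assumption).
f_adc = (1 << 9) / 400e-9
print(f"implied ADC clock: {f_adc / 1e9:.2f} GHz")
print(f"11-bit conversion: {conversion_time_s(11, f_adc) * 1e6:.2f} us")
print(wilkinson_convert(0.5, 0.0, 1.0, n_bits=9),
      wilkinson_convert(0.5, 0.0, 1.0, n_bits=11))
```

Going from 9 to 11 bits multiplies the number of ramp steps, and therefore the conversion time, by four for the same clock frequency.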
At the end of the conversion, the channels are immediately ready to acquire new events, and a flag signal, staying high until all converted data have been read, is sent to the FPGA, which sequences the readout operation. Data are read channel by channel, with a rotating priority mechanism to avoid always reading the same channel. As mentioned above, an optional RoI readout is available to reduce the dead time (the number of cells read can be chosen dynamically, and the first read cell is calculated from the trigger position in the DLL). Event data are transmitted via a 12-bit parallel LVDS bus, starting with the Channel Identifier, Timestamps and Trigger Cell Index, followed by the values of the converted cells (all or a selected set) of a given channel, sent sequentially. The throughput of this bus is 1.92 Gbits/s when clocked at the standard value of 160 MHz. It is important to notice that a channel is in deadtime only during conversion or while waiting for conversion, but not during readout or while waiting for readout.
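The readout figures above can be checked with a short Python sketch, which also illustrates one possible form of rotating priority (a round-robin search starting after the last served channel). The function names are hypothetical and the arbitration scheme is an assumption made for illustration, since the exact on-chip mechanism is not detailed here.

```python
def bus_throughput_bps(width_bits, f_clk_hz):
    """Raw throughput of a parallel bus: one word per clock cycle."""
    return width_bits * f_clk_hz


def next_channel(pending, last_served, n_channels=16):
    """Round-robin arbitration: first pending channel after `last_served`.

    Returns None when no channel has data waiting.
    """
    for offset in range(1, n_channels + 1):
        channel = (last_served + offset) % n_channels
        if channel in pending:
            return channel
    return None


print(bus_throughput_bps(12, 160e6) / 1e9, "Gbit/s")

# Channels 0 and 5 both have data: they are served alternately.
pending = {0, 5}
first = next_channel(pending, last_served=0)
second = next_channel(pending, last_served=first)
print(first, second)
```

With such a scheme, a channel that triggers very often cannot monopolize the bus, because the search always restarts just after the channel that was read last.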
Prototyping and test setup
The 7 mm² SAMPIC0 chip, shown in Fig. 5, has been prototyped using a 0.18 µm CMOS technology from AMS. It is packaged in a small-footprint QFP 128 with 0.4 mm pitch, visible in the photograph of the module shown in the same figure. This 32-channel module is based on a mother board which can hold two mezzanines, each equipped with a SAMPIC0 chip. It is currently read out through a USB interface, and will soon also be readable through the UDP and optical links, which are already implemented but not yet active. The total module power consumption is only 5.5 W under 5 V. An acquisition software, including a user-friendly GUI and advanced capabilities for visualization and for the extraction of parameters such as timing, has been developed to characterize the chip and for use on small detector setups. All the main functionalities of the chip work well except two of them, which are not absolutely necessary and can easily be corrected in the next version:
-the RoI readout which fails in some cases. Therefore, we always read the whole depth of the SCAs,
-an identified bug in the central trigger block.
The chip is usable as it is and the waveform sampling is working as expected:
-from 1.6 to 8.2 GSPS on all 16 channels,
-up to 10.2 GSPS on 8 channels (illustrated by the sinewave of Fig. 6).
The data readout works well up to 175 MHz (> 2 Gbits/s). The combination of the three pieces of timing information to get the overall precise timing works perfectly, with no dead zone or problems due to metastability. Finally, we were not able to find any evidence of memory cell leakage, even for storage times of a few tens of µs.
Summary of the measured performance
Many measurements were performed and optimized calibration methods were developed. They will all be described in detail in future papers later this year. The chip dynamic range is 10 bits RMS with a noise of 1 mV RMS. The timing performance has been extracted using the narrow pulses shown in Fig. 6 (1 ns wide with a 300 ps risetime). With these pulses, and using a digital CFD algorithm, the timing resolution is in the 15 ps range without any timing correction and better than 5 ps after applying the calibration and correction methods described in [3]. The following table summarizes the chip features and performance measured as of today.
Conclusion
The test of the SAMPIC0 prototype has demonstrated the TWDC concept and has revealed very good performance. This chip is usable as it is and is already in use on detector test benches. An upgraded version is under design, including bug fixes but also improvements such as ping-pong operation to reduce the effect of deadtime, or a coincidence-based central trigger.
