A 64-channel mixed-mode ASIC, suitable for particle detectors with a large dynamic range and a high capacitance of up to hundreds of pF, is presented here. Each channel features an analogue front-end for signal amplification and filtering, and a mixed-signal back-end to digitise and store the signal information. The analogue part consists of a low input-impedance programmable-gain pre-amplifier based on a regulated common-gate (RCG) input stage, and two shapers optimised for time and energy measurements. The back-end part mainly includes discriminators, TDCs and ADCs, which are used to process the signal and encode both the time of arrival and the charge of the input signal with a fully digital output. The programmable gain of the front-end (up to 400 fC input dynamic range) and the versatile back-end allow the readout of different gaseous detectors such as GEM, MicroMEGAS and MWPC.
Introduction
Gaseous detectors, which feature high rate capability, good time and spatial resolution and low cost, are widely used in high energy physics experiments and medical imaging [1]. The simplest gaseous detector can be regarded as two parallel plates held at different electric potentials, with a gas medium filling the gap between them. Charged particles crossing the detector interact with the medium, ionising the gas and producing the primary charges. While the charges drift in the applied electric field, they are multiplied by a factor of about 10^4 to 10^6, depending on the detector technology. Finally, the signal is induced on the electrodes and typically read out by dedicated electronics. Generally, the output of this kind of detector can be modelled as a current pulse in parallel with a capacitor [2] of about tens to hundreds of pF.
This paper describes the design of a versatile mixed-signal front-end ASIC for the readout of a wide range of detectors. Designed in an area of 5 × 5 mm^2, this chip with 64 parallel channels features a full readout chain for gaseous sensors, providing amplification, signal conditioning and discrimination, and delivers a data payload containing the channel ID, the time stamp and the charge information for each event. The programmable gain and input impedance of the front-end amplifier allow the chip to match the requirements of different detectors. The chip has been fabricated in UMC 110 nm CMOS technology and operates at a 1.2 V power supply. Table 1 shows the key features of the ASIC. This paper is organised as follows: Section 2 gives an overview of the chip structure. Section 3 describes the architecture of the pre-amplifier, including mainly the transfer function and noise analysis. Section 4 describes the shapers and discriminators. Section 5 introduces the back-end part, including the TDC, S&H and ADC working principles. Section 
Overview of the ASIC architecture
The development of this chip was done in parallel with that of the TIGER ASIC, developed for the readout of a Cylindrical Triple-GEM detector in the framework of the BESIII Inner Tracker upgrade program [3] [4]. The re-use of key IPs between the two ASICs, such as the Time-to-Digital Converters, the DACs and most of the control logic, shortened the design time, while sharing the same dedicated fabrication reticle allowed for a significant cost reduction. For a detailed review of the TIGER ASIC and the associated on-detector electronics the reader is referred to [5]. Figure 1 shows the block diagram of one channel. The signal from the detector is first read out by a Regulated Common Gate (RCG) pre-amplifier, which works as a current conveyor and provides programmable gain and input impedance. The current output signal is then split into two branches: the timing branch consists of a fast shaping TIA (Trans-impedance Amplifier) with a peaking time of about 60 ns, used for accurate timing measurements, while the energy branch has a slower shaper with a peaking time of about 170 ns to minimise the equivalent noise charge (ENC). The output signal of the timing branch amplifier is fed to a fast leading-edge voltage-mode threshold discriminator, generating a trigger signal which is used for the time-to-digital conversion of the crossing time. The low-power Time-to-Digital Converter (TDC), based on time interpolation, uses up to four Time to Analogue Converters (TACs) and one Wilkinson Analogue-to-Digital Converter (ADC).
The circuit allows two different methods to be used for the charge measurement: Sample and Hold (S&H) of the voltage signal at the output of the energy branch, or a time-based readout of the Time over Threshold (ToT). The S&H circuit samples and holds the peak voltage from the slow shaper output, which is then digitised by a Wilkinson ADC. Alternatively, the ToT method is implemented using two TACs to record the time stamps of the rising edge and the falling edge. This method, despite its intrinsic non-linearity when using CR-RC filters, is a versatile solution for the energy measurement in case the input charge exceeds the dynamic range of the S&H circuit. The trigger signal for the S&H circuit and the rising edge time stamp for the ToT can be generated by the leading-edge crossing of either the fast or the slow shaper. Similarly, the falling edge for the ToT measurement can be selected from either the fast or the slow signal branch.
Control logic in each channel handles the operation of the back-end digitisation circuitry. This digital core operates at 200 MHz and manages the TACs, TDC/ADCs and the data/control interface with the chip global back-end. The input stage is based on a common gate topology with g_m-boosting and works as a current conveyor. This regulated common gate amplifier topology allows for the realisation of a controllable, very low input impedance front-end [8].
Versatile Front-End Amplifier Design
A programmable gain stage, shown in figure 2, is implemented with a configurable parallel connection of the output PMOS of the current mirror. The series switches s_1, s_2 and s_3 allow for 8 programmable gain settings. The programmable gain stage (figure 2) is replicated for the fast and slow branches, and the current-mode output signal is fed to each of the shapers. Each channel features a 6-bit DAC and a 5-bit DAC to set the currents in the common gate stage (Bias1) and the g_m-boosted stage (Bias2), respectively. This allows the bias current to be configured in a range of about 2.5 µA to 10 µA in the common gate stage, and 0.1 mA to 3.3 mA in the g_m-boosted stage. In the RCG circuit, the input transistor NM1 is in a common gate configuration, and NM2/PM0 implement the common source amplifier used to decrease the impedance seen at the source of NM1. The small signal equivalent circuit of a generic regulated common gate amplifier is shown in figure 3. Here, we use C_d to account for the sensor capacitance and the parasitic capacitance of the transistors at the input node which, in the scheme depicted in figure 2, is mostly given by the gate-source capacitance of NM2. C_gs1 is the capacitance between the gate and source of NM1, the common gate transistor. The open-loop gain of the common source amplifier is given by Eq. (3.1):
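The expression of Eq. (3.1) is not reproduced in this text. As a hedged reconstruction only, assuming the usual small-signal result for a common source stage with an active load, the low-frequency gain (written here as A_0, the symbol used later in the text) would read as follows. The parallel combination of the two output resistances is an assumption of this sketch, chosen to match the symbols defined in the next sentence.

```latex
A_0 \approx g_{m2}\,\bigl(r_{o2} \parallel r_{op0}\bigr)
    = g_{m2}\,\frac{r_{o2}\,r_{op0}}{r_{o2} + r_{op0}}
```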
where g_m2 is the transconductance of NM2, and r_o2 and r_op0 are the output resistances of NM2 and PM0, respectively. Likewise, we define g_m1 and r_o1 as the transconductance and output resistance of the input transistor NM1.
We can apply Kirchhoff's current law (KCL) at the input and output nodes to obtain equations (3.2), (3.3) and (3.4):
and,
where R_L and C_L form the lumped load impedance connected to the drain of NM1. From (3.2) and (3.3) we can use the following approximation:
Assuming high-capacitance detectors, the following approximation is also valid:
As a consequence, we can obtain a simplified transfer function:
and the relation that defines the input impedance becomes:
Generally, the two main poles of the g_m-boosted common gate amplifier are defined at the input and output nodes (VS1 and VD1 in figure 2):
Since the frequency of the input pole is A times higher than in the ordinary common gate topology, the RCG input stage is suitable for the readout of sensors with high capacitance. A third pole, with the RC time constant τ_R seen at the node VG1, defines a frequency-dependent gain of the common source stage:
Equation (3.10) is a revised form of (3.1). Taking (3.10) into account, together with the effect of C_gs1, and starting from equations (3.2) and (3.3), we can write the complete transfer function:
The denominator is therefore a second-order polynomial, which may have complex conjugate roots. To avoid complex conjugate roots, the following relation must hold:
which can be rewritten as:
From (3.13) we can define the minimum stability margin, corresponding to the relation:
If the sensor capacitance is sufficiently high, with the approximation C_d >> A_0 C_gs1, (3.13) becomes:
On the other hand, when the sensor capacitance is small, we can obtain:
Interestingly enough, according to (3.15) and (3.16), the RCG circuit is able to avoid complex conjugate roots when the sensor capacitance is either very small or very high, while a transimpedance amplifier may suffer from instability when C_d is very large [2]. Therefore, the RCG amplifier is particularly suitable for achieving a fast readout of sensors with very large terminal capacitance. Furthermore, for intermediate values of C_d, small values of τ_R and g_m1 are preferred in order to avoid possible complex conjugate roots in (3.11).
In order to verify this hypothesis, we perform an analysis based on the simulation of the simplified schematic shown in figure 4. An ideal current source is employed to provide the common gate current (I_cg), and C_l is used to model the load capacitance. The stability simulation of the closed loop formed by the boosted stage (implemented by a limited-bandwidth amplifier) is carried out, where A is defined by equation (3.10) using A_0 = 50, τ_R = 1 kΩ × 2 pF = 2 ns and C_gs1 = 1 pF. Table 3 summarises the simulation results in terms of the phase margin obtained at different bias conditions of NM1, as a function of the input capacitance. The results are compatible with the analysis above. Setting a smaller common gate current can increase the phase margin for intermediate values of C_d.
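As an illustration of the limited-bandwidth boosting amplifier used in this simulation, the following minimal sketch evaluates a single-pole gain model with the quoted values A_0 = 50 and τ_R = 2 ns. The single-pole form A_0 / (1 + sτ_R) is an assumption inferred from the description of the third pole, since equation (3.10) is not reproduced here, and the function names are hypothetical.

```python
import math

A0 = 50.0            # low-frequency gain quoted in the text
TAU_R = 1e3 * 2e-12  # 1 kOhm * 2 pF = 2 ns, as quoted in the text


def boost_gain(freq_hz, a0=A0, tau_r=TAU_R):
    """Complex gain of an ASSUMED single-pole stage: A0 / (1 + j*2*pi*f*tau_R)."""
    return a0 / complex(1.0, 2.0 * math.pi * freq_hz * tau_r)


def corner_frequency(tau_r=TAU_R):
    """-3 dB frequency of the single-pole model: 1 / (2*pi*tau_R)."""
    return 1.0 / (2.0 * math.pi * tau_r)


if __name__ == "__main__":
    f_c = corner_frequency()
    print(f"corner frequency: {f_c / 1e6:.1f} MHz")
    for f in (1e6, 1e7, f_c, 1e9):
        g = boost_gain(f)
        phase_deg = math.degrees(math.atan2(g.imag, g.real))
        print(f"f = {f:9.3e} Hz  |A| = {abs(g):6.2f}  phase = {phase_deg:7.2f} deg")
```

Under this assumption the boosting gain starts to roll off near 80 MHz, so the extra phase lag it adds to the loop grows with frequency, in line with the preference for small τ_R stated above.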
The two dominant sources of electronic noise in MOS devices are flicker and thermal noise. Flicker noise can be modelled as a voltage source in series with the gate of the transistor and is expressed by (3.17), where K is a process-dependent constant, C_ox is the gate oxide capacitance per unit area and WL is the gate area. Its contribution is minimised by choosing a proper area for the transistors and decreasing the transconductance of the current sources. Thermal noise, whose spectral density in MOS devices can be represented through a resistor analogy, is given by the general expression of (3.18):
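The noise expressions referenced in this paragraph are not reproduced in this text. As a hedged reminder, the textbook forms consistent with the symbols defined here would be the following, both per unit bandwidth; they are standard MOS noise models and not necessarily the exact notation of the original equations.

```latex
\overline{V_{n,1/f}^{2}} = \frac{K}{C_{ox}\,W L}\,\frac{1}{f}
\qquad \text{(flicker noise, referred to the gate)}

\overline{I_{n}^{2}} = 4\,k\,T\,\gamma\,g_{m}
\qquad \text{(thermal noise, drain current)}
```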
where k is the Boltzmann constant, T is the absolute temperature, and γ is a complex function of the basic transistor parameters and bias conditions, with a typical value of 2/3 or higher. Considering the thermal noise contributions from NM1 and NM2 as the prominent ones, we can write:
where I_n2^2 is the current-mode noise defined by (3.18). For NM1, since its transconductance is boosted by the factor A, one can define the noise voltage as:
Assuming that A^2 g_m1^2 >> g_m2^2, we conclude that NM2 becomes the dominant source of thermal noise, and we can derive the noise contribution of NM2 to the output:
From (3.19) we infer that increasing g_m2 considerably decreases the overall noise of the RCG stage, which justifies the use of a rather large common source transistor (NM2). Figure 5 shows a CAD layout detail of the front-end of a single channel, which highlights the large silicon area of the g_m-boosting transistor NM2. 
Shaper stages and discriminators
The simplified schematics of the two shapers are illustrated in figures 6 and 7. The feedback resistors and capacitors are carefully designed in order to minimise the spread in key parameters, like peaking time and gain, due to statistical device mismatch and process variations. In the timing branch a conventional CR-RC shaper is employed. In the energy branch the feedback uses a pair of complex conjugate poles, whose impulse response approximates a Gaussian shape more closely, thereby resulting in a lower noise (by increasing the peaking time) for a comparable rate capability.
The shaper cores share the same structure, using a single-ended input stage and a class-AB output stage. Both shapers employ a Baseline Holder (BLH) structure, whose working principle is mainly based on a very low frequency feedback [11], to set a defined output baseline. Figure 9 shows the transistor-level design. The voltage outputs of both the fast and slow shapers are fed to the input of a fast discriminator that generates CMOS-level trigger signals for the channel control logic. The two discriminators share the same structure, and the transistor-level schematic is shown in figure 10. The bias current of the differential input amplifier is controlled by V_b1 and that of the output stage is set by V_b2. Both bias voltages are set by a 6-bit DAC at the periphery of the chip, and their value is therefore common to all 64 channels. The global setting V_hyst is configured by a 3-bit DAC for an adjustable hysteresis amplitude. 
Time and Amplitude Digitisation Circuits
The time measurement is performed by 2 low-power TDCs based on analogue interpolation [6]. A set of 4 Time-to-Amplitude Converters (TACs) per TDC is used, allowing for the de-randomisation of the time of arrival of the events. For each event, the TAC generates and stores a voltage signal that is proportional to the time difference between the trigger and a known leading edge of the system clock. This voltage information is subsequently transferred to a second capacitor C_TDC and processed by the Wilkinson ADC, while the buffer is reset to an idle state. Any trigger occurring during the conversion time of the TDC will be processed by the next buffer in the queue, following a round-robin assignment scheme. Any event occurring while all 4 buffers are occupied will be discarded.
The block diagram of the multi-buffered TDC is illustrated in figure 11.
In the event of a trigger, the switch S_1 closes and the current source I_TAC (25 µA) discharges C_TAC (0.5 pF) until the next rising edge of the clock. A synchronous finite-state machine (FSM) then closes the switch S_2, transferring the voltage stored on C_TAC to C_TDC (2 pF). In order to cope with the RC constant created by the R_on of the CMOS switch, this operation takes 20 clock cycles. The C_TDC capacitor is thereafter recharged with a smaller current I_TDC (0.78 µA) until V_TDC reaches the steady-state voltage V_ref, which is the working principle of the 10-bit Wilkinson ADC. The time is thereby interpolated by a factor of 128, considering the following design parameters:
A system clock of 200 MHz provides a TDC time binning of about 40 ps (LSB = 5 ns / 128). The fine counter (T-fine) information is combined with a 16-bit time stamp (T-coarse) provided by a global binary counter running at the chip clock frequency of 200 MHz, whose state is distributed to the channels: T-coarse and T-fine together provide the time stamp information. When the conversion is completed, the voltages on C_TAC and C_TDC are reset to the reference value (V_ref) by the control logic.
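The interpolation factor can be cross-checked from the component values quoted above. The sketch below assumes that the voltage step on C_TAC is copied in full to C_TDC, so that the time stretch is the product of the capacitor ratio and the current ratio. This relation is an assumption consistent with the quoted numbers, not the original design equation, and the function names are hypothetical.

```python
C_TAC = 0.5e-12   # F, TAC capacitor (from the text)
C_TDC = 2.0e-12   # F, TDC capacitor (from the text)
I_TAC = 25e-6     # A, TAC discharge current (from the text)
I_TDC = 0.78e-6   # A, Wilkinson recharge current (from the text)
F_CLK = 200e6     # Hz, system clock (from the text)


def interpolation_factor(c_tac=C_TAC, c_tdc=C_TDC, i_tac=I_TAC, i_tdc=I_TDC):
    """Time stretch factor, ASSUMING the full voltage step is copied to C_TDC."""
    return (c_tdc / c_tac) * (i_tac / i_tdc)


def time_bin(f_clk=F_CLK, factor=128):
    """TDC bin width in seconds: one clock period divided by the stretch factor."""
    return 1.0 / (f_clk * factor)


if __name__ == "__main__":
    print(f"interpolation factor: {interpolation_factor():.1f}")
    print(f"time bin: {time_bin() * 1e12:.2f} ps")
```

With these values the product is 4 × 32.05 ≈ 128.2, which agrees with the nominal factor of 128 and gives a bin width of 5 ns / 128 ≈ 39 ps.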
The conversion time defines the event rate that the TDC can handle, and is therefore a function of both the operation clock and the interpolation factor. The specification of a 40 ps time binning is driven by the expectation, based on simulation results, that the intrinsic time resolution of the front-end is better than 500 ps and 300 ps r.m.s. for an input charge of 50 fC with 100 pF and 10 pF input capacitance, respectively. In these conditions, we expect the quantisation error of the TDC, which adds in quadrature to the full channel intrinsic time resolution, to have a negligible contribution.
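The size of the quantisation contribution can be estimated with a short calculation. The sketch below assumes a uniformly distributed quantisation error, so that its r.m.s. value is LSB/√12, and adds it in quadrature to the expected front-end resolution. The uniform-error model and the function names are assumptions of this example.

```python
import math


def quantisation_rms(lsb):
    """R.m.s. quantisation error, ASSUMING a uniform error over one bin."""
    return lsb / math.sqrt(12.0)


def total_resolution(front_end_rms, lsb):
    """Quadratic sum of the front-end resolution and the quantisation error."""
    return math.hypot(front_end_rms, quantisation_rms(lsb))


if __name__ == "__main__":
    lsb = 40e-12  # s, TDC time binning quoted in the text
    print(f"quantisation r.m.s.: {quantisation_rms(lsb) * 1e12:.1f} ps")
    for fe in (300e-12, 500e-12):  # s, expected front-end resolutions
        tot = total_resolution(fe, lsb)
        print(f"front-end {fe * 1e12:.0f} ps -> total {tot * 1e12:.2f} ps")
```

Under this assumption the TDC adds about 11.5 ps r.m.s., which changes a 300 ps resolution by well under 1 ps.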
Two different charge measurement modes are implemented in the chip: ToT (Time-over-Threshold) and S&H (Sample and Hold) mode. In ToT mode both the rising and falling edges of the discriminator are digitised by the TDCs, and the charge information can be extracted from the pulse duration. The ToT measurement can be performed on the output of either the timing branch or the energy branch. Figure 12 shows the basic process of the S&H mode for charge measurement. The S&H circuit records and holds the peak voltage of the signal from the slow shaper on a capacitor. The configurable sampling time window is managed by the channel control logic, with the start provided by the discriminator of the fast branch (due to its smaller time walk). The voltage stored on the capacitor is then digitised by the Wilkinson ADC of the energy branch, which is shared with the TACs, providing a linear measurement of the input charge. Similarly to the method adopted by the TDC, each branch employs four S&H buffers, allowing for event de-randomisation. Figure 13 shows the CAD layout of the ASIC and the silicon chip wire-bonded to a test board. The chip is configured, controlled and read out using a commercial FPGA board over standard LVDS links.
Characterisation results
A test pulse can be generated using on-chip calibration circuitry or injected externally with a pulse generator and a C-R circuit. The internal test pulse circuitry is implemented in the chip periphery and uses either a trigger signal generated by the global control logic or a digital test pulse fed directly from an external trigger generator. The circuit generates a voltage step function whose amplitude is configurable using a 6-bit DAC. The voltage pulse is propagated to the channel under test, and a current-mode signal is generated locally by each channel enabled for calibration. Figure 14 shows the measured peak amplitude at the output of the fast shaper for several gain settings, ranging from 10 mV/fC ("set1") down to the minimum of 1.2 mV/fC ("set8"). Table 4 shows a summary of the gain characterisation results. The mismatch between experimental and simulated results at the minimum gain settings, which correspond to the worst Integral Non-Linearity (INL) results, is caused by a dynamic modulation of the V_sg of the PMOS devices in the amplifier output stage, which drives these MOSFETs into the linear region. Although this non-linearity can be corrected offline after calibration in this first prototype, a design fix will be required in the final version of the chip. A cascoded topology would increase the output resistance of the current mirror, enhancing the linearity of the circuit.
The noise measurement is performed by scanning the amplitude of a fixed-charge test pulse with a variable discriminator threshold level V_th. The curve of triggered counts versus threshold can thereafter be fitted with an S-curve, whose slope is a direct measurement of the noise. Figure 15 shows the characterisation result of the noise as a function of the input capacitance in the timing branch, compared with the post-layout simulation. In these tests, the capacitive load at the input was set with an external capacitor on the test board.
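The threshold-scan analysis can be sketched as follows. Instead of the fit described above, this example uses a simpler moment estimate that needs only the standard library: the drop in trigger efficiency between adjacent thresholds samples the underlying Gaussian, so its mean and standard deviation give the pulse amplitude and the noise. The ideal S-curve model, the scan range and all numerical values are hypothetical.

```python
import math


def s_curve(v_th, mu, sigma):
    """Ideal trigger efficiency versus threshold (complementary Gaussian CDF)."""
    return 0.5 * math.erfc((v_th - mu) / (math.sqrt(2.0) * sigma))


def estimate_mu_sigma(thresholds, efficiency):
    """Estimate the S-curve midpoint and width from its numerical derivative.

    The efficiency drop between adjacent thresholds is used as a weight at the
    midpoint of the interval; the weighted mean and r.m.s. give mu and sigma.
    """
    mids, weights = [], []
    for i in range(len(thresholds) - 1):
        mids.append(0.5 * (thresholds[i] + thresholds[i + 1]))
        weights.append(efficiency[i] - efficiency[i + 1])
    total = sum(weights)
    mu = sum(w * m for w, m in zip(weights, mids)) / total
    var = sum(w * (m - mu) ** 2 for w, m in zip(weights, mids)) / total
    return mu, math.sqrt(var)


if __name__ == "__main__":
    # Hypothetical scan: 0 to 40 mV in 0.1 mV steps, 20 mV pulse, 2 mV noise.
    thresholds = [0.1 * i for i in range(401)]
    efficiency = [s_curve(v, 20.0, 2.0) for v in thresholds]
    mu, sigma = estimate_mu_sigma(thresholds, efficiency)
    print(f"midpoint = {mu:.3f} mV, noise = {sigma:.3f} mV r.m.s.")
```

With measured counts, statistical fluctuations can make some of the weights negative, so a proper error-function fit, as used in the text, is the more robust choice; the moment estimate is shown only to make the relation between the S-curve slope and the noise explicit.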
The measured noise is about 20% higher than expected. This excess might be due to interference noise from the test environment and the power supply. In order to study the channel intrinsic time resolution, a sequence of test pulses synchronised to the FPGA system clock is used to inject a calibrated charge into the front-end. The σ of the Gaussian fit to the measured time distribution is a direct measurement of the jitter. Figure 16 plots the measurement and simulation results of the timing jitter as a function of the input capacitance, for an injected charge of 14 fC and a gain setting of 10 mV/fC. The test was repeated by scanning the phase of the trigger signal with respect to the clock in steps of 135 ps (38 points on a 5 ns period).
The simulation values are obtained using the following function:
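The expression itself is not reproduced in this text. A standard first-order form consistent with the two quantities defined in the next sentence, given here only as a hedged reconstruction, would be:

```latex
\sigma_{t} \approx \frac{\mathrm{rmsNoise}}{\mathrm{Slope}}
```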
where Slope is the slew rate of the leading edge of the timing shaper output at a fixed threshold, and rmsNoise is the total output r.m.s. noise voltage, both obtained from the simulation of a post-layout netlist. The systematic mismatch between simulation and experimental results is not fully understood and further investigation is needed. The excess of noise in the test data is independent of the input capacitance, but we were not able to replicate the same conditions in post-layout simulations. This systematic offset of 440 e− r.m.s. could therefore be related to digital interference noise at the level of the discriminator circuit. The charge measurement, as mentioned above, can be performed using ToT (Time-over-Threshold) or S&H (Sample and Hold). The electrical characterisation is performed by injecting test pulses with different charges. Figures 17 and 18 show the results of the charge measurement in ToT and S&H modes, respectively, each for the gain settings "set1" and "set8". The digitised output of the S&H is converted to the analogue peak voltage according to the calibration.
Experimental data obtained with the S&H method show an INL better than 1% in both the minimum and maximum gain settings. Nevertheless, the non-linearity in minimum gain conditions worsens above 300 fC, for the reasons already discussed earlier in this Section. The residuals of the linear fit to the characterisation data are shown in figure 19. For the ToT mode, the results are shown in figure 17. The inherently non-linear ToT versus Q_in behaviour of the front-end requires a 3rd order polynomial fit or an offline Look-Up Table for the charge reconstruction. Although the ToT method offers a higher dynamic range, the S&H mode needs only a linear fit and no offline Look-Up Table, which makes it the more convenient choice. Table 5 provides a brief summary of the test results.
Attributes                                Test Results
Power Consumption                         9 mW/ch
INL                                       < 1 % (up to 300 fC)
Dynamic Range                             up to 400 fC
Gain                                      1.8 to 12 mV/fC (E-branch)
ENC @ 100 pF                              3500 e−
Jitter @ Q_in = 14 fC, C_in = 100 pF      4 ns
Conclusions and outlook
We have presented the design and electrical test results of a versatile 64-channel mixed-signal ASIC, compatible with the readout of high-capacitance sensors, providing the time stamp and charge measurement of each event. This chip was produced in a UMC 110 nm technology engineering run, sharing the reticle with the TIGER ASIC [5], which was developed for the readout of the CGEM Inner Tracker detector for the BESIII Upgrade.
The intrinsic time resolution is better than 4 ns r.m.s. for an input charge of 14 fC with a 100 pF detector capacitance. The charge measurement can be performed in two alternative modes: ToT and S&H. The dynamic range of up to 400 fC allows for the use of this ASIC with a wide range of gaseous detectors, while the low-impedance front-end maximises the PSRR and reduces the susceptibility to external interference noise. 
