The new TRacking Array for light Charged particle Ejectiles (TRACE) detector system requires monitoring and sampling of all pulses in a large number of channels, with very strict space and power consumption restrictions for the front-end electronics and cabling. Its readout system is to be based on analog memory ASICs with 64 channels each, which sample a 1 µs window of the waveform of any valid pulse at 200 MHz while discarding any other signals, and which are read out at 50 MHz with external ADC digitization. For this purpose, a new, compact analog memory architecture is described that allows pulse capture with zero dead time in any channel while vastly reducing the total number of storage cells, particularly for large numbers of input channels. This is accomplished by partitioning the typical Switched Capacitor Array structure into two pipelined, asymmetric stages and introducing FIFO queue-like control circuitry for captured data, achieving total independence between the capture and readout operations.
Introduction
The TRacking Array for light Charged particle Ejectiles (TRACE) [1, 2] is a new telescope detector system for the discrimination of particles and light ions in fusion evaporation and direct nuclear reactions, designed to work in combination with a large gamma tracking array like AGATA [3]. Each detector cell is a ∆E-E telescope consisting of a double silicon layer with respective thicknesses of 200 µm and 1 mm, forming a 12 × 5 array of pads with a 4 × 4 mm² pitch, and a resistivity of 20 kΩ/cm. Identification of different ions and particles relies on both ∆E-E discrimination and pulse shape analysis (PSA) [4] based on the sampling of all detector pulses generated at the silicon pads. The first experimental tests have already taken place with temporary readout electronics using commercial DAQ modules and only a limited number of detector channels [5]. It has been established that acquisition windows of 1 µs at 200 MHz sampling frequency or higher are
The final readout system for TRACE requires monitoring of all detector channels and sampling of all generated pulses, which may appear in any channel; specifically, one front channel per silicon pad with hole signals is required for particle discrimination, plus one back channel per layer with electron signals that will be used mainly for spectroscopy. An event rate in the range of tens of kHz is expected, and an energy resolution below 1 % at 5 MeV is to be obtained at room temperature. The complete front-end circuitry for each detector needs to fit in a 25 × 50 mm² circuit board, similar in size to the detector itself. This imposes very strict space and power consumption restrictions for the front-end electronics and cabling. A total of 122 channels (120 front and 2 back) need to be read out per detector, and captured information should be transmitted serially in order to keep cabling to a minimum. At the same time, heat sinking is limited by the small dimensions of the vacuum reaction chamber, and a maximum power budget of around 20 mW per channel is estimated.
The aforementioned specifications strongly favor the front-end while avoiding power-hungry free running digitizers. A triggerless readout scheme has been devised that relies on an analog memory ASIC with self-triggered channels for the sampling of detector pulses and their later retransmission at a slower pace. For this purpose, a novel analog memory architecture based on multiple Switched Capacitor Array (SCA) stages is presented that has been specifically designed to minimize detector dead time and area occupation per channel. This paper describes the general readout scheme, the conceptual design of the analog memory ASIC, and its current design status.
Figure 1: Simplified schematic outlining the location and role of the analog memory ASIC and its interface with other system components.
TRACE readout scheme
The proposed readout scheme is depicted in Fig. 1 and involves two types of ASIC for the front-end electronics. The ASIC dies will be directly wire-bonded to the front-end PCBs for minimal area requirements.
The first ASIC type, described in [8, 9], includes charge-sensitive preamplifiers designed for low power and low area occupation by requiring only one external component per channel. Their operating point settings can be individually tuned through an I2C interface for maximum bandwidth, in order to comply with PSA requirements. Smart fast-reset circuitry is included for each channel that allows fast capacitor discharge in case of saturation by connecting a constant current sink. Moreover, this discharge mechanism allows an extension of the dynamic range for energy estimation on the back channels beyond saturation by implementing time-over-threshold (ToT) measurement [10]; time-to-amplitude converters have been added to these channels for direct amplitude spectroscopy.
The second ASIC model is an analog memory circuit whose purpose is to detect any pulse signal coming from the preamplifiers and sample its waveform while discarding any other signals. Up to 32 samples are to be captured before the actual pulse edge for baseline estimation, plus 192 samples corresponding to the pulse waveform, for a total capture window of 1.12 µs at a 200 MHz sampling rate.
Moreover, for the back channel, the final, stable output from the time-to-amplitude converter needs to be sampled, as it provides the energy measurement in case of saturation. Captured signals are timestamped and transmitted serially to the back-end by means of a single analog output; they are digitized remotely and processed by an FPGA that controls the readout process at 50 MHz.
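The sample counts and clock rates quoted above fix both the capture window and the minimum time needed to shift the analog samples of one pulse out of the ASIC. The following sketch reproduces that arithmetic; it deliberately ignores any digital frame overhead, so the readout figure is a lower bound rather than the actual frame duration.

```python
# Figures taken from the text; frame overhead (header, digital fields) is not modelled.
F_SAMPLE_HZ = 200e6    # sampling rate
F_READ_HZ = 50e6       # readout rate
PRE_TRIGGER = 32       # samples kept before the pulse edge
POST_TRIGGER = 192     # samples covering the pulse waveform


def duration_s(n_samples: int, rate_hz: float) -> float:
    """Time needed to write or read n_samples at the given rate."""
    return n_samples / rate_hz


total_samples = PRE_TRIGGER + POST_TRIGGER
capture_window = duration_s(total_samples, F_SAMPLE_HZ)
analog_readout = duration_s(total_samples, F_READ_HZ)

print(f"samples per pulse : {total_samples}")
print(f"capture window    : {capture_window * 1e6:.2f} us")
print(f"analog readout    : {analog_readout * 1e6:.2f} us (lower bound)")
```

Reading one pulse therefore takes at least four times as long as capturing it, which is why decoupling capture from readout matters for dead time.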
The analog memory ASIC is also responsible for communication with the Global Trigger and Synchronization (GTS) subsystem to be used in conjunction with AGATA [11]. In particular, trigger request signals have to be issued for every detected pulse, and the sampling clock is generated from the 100 MHz clock given by the GTS leaf by means of a common clock distribution network that provides phase-aligned, 50 % duty cycle clocks to the analog memory ASICs for all detectors. Both clock edges are used for sampling. Each ASIC contains a local 36-bit timestamp counter for pulse tagging, and counters from different ASICs may be aligned by resetting them at the same time. A synchronization procedure is foreseen in order to match the internal ASIC timestamp counter with the global timestamp counter in the GTS infrastructure.
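As a rough illustration of what a 36-bit timestamp implies, the sketch below computes the counter wrap-around period and a wrap-safe time difference. The tick rate of the counter is not stated in the text, so both the 100 MHz clock rate and the 200 MHz sample rate are evaluated here purely as assumptions.

```python
TIMESTAMP_BITS = 36              # counter width, from the text
MODULUS = 1 << TIMESTAMP_BITS    # number of distinct timestamp values


def wrap_period_s(tick_hz: float) -> float:
    """Time after which the counter rolls over, for an assumed tick rate."""
    return MODULUS / tick_hz


def elapsed_ticks(earlier: int, later: int) -> int:
    """Ticks from `earlier` to `later`, valid across a single roll-over."""
    return (later - earlier) % MODULUS


for tick_hz in (100e6, 200e6):   # assumed tick rates, not confirmed by the text
    print(f"{tick_hz / 1e6:.0f} MHz ticks -> wraps every {wrap_period_s(tick_hz):.1f} s")
```

Under either assumption the counter wraps after several minutes, so any alignment with a wider global counter has to account for roll-over.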
For the preamplifier ASICs, multichannel prototypes with four front channels plus one back channel have already been fabricated in 0.35 µm CMOS technology and are pending experimental validation, and future versions with up to 12 channels are planned. The analog memory ASICs are currently under design using 0.18 µm CMOS technology for reduced power consumption and area; the final design is planned to host 64 input channels, so that two of them would be required per detector element. Their architecture is described in detail in the next section.
Analog memory ASIC
The architecture and principle of operation of the analog memory ASIC are based on the use of SCAs for the fast analog sampling of transient signals that are stored as charge in internal capacitors and can be later read out at a slower pace. These devices have been used as low-power substitutes for flash ADCs for the past 25 years; some of the most representative examples are described in references [12-19]. SCA channels are the basic building blocks of the presented ASIC architecture, so a general description is included here first.
Switched Capacitor Array channels
The particular SCA channel structure employed in this ASIC is depicted in Fig. 2, based on the use of a common operational amplifier to assist in writing and reading oper- values are then sampled using an external ADC. The SCA channel is locked for writing until readout has been completed, so as to avoid overwriting the capacitor contents. In most analog memory designs, this induces a dead time in the channel which may be very long compared to the pulse acquisition window, since f_r is typically much lower than f_w. The next section will describe a way to circumvent this issue.
The settling time of the amplifiers driving the writing process, in particular the preamplifier driving the bus input signal, must be short enough that the input voltage may effectively be stored in the capacitor during the time interval when the write switches remain closed. In order to ease the specifications on settling time and increase the tracking window, switches from consecutive capacitor cells are driven by the positive and negative edges of the sampling clock, respectively, so that the sets of odd and even cells are equivalent to two parallel subsections sampling the incoming signal at a 100 MHz rate with a phase difference of π, effectively providing a 200 MHz sampling rate.
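The interleaving of odd and even cells can be checked with a few lines of arithmetic. The sketch below lists the sampling instant of each cell when consecutive cells are driven by alternating edges of a 100 MHz clock; assigning even-numbered cells to the positive edge is an arbitrary choice made for illustration only.

```python
F_CLK_HZ = 100e6  # sampling clock; both edges are used


def sample_times(n_cells: int, f_clk_hz: float) -> list[float]:
    """Sampling instant of each cell, with cells alternating between clock edges.

    Even cells are assumed to sample on the positive edge and odd cells on the
    negative edge, i.e. half a clock period (a phase of pi) later.
    """
    period = 1.0 / f_clk_hz
    times = []
    for cell in range(n_cells):
        cycle, on_negative_edge = divmod(cell, 2)
        times.append(cycle * period + on_negative_edge * period / 2)
    return times


times = sample_times(8, F_CLK_HZ)
steps = [later - earlier for earlier, later in zip(times, times[1:])]
print([f"{t * 1e9:.0f} ns" for t in times])
print(f"effective rate: {1 / steps[0] / 1e6:.0f} MHz")
```

Each subsection only has to acquire a new sample every 10 ns, while the combined sequence is spaced by 5 ns.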
Pipelined Asymmetric SCA
Typical analog memory circuits merely replicate the single channel structure described above for every input channel. A different approach is adopted here: the SCA is pipelined into two sequential, asymmetric stages connected through a full-mesh switching matrix. The general scheme is shown in Fig. 3, where the circuit has been divided into a first memory stage with a 32-cell SCA channel intended for pre-trigger samples for each ASIC input, and a second stage with 8 slots, each containing a 192-cell SCA channel for post-trigger samples and an auxiliary 32-cell SCA channel intended as a storage buffer for the samples in the first stage.
Initially, the second SCA stage is idle, and each channel in the first stage is continuously sampling the associated input signal, so that it contains the last 32 samples at any given moment. Whenever an input channel is triggered, the corresponding channel i in the first stage is write-locked and its samples are held; a free slot j in the second stage is then assigned and both are connected together through the switching matrix. At this moment, the input signal is connected to the write bus of the 192-cell SCA, so pulse capture continues there. At the same time, the contents of channel i are sequentially read and copied to the 32-cell buffer in slot j, so that the data transfer is complete before capture of the 192 post-trigger samples ends. At this point, the input channel is immediately ready to start sampling again; therefore, no dead time is introduced. The whole captured pulse is stored in slot j, and it remains locked until it is read out sequentially. The among all input channels. The reduction factor depends on the dimensioning of the whole memory and is better for more asymmetric designs.
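The saving in storage cells can be estimated directly from the dimensions given above. The sketch below compares the pipelined arrangement with a hypothetical conventional design in which every input channel owns a full 224-cell SCA; that baseline is an assumption made for comparison and stores only one pulse per channel.

```python
def pipelined_cells(n_inputs: int, pre: int, post: int, n_slots: int) -> int:
    """One pre-trigger SCA per input plus n_slots shared (post + pre buffer) slots."""
    return n_inputs * pre + n_slots * (post + pre)


def conventional_cells(n_inputs: int, pre: int, post: int) -> int:
    """Assumed baseline: one full-length SCA channel per input."""
    return n_inputs * (pre + post)


PRE, POST, SLOTS = 32, 192, 8   # dimensions quoted in the text

for n_inputs in (64, 128):      # 128 is a hypothetical larger design
    shared = pipelined_cells(n_inputs, PRE, POST, SLOTS)
    full = conventional_cells(n_inputs, PRE, POST)
    print(f"{n_inputs} inputs: {shared} vs {full} cells, factor {full / shared:.2f}")
```

For the 64-channel configuration this gives 3840 cells instead of 14336, and the factor grows as more inputs share the same second stage.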
A second advantage is the lack of readout-related dead time for single channels, which is a novel feature to the best of the authors' knowledge. The analog memory ASIC exhibits dead time only in the case when the output queue is completely full; in that case, all input channels are locked. Nevertheless, a relatively low event rate is expected in TRACE, making this an unlikely situation whose probability may be estimated and used for dimensioning.
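One possible way to estimate the probability of a full output queue is a small Monte Carlo model. The sketch below assumes Poisson-distributed triggers over the whole ASIC, a slot that stays occupied from trigger until its readout ends, and strictly sequential readout; the frame length of 4 + 64 + 224 read-clock cycles and the trigger rates are assumptions, not figures confirmed by the design.

```python
import random
from collections import deque


def lost_fraction(rate_hz: float, n_slots: int, t_capture_s: float,
                  t_read_s: float, n_events: int, seed: int = 1) -> float:
    """Fraction of triggers arriving while every slot is occupied."""
    rng = random.Random(seed)
    release = deque()      # times at which occupied slots become free (FIFO order)
    now = 0.0
    last_read_end = 0.0
    lost = 0
    for _ in range(n_events):
        now += rng.expovariate(rate_hz)          # next Poisson arrival
        while release and release[0] <= now:     # free slots already read out
            release.popleft()
        if len(release) >= n_slots:              # queue full: pulse is lost
            lost += 1
            continue
        # Readout starts once capture is done and the previous readout has ended.
        start = max(now + t_capture_s, last_read_end)
        last_read_end = start + t_read_s
        release.append(last_read_end)
    return lost / n_events


T_CAPTURE = 192 / 200e6            # post-trigger capture time
T_READ = (4 + 64 + 224) / 50e6     # assumed frame duration at 50 MHz

for rate in (1e3, 50e3, 10e6):     # hypothetical total trigger rates per ASIC
    print(f"{rate:>10.0f} Hz -> lost fraction {lost_fraction(rate, 8, T_CAPTURE, T_READ, 20000):.4f}")
```

A model of this kind can be rerun with different slot counts to see how the queue depth trades off against pulse loss for a given rate.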
This architecture also presents some disadvantages compared to the use of full channels for every input. One of them is a slight loss of flexibility, in that the maximum number of pre-trigger samples and of simultaneously stored pulses is lower. Another one is the fact that pre-trigger and post-trigger samples are processed along separate paths with different responses; specifically, pre-trigger samples undergo an extra copy operation when being transferred from the first into the second stage, whereby additional noise is introduced. The signal-to-noise ratio for these samples may therefore be slightly lower than for the post-trigger section. However, pre-trigger samples will be mainly used for estimation of constant voltage levels (either baseline or ToT output from back channels), so the impact of this SNR difference will be largely diminished. In any case, both signal paths need to be characterized separately, which adds complexity to the calibration procedure.
Input stages

A schematic of the input stage for each ASIC channel is depicted in Fig. 4. It consists of an inverting amplifier with gain −R_2/R_1 that adapts the signal range between the preamplifier output and the SCA; R_2 is internal to the ASIC and fixed, but R_1 is external and can be used to adjust the gain. The inclusion of at least one external component is required for isolation, since preamplifier pulses exhibit a dynamic range of 2.6 V, which is already higher than the 1.8 V power supply for the ASIC. A global test input is included for calibration purposes that needs its own external resistor.
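The choice of the external resistor follows from the ideal inverting-amplifier gain. In the sketch below, the value of R_2 and the target swing at the SCA are hypothetical numbers chosen only to illustrate the calculation; the 2.6 V preamplifier range is the figure quoted above.

```python
def inverting_gain(r1_ohm: float, r2_ohm: float) -> float:
    """Ideal closed-loop gain of the inverting input stage, -R_2/R_1."""
    return -r2_ohm / r1_ohm


def r1_for_swing(r2_ohm: float, input_swing_v: float, target_swing_v: float) -> float:
    """External R_1 that maps input_swing_v onto target_swing_v in magnitude."""
    return r2_ohm * input_swing_v / target_swing_v


R2_OHM = 10e3            # hypothetical internal resistor value
INPUT_SWING_V = 2.6      # preamplifier dynamic range, from the text
TARGET_SWING_V = 1.2     # hypothetical swing that fits below the 1.8 V supply

r1 = r1_for_swing(R2_OHM, INPUT_SWING_V, TARGET_SWING_V)
gain = inverting_gain(r1, R2_OHM)
print(f"R_1 = {r1 / 1e3:.2f} kOhm, gain = {gain:.3f}")
```

Because R_1 is the only external part, the same ASIC can be matched to a different preamplifier range by swapping one resistor per channel.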
In addition, the front and back channels provide pulses with different polarity. In order to support them interchangeably, the amplifier input is biased at one of two global programmable reference voltages that provide the adequate operating point for both possible polarities. This introduces a small quiescent current through the resistors whenever V_ref and the preamplifier baseline differ, and thus power consumption in the absence of activity; this current can be eliminated by proper tuning of the supply voltages if desired.
Several trigger modes are available and can be configured individually for each channel. The standard trigger condition is leading edge discrimination, i.e. the detector pulse rising edge crossing a fixed, programmable voltage threshold, as sensed by a comparator. Hysteresis is implemented by inhibiting further triggers until a second comparator detects the pulse signal crossing a lower voltage on its way down, in order to avoid false triggers due to noise on the falling edge. A global Trigger Request output signal is activated whenever one of the channels is triggered.
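The two-comparator hysteresis described above can be modelled behaviourally on a sampled waveform. The following sketch assumes positive-going pulses and uses made-up threshold values; it only illustrates the arming logic, not the actual comparator circuit.

```python
def leading_edge_triggers(samples: list[float], v_high: float, v_low: float) -> list[int]:
    """Indices where a trigger fires, with re-arming only below v_low."""
    armed = True
    triggers = []
    for index, value in enumerate(samples):
        if armed and value > v_high:
            triggers.append(index)   # first comparator: threshold crossed
            armed = False            # inhibit further triggers
        elif not armed and value < v_low:
            armed = True             # second comparator: pulse has come back down
    return triggers


# Hypothetical pulse with a noisy falling edge that re-crosses the upper threshold.
pulse = [0.0, 0.6, 0.45, 0.55, 0.3, 0.1, 0.0]

with_hysteresis = leading_edge_triggers(pulse, v_high=0.5, v_low=0.2)
without_hysteresis = leading_edge_triggers(pulse, v_high=0.5, v_low=0.5)
print(with_hysteresis, without_hysteresis)
```

With a single threshold, the noise on the falling edge produces a second, spurious trigger, which the lower re-arming level suppresses.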
Other trigger conditions are provided by four global, external trigger signals, the sensitivity to which can be programmed independently for each channel in order to implement additional functionality such as synchronization, calibration and triggering from the preamplifiers' fast-reset logic. For the first ASIC prototype, a separate external trigger will be included for a few test channels in order to test different shaping and trigger circuitry using discrete components on the test PCB. This feature cannot be implemented for all channels due to the large number of pins that would be required.
ASIC configuration and readout
The block diagram of the whole analog memory ASIC is outlined in Fig. 5, with simplified depictions of the input stage and SCA channels. An I2C interface is included for control and configuration. Global configuration registers for reference voltages are present, as well as readable status registers with counters for trigger requests and pulses lost due to full queue. In addition, each input channel contains several local configuration registers including reference voltage selection, sensitivity to triggers, leading edge trigger polarity, and DAC threshold values. Input stages connected to back channels for ToT spectroscopy are configured to be triggered by the external trigger signals provided by the preamplifiers. One channel per ASIC will be devoted to synchronization and configured to be triggered externally by the GTS; the captured timestamp will then be used for alignment with the global GTS timestamp.
Each slot in the output queue contains the SCA channels for sample storage and additional digital registers: for the pulse timestamp, which is immediately latched on trigger, and for identification of the corresponding input SCA channel and sampling cell position at the time of trigger; these two values are transferred serially from the pre-trigger channel after the captured samples.
A dedicated interface is used for the readout of captured events. Readout is timed with a read clock at f_r = 50 MHz that is derived from the sampling clock. Event in Fig. 6, that must be decoded by the receiving FPGA after digitization. During idle mode, i.e. when no pulse information is being transmitted, an alternating sequence of zeros and ones is output continuously in order to allow the receiver to tune the ADC sampling point as close as possible to the next edge for improved accuracy. A 4-bit header indicates the start of a new event frame. 64 bits of digital data include the timestamp and identification of the queue slot, input channel and cell position where the trigger was issued; these data are enough to completely identify events and their full source and path through the ASIC for calibration correction. In particular, the pulse timestamps must be used to determine whether different pulses belong to the same event, because event reception latency is not deterministic due to the FIFO queue; in parts including amplifiers, analog switches, storage cells and clock distribution elements, and synthesizable, optimized HDL code for fully digital blocks, i.e. the timestamp counter, input and output channel controllers, readout controller, and I2C configuration engine.
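Since pulses reach the back-end with non-deterministic latency, the receiver has to rebuild events from the timestamps alone, as noted above. A minimal sketch of such timestamp-based grouping follows; the `(timestamp, channel)` tuple layout and the coincidence window are hypothetical, and counter roll-over is ignored for simplicity.

```python
def group_by_timestamp(pulses: list[tuple[int, int]], window_ticks: int) -> list[list[tuple[int, int]]]:
    """Group (timestamp, channel) pulses whose timestamps lie within
    window_ticks of the first pulse of the current group."""
    events: list[list[tuple[int, int]]] = []
    for pulse in sorted(pulses, key=lambda p: p[0]):
        if events and pulse[0] - events[-1][0][0] <= window_ticks:
            events[-1].append(pulse)     # same event as the group's first pulse
        else:
            events.append([pulse])       # start a new event
    return events


# Pulses listed in arrival order, which need not match trigger order.
received = [(1000, 5), (40, 2), (42, 17), (1003, 9), (41, 60)]
events = group_by_timestamp(received, window_ticks=4)
for event in events:
    print(event)
```

Sorting by timestamp before grouping makes the result independent of the order in which frames leave the output queue.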
Distributed, full-custom circuitry has been used as much as possible for digital control of the SCA channels and the switching matrix, in order to reduce the size and complexity of the digital blocks, to limit the impact of control signal routing on area and noise, and to better manage the timing of switch control signals by generating them locally with full-custom circuits. In particular, SCA channel control is based on embedded shift registers with regenerative one-hot encoding of the active cell position, and switching matrix control is based on the propagation of active and full slot flags through input channels, where local trigger signals act on them to activate matrix crosspoints and detect pulses lost to a full output queue.
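The one-hot encoding of the active cell position can be illustrated with a behavioural model of a circular shift register. This is only a sketch of the encoding idea: the class name is hypothetical, and the regenerative aspect of the actual circuit is not modelled.

```python
class OneHotRing:
    """Circular shift register in which exactly one bit marks the active cell."""

    def __init__(self, n_cells: int) -> None:
        self.bits = [1] + [0] * (n_cells - 1)   # cell 0 is active after reset

    def step(self) -> None:
        """Advance the active position by one cell, wrapping at the end."""
        self.bits = [self.bits[-1]] + self.bits[:-1]

    def active_cell(self) -> int:
        """Index of the active cell; raises if the one-hot property is broken."""
        if sum(self.bits) != 1:
            raise ValueError("one-hot encoding violated")
        return self.bits.index(1)


ring = OneHotRing(32)        # 32 cells, as in the first memory stage
for _ in range(33):          # one full revolution plus one step
    ring.step()
print(ring.active_cell())
```

Each register bit can drive the write switch of its own cell directly, which is one reason this encoding suits locally generated switch control signals.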
The complete circuit has been simulated using the final pre-layout circuits for analog and mixed-signal parts and RTL code for digital control blocks. A storage capacitor value C = 270 fF has been chosen as a trade-off between noise specifications, slew rate requirements and timing performance. Parasitic capacitances are one of the key factors limiting the performance of the circuit, so the parasitics of interconnections have been estimated and included in the simulations. Simulated performance parameters are not expected to be very accurate, but accurate enough to validate the design. Current simulations predict a signal bandwidth over 60 MHz, non-linearity below 1 mV and a noise-limited resolution slightly below 12 ENOB in the worst case (i.e. the pre-trigger samples) on output samples with a dynamic range of ±1.2 V. Figure 7 shows the simulated waveform of an output frame corresponding to the capture of a linear ramp, at the input of the external line driver; the different fields in the frame are visible in the figure.
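To put the capacitor value and the quoted resolution in relation, the sketch below compares the thermal kT/C noise of a single sampling operation on 270 fF with the rms noise of an ideal 12-bit quantizer over a 2.4 V range. The 300 K temperature is an assumption for room temperature, and other noise sources, such as the amplifiers or the extra copy of pre-trigger samples, are not included.

```python
import math

K_BOLTZMANN = 1.380649e-23   # J/K


def ktc_noise_vrms(capacitance_f: float, temperature_k: float = 300.0) -> float:
    """Thermal noise voltage sampled onto a capacitor, sqrt(kT/C)."""
    return math.sqrt(K_BOLTZMANN * temperature_k / capacitance_f)


def enob_noise_vrms(full_scale_vpp: float, enob: float) -> float:
    """rms noise of an ideal quantizer with the given effective number of bits."""
    lsb = full_scale_vpp / 2 ** enob
    return lsb / math.sqrt(12)


ktc = ktc_noise_vrms(270e-15)
budget = enob_noise_vrms(2.4, 12)    # ±1.2 V range, 12 ENOB
print(f"kT/C noise on 270 fF : {ktc * 1e6:.0f} uV rms")
print(f"12 ENOB over 2.4 V   : {budget * 1e6:.0f} uV rms")
```

Under these assumptions, a single sampling operation already uses a large share of the noise corresponding to 12 ENOB, which is consistent with the capacitor value being a trade-off against noise.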
Assessment of the digital control circuitry has been done in a separate, digital testbench in order to validate the timing performance of its synthesized, mapped and routed versions.
Summary and outlook
The conceptual design of the readout scheme for the TRACE detector has been presented, based on two ASICs implementing an array of charge preamplifiers and an analog memory circuit, respectively. In particular, a novel analog memory architecture is proposed wherein the typical SCA structure is split into two pipelined, asymmetric stages and captured data are stored in an analog FIFO queue, dramatically reducing its area requirements and removing readout-related dead time. While both circuits have been designed with the readout of TRACE in mind, they are also meant to be generic enough that they can be employed for other detectors or applications.
Prototypes for the charge preamplifiers are already available and awaiting test, while the analog memory ASIC is currently in the final design stage and samples are ex- 
