The present article introduces a novel ASIC architecture, designed in the context of the ATLAS Tile Calorimeter upgrade program for the High-Luminosity phase of the Large Hadron Collider at CERN. The architecture is based on radiation-tolerant 130 nm Complementary Metal-Oxide-Semiconductor technology and embeds both analog and digital processing of detector signals. A detailed description of the ASIC is given in terms of motivation, design characteristics, and simulated and measured performance. Experimental studies, based on 24 prototype units under real particle beam conditions, are also presented in order to demonstrate the potential of the architecture as a reliable front-end readout electronics solution.
The ATLAS Tile Calorimeter
TileCal is a sampling calorimeter, constructed of steel plates as absorber and scintillating tiles as active medium. It is important for the measurement of jet and missing energy, jet substructure, electron isolation and triggering (including muon information). The position of TileCal in the ATLAS calorimeter complex can be seen in Fig. 1a. It is divided into four cylinders, two of which form the central Long-Barrel (LB) while the other two constitute the Extended-Barrel (EB) partitions, covering the pseudorapidity¹ range |η| < 1.7. Each Tile cylinder is made of 64 wedge modules (Fig. 1b) in the azimuthal coordinate. The Photo-Multiplier Tubes (PMTs), with the associated front-end (FE) readout electronics and high-voltage distribution cards, are inserted at the outer radius of each module, hosted by a train of two 1.4 m long "drawers" which form a "super-drawer" (one super-drawer can host up to 48 PMTs).
The scintillation light is collected from the two opposite sides of the tiles by wavelength-shifting (WLS) fibres, which are bundled to define readout cells and coupled to a pair of PMTs. The current pulse induced at the PMT anodes, with a typical width of 10 ns, is therefore proportional to the energy deposited in the respective cell and is fed to the FE electronics to be amplified, shaped and digitised at the 40 MHz LHC clock rate. This arrangement of the readout employs a total of 9852 PMTs and segments TileCal into three radial layers (Fig. 1c), classified as "A", "BC" and "D", with depth 1.5, 4.1, and 1.8 interaction lengths, respectively, at η = 0. Each A-BC-D cell grouping in the same η-direction forms a projective tower currently used for the fast trigger. Additional scintillators (E cells) are installed in the region between the LB and the EB, mainly to measure energy escaping the electromagnetic calorimeter, while at higher |η|, Minimum Bias Trigger Scintillators (MBTS) are used to monitor minimum-bias event rates. The E and MBTS scintillators have been used to monitor the luminosity during van der Meer scans² [9] and are also useful in electron identification.
¹ ATLAS uses a right-handed coordinate system, centered at the nominal interaction point. The x-axis points towards the center of the LHC ring, the y-axis points upwards and the z-axis points along the beampipe. Cylindrical coordinates (r, φ) are used in the transverse plane, φ being the azimuthal angle around the z-axis. The pseudorapidity is defined in terms of the polar angle θ as η = −ln tan(θ/2).
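As a small numerical illustration of the pseudorapidity definition given in footnote 1, the following sketch converts between the polar angle and η. The function names are hypothetical and chosen only for this example.

```python
import math

def pseudorapidity(theta_rad: float) -> float:
    """eta = -ln(tan(theta/2)) for a polar angle theta in radians (0 < theta < pi)."""
    return -math.log(math.tan(theta_rad / 2.0))

def polar_angle(eta: float) -> float:
    """Inverse relation: theta = 2 * atan(exp(-eta))."""
    return 2.0 * math.atan(math.exp(-eta))

# Polar angle (in degrees) at the edge of the quoted coverage, |eta| = 1.7
theta_edge_deg = math.degrees(polar_angle(1.7))
print(f"theta at eta = 1.7: {theta_edge_deg:.1f} deg")
```

With this definition, η = 0 corresponds to the direction perpendicular to the beam axis, and the edge of the |η| < 1.7 coverage lies at a polar angle of roughly 21° from the beam axis.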
² Van der Meer scans enable the measurement of the absolute luminosity of a particle collider by sweeping the beams transversely across each other. Being responsible for luminosity measurements in ATLAS, TileCal must be able to measure very low minimum-bias event rates.
To ensure stable measurements, TileCal incorporates three calibration systems. A Charge Injection System (CIS) [10] is used to monitor the response of the FE electronics to known injected charge, and also to derive the conversion factors between the electronic responses and the input charge. Next, the Cesium (Cs) system [11, 12] uses a radioactive ¹³⁷Cs γ-source, hydraulically circulated through a system of tubes that traverses every row of scintillating tiles. The illumination of the tiles with 0.662 MeV γ produces a uniform, low current signal at the PMT anodes that allows inter-calibration of the detection chains (scintillators, WLS fibres, PMTs, FE electronics). After initial adjustment of the PMT gains, the Cs system is used to measure the small variations among the readout responses, which are used to derive the necessary calibration coefficients with respect to a unique reference value. Lastly, the laser system [13] injects light pulses into the PMTs for frequent monitoring of the gains between Cs scans.
Upgrade of the Tile readout
In the current Tile readout scheme, the 256 super-drawers are interfaced to back-end Read-Out Drivers (RODs) through 256 optical links (plus another 256 links for redundancy) with a bandwidth of 800 Mbps per link. Each super-drawer stores the digitised samples from the readout Tile cells in pipeline memories, while analog trigger signals, corresponding to each Tile tower, are transmitted to the off-detector trigger pre-processor. These trigger signals are formed by summation of the PMT outputs, in dedicated analog boards, with rough adjustment in time but without calibration to account for gain variations. Upon trigger acceptance, the data of each super-drawer are forwarded to the RODs.
For the HL-LHC, the TDAQ strategy [8] foresees a fully digital calorimeter trigger with higher granularity and precision. Each super-drawer is replaced by four independent mini-drawers, with half the length of the current drawers, to allow easier access to the electronics and improve the reliability of the cooling circuits. At the 40 MHz rate, each mini-drawer transmits the entire set of digitised data to an off-detector PreProcessor (PPr) [14] over two 9.6 Gbps optical links (the full readout scheme employs 2048 optical links plus another 2048 links for redundancy). The PPr stores the data in pipeline memories and, in parallel, transmits calibrated trigger primitives (energy measured in grouped or individual Tile cells) to the trigger system. Upon receipt of a trigger acceptance signal, the reconstructed energy, the time and a quality factor for each Tile cell are forwarded to the data network for event aggregation and storage.
In the new Tile readout system, the present reliable but outdated Tile FE electronics will be replaced by a new architecture that will be able to endure the harsh radiation conditions of the HL-LHC and handle the expected dynamic range of the signal. The lowest expected signal from physics events is defined by the minimum ionisation Landau peak of muons traversing an A cell (smallest Tile cell) at normal incidence, with a most probable value of 350 MeV [15]. Considering the typical electromagnetic (EM) scale constant of 1.05 pC/GeV and the e/µ response ratio of 0.91, the respective charge delivered by a single PMT is approximately 200 fC. On the other hand, the largest expected signal is that of energetic jets, depositing up to 1.3 TeV in EM scale (1.5 TeV in hadronic scale) in a single Tile cell. To cover this range, the maximum charge requirement for one readout channel is set to 800 pC. There is however interest in extending the range up to 1.2 nC to be able to measure higher energy jets, expected in rare and possibly new physics events. In addition to the processing of physics signals, each FE electronic card must include a separate channel for large time-constant integration of low amplitude currents. This channel is needed for calibration scans using the Cs system (50 nA − 100 nA) but also for the monitoring of minimum-bias event rates and the instantaneous luminosity (20 pA − 10 µA).
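The charge figures quoted above can be cross-checked with a few lines of arithmetic. The sketch below is only one possible reading: it assumes that the cell signal is shared equally between the two PMTs of a cell and that the muon response is obtained by dividing the EM-scale charge by the e/µ ratio. Under these assumptions it reproduces the quoted value of approximately 200 fC. The function names are hypothetical.

```python
EM_SCALE_PC_PER_GEV = 1.05   # typical EM scale constant quoted in the text
E_OVER_MU = 0.91             # e/mu response ratio quoted in the text
PMTS_PER_CELL = 2            # assumption: charge shared equally by the PMT pair

def muon_charge_per_pmt_fc(mpv_gev: float) -> float:
    """Charge per PMT (fC) for a muon deposit with the given most probable value."""
    cell_charge_pc = mpv_gev * EM_SCALE_PC_PER_GEV / E_OVER_MU
    return 1e3 * cell_charge_pc / PMTS_PER_CELL

def em_charge_per_pmt_pc(energy_gev: float) -> float:
    """Charge per PMT (pC) for a deposit expressed at the EM scale."""
    return energy_gev * EM_SCALE_PC_PER_GEV / PMTS_PER_CELL

min_signal_fc = muon_charge_per_pmt_fc(0.350)   # muon Landau peak in an A cell
max_signal_pc = em_charge_per_pmt_pc(1300.0)    # 1.3 TeV jet deposit (EM scale)

print(f"lowest expected signal : {min_signal_fc:.0f} fC per PMT")
print(f"largest expected signal: {max_signal_pc:.1f} pC per PMT")
```

In this reading the largest expected signal stays below the 800 pC requirement, which is consistent with the requirement being chosen to cover the range with some margin.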
The FATALIC Architecture
FATALIC (Fig. 2) is an ASIC, based on 130 nm GlobalFoundries (GF) Complementary Metal-Oxide-Semiconductor (CMOS) technology, proposed to replace the current discrete FE readout electronics of the TileCal. The proposal of an ASIC, embedding both analog signal processing and digitisation, is motivated by its significant advantages in terms of simplicity, radiation tolerance, power consumption and production cost for a large number of units. On the other hand, since the low operating voltage (1.6 V) is dictated by the technology, FATALIC must rely on a current-driven, rather than a voltage-driven, architecture in order to handle the large input dynamic range. Hence, an input stage employing current conveyors is implemented to adjust the impedance and distribute the signal to the different channels.
The specifications of FATALIC are listed in Table 1 , while a block diagram summarising the architecture is given in Fig. 3 . In order to handle the large input range (up to 1.2 nC), FATALIC provides three fast channels to process the PMT signal at the 40 MHz LHC clock rate, with relative amplification ×1 (low gain), ×8 (medium gain) and ×64 (high gain). This three-gain scheme is chosen, instead of using two gains, in order to improve the energy resolution (at the Tile cell level) across the input range, as shown by the related study in Section 8.3. In parallel, the signal is routed to an additional, slow channel for integration over a large (100 µs) time constant.
Fast channels
In the fast channels the signal is read by three current conveyors with different input impedances in order to define the respective gain ratios. Current integration and current-to-voltage conversion 
Slow channel
The slow channel is designed to integrate low amplitude currents in the range from 0.5 nA to 1 µA with minimum contamination from 1/f noise, induced by the input stage. A current conveyor with low input impedance is therefore used to drive approximately 87% of the PMT signal to the slow channel, while current integration and current-to-voltage conversion are carried out by a differential amplifier with large time-constant (100 µs) RC feedback. The integrated signal is then sampled by a 12-bit 833 kS/s ADC. Finally, to minimise the contamination from white noise in the measurement of such low amplitude currents, the data are obtained after averaging over a time interval of 10 ms, as described in Section 6.
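To illustrate why averaging helps against white noise, the following toy Monte Carlo sketch averages uncorrelated Gaussian samples over the number of ADC samples contained in a 10 ms window at 833 kS/s. It assumes purely white noise of one ADC count per sample, so the spread of the average is expected to shrink as 1/√N. Correlated (1/f) noise does not average down in this way and is not modelled. All names are hypothetical.

```python
import math
import random
import statistics

SAMPLING_RATE_HZ = 833e3      # slow-channel ADC sampling rate from the text
AVERAGING_WINDOW_S = 10e-3    # averaging interval from the text
N_SAMPLES = round(SAMPLING_RATE_HZ * AVERAGING_WINDOW_S)   # 8330 samples

def averaged_noise(sigma_counts: float, n_samples: int,
                   n_trials: int = 100, seed: int = 1) -> float:
    """Spread of the mean of n_samples white-noise samples, over n_trials windows."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_trials):
        total = sum(rng.gauss(0.0, sigma_counts) for _ in range(n_samples))
        means.append(total / n_samples)
    return statistics.pstdev(means)

expected = 1.0 / math.sqrt(N_SAMPLES)        # analytic 1/sqrt(N) expectation
measured = averaged_noise(1.0, N_SAMPLES)    # toy Monte Carlo estimate

print(f"samples per window: {N_SAMPLES}")
print(f"expected reduction factor: {1.0 / expected:.0f}")
print(f"toy MC spread of the average: {measured:.4f} counts")
```

Under the white-noise assumption, the 10 ms window reduces the noise on the averaged value by a factor of about 90 relative to a single sample.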
Outputs
The 12-bit fast channel samples are read out from 12 respective output pins. In order to comply with the bandwidth of the uplink to the back-end electronics (see Section 6), FATALIC is restricted to provide the data of only two of the three channels; the 12-bit data of the medium-gain channel are always delivered upon the rising edge of the 40 MHz clock, while the data of either the low- or the high-gain channel (alternative gain) are delivered upon the falling edge. The selection of the alternative gain is made by the embedded digital block, depending on the saturation of the most sensitive channel (dynamic gain switch), and is declared by an output flag. The dynamic gain switch can however be forced to provide only low-gain readout by a dedicated input bit. On the other hand, slow channel data are delivered serially, at 10 Mbps (833 kHz for the 12 bits), through a single readout pin.
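The output rates implied by this scheme can be estimated as follows. This is a rough cross-check only: it counts raw ADC bits and ignores the gain flag, encoding and protocol overhead, none of which are specified here. The figure of six readout channels per 9.6 Gbps uplink is taken from the description of the Daughterboard, and the variable names are hypothetical.

```python
LHC_CLOCK_HZ = 40e6          # fast-channel sampling rate
ADC_BITS = 12                # bits per sample
SLOW_SAMPLING_HZ = 833e3     # slow-channel sampling rate
CHANNELS_PER_UPLINK = 6      # readout channels served by one uplink
UPLINK_BPS = 9.6e9           # raw uplink bandwidth

# Two gains per bunch crossing: medium gain plus the alternative gain
fast_rate_two_gains = 2 * ADC_BITS * LHC_CLOCK_HZ
# Hypothetical case in which all three gains were transmitted
fast_rate_three_gains = 3 * ADC_BITS * LHC_CLOCK_HZ
# Serial slow-channel output
slow_rate = ADC_BITS * SLOW_SAMPLING_HZ

payload_two_gains = CHANNELS_PER_UPLINK * fast_rate_two_gains
payload_three_gains = CHANNELS_PER_UPLINK * fast_rate_three_gains

print(f"fast data per FATALIC (two gains): {fast_rate_two_gains / 1e6:.0f} Mbps")
print(f"slow data per FATALIC            : {slow_rate / 1e6:.2f} Mbps")
print(f"raw payload per uplink, two gains  : {payload_two_gains / 1e9:.2f} Gbps")
print(f"raw payload per uplink, three gains: {payload_three_gains / 1e9:.2f} Gbps")
```

The slow-channel estimate reproduces the quoted 10 Mbps. For the fast channels, sending two gains leaves considerably more headroom on the uplink than sending three would.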
Development of the integrated circuit
Current conveyor system
The current conveyor system, schematically depicted in Fig. 5, consists of one input stage and four output stages, which provide a differential signal to each of the FATALIC channels. At the input stage, a current mirror sets the biasing current to the nominal value of 0.5 mA. The PMT signal is read at the source of four common-gate NMOS transistors with the same length (L) but with different widths: W_fast/1 (high gain), W_fast/8 (medium gain), W_fast/64 (low gain) and W_slow = 8 W_fast (slow channel). A PMOS current mirror is then used to replicate the current from the drain of each NMOS onto the respective output stage, while a replica of the input stage (dummy) is implemented to return just the biasing current to be subtracted from the final output.
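The transistor widths listed above suggest how the input current is shared among the four branches. The sketch below assumes, to first order, that each common-gate transistor takes a share of the signal current proportional to its width (equal length and bias conditions). This is a simplifying assumption made for illustration, not a statement of the actual circuit behaviour.

```python
# Widths relative to W_fast, as listed in the text
RELATIVE_WIDTHS = {
    "high": 1.0,        # W_fast / 1
    "medium": 1.0 / 8,  # W_fast / 8
    "low": 1.0 / 64,    # W_fast / 64
    "slow": 8.0,        # W_slow = 8 * W_fast
}

def current_fractions(widths: dict) -> dict:
    """Fraction of the input current per branch, assuming a split proportional to width."""
    total = sum(widths.values())
    return {name: w / total for name, w in widths.items()}

fractions = current_fractions(RELATIVE_WIDTHS)
for name, frac in fractions.items():
    print(f"{name:>6}: {100.0 * frac:.2f} % of the input current")
```

Under this assumption the slow branch receives about 87.5% of the input current, in line with the approximately 87% quoted for the slow channel, and the fast branches keep the 1 : 8 : 64 ratios between them.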
In terms of noise, the performance of such a current conveyor is modest compared to a charge preamplifier, but fully adequate given the large dynamic range of the input stage. The input impedance is kept below 70 Ω for the entire input range, as presented in Fig. 6. Finally, to account for the open-loop configuration of the input stage, which induces large dispersion among the pedestal values, a tuning system with a Digital-to-Analog Converter (DAC) is also added to the above design.
Signal shaping
Table 2: Open-loop characteristics of the shaper with the ADC input capacitive load of 1.6 pF.
Signal digitisation
The shaper output is sampled by three 12-bit 40 MS/s ADCs in the fast channels and one 12-bit 833 kS/s ADC in the slow channel. The ADC design by LPC is based on the pipeline architecture with 1.5-bit-per-stage resolution. Among the various efficient ADC architectures developed and improved over the last ten years, the pipelined ADC has been adapted for high resolution, speed and dynamic range with relatively low power consumption and low component count. In CMOS technologies, resolutions in the range of 10-14 bits with a sampling frequency up to 100 MS/s are typically achieved with power consumption lower than 100 mW.
Fig. 8 presents the block diagram of the pipelined ADC, with two output bits per stage, displaying the architecture of one of the stages. Each stage receives a differential input voltage, in the range ±500 mV, which is read by a sample-and-hold (S/H) and a 2-bit flash ADC. The flash ADC compares the input to two threshold voltages (±125 mV) and outputs a 2-bit word, while a DAC and a residue amplifier provide the input to the next stage, as presented in Table 3. Although a 2-bit word is delivered, the effective resolution is 1.5 bits since the combination 11 is avoided. This bit redundancy limits the degradation of the Integral Non-Linearity⁵ (INL) due to variations of the residue amplifier gain or the comparator offsets (the INL is unaffected by variations of the offset voltage for up to ±12.5% of the full-scale input voltage). The complete architecture includes 12 cascading stages and is followed by the digital correction block (see Section 5.4), which delivers the final digital code. The estimated power consumption of the ADC is 48 mW.
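The role of the redundancy in a 1.5-bit stage can be illustrated with a behavioural sketch. The code below uses the textbook transfer function of a 1.5-bit stage (residue = 2·V_in − d·V_FS, with d ∈ {−1, 0, +1}), which is an assumption and not a transcription of Table 3. How this maps onto the actual reference voltages of the design is not modelled. Random comparator offsets of up to ±100 mV are applied to show that the reconstructed value is unaffected as long as the residue stays within the ±500 mV range.

```python
import random

V_FS = 0.5     # amplitude of the differential input range, volts (±500 mV)
V_TH = 0.125   # nominal comparator thresholds, volts (±125 mV)

def stage(v_in, off_hi=0.0, off_lo=0.0):
    """One 1.5-bit stage: returns the 2-bit word and the residue for the next stage."""
    if v_in > V_TH + off_hi:
        d = 1
    elif v_in < -V_TH + off_lo:
        d = -1
    else:
        d = 0
    residue = 2.0 * v_in - d * V_FS     # gain-2 amplification after DAC subtraction
    word = format(d + 1, "02b")         # "00", "01" or "10"; "11" never occurs
    return word, residue

def convert(v_in, n_stages=12, max_offset=0.0, rng=None):
    """Cascade the stages and rebuild the input estimate from the stage decisions."""
    rng = rng or random.Random(0)
    words, estimate, v = [], 0.0, v_in
    for i in range(n_stages):
        off_hi = rng.uniform(-max_offset, max_offset)   # hypothetical comparator offsets
        off_lo = rng.uniform(-max_offset, max_offset)
        word, v = stage(v, off_hi, off_lo)
        words.append(word)
        estimate += (int(word, 2) - 1) * V_FS / 2 ** (i + 1)
    return words, estimate

words, ideal = convert(0.3)
_, with_offsets = convert(0.3, max_offset=0.1)
print("stage words:", " ".join(words))
print(f"ideal estimate: {ideal:.6f} V, with comparator offsets: {with_offsets:.6f} V")
```

In this model a wrong comparator decision near a threshold is absorbed by the following stages, so both estimates agree with the input to within V_FS/2¹² (about 0.12 mV), which is the behaviour the bit redundancy is meant to provide.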
The comparator
The architecture of the comparator is presented in Fig. 9. The transconductance input stage is fully differential, comparing the differential input signal to the differential threshold voltage, while isolating the input from kick-back noise induced by the switching of the subsequent latched stage. The latched stage performs the comparison when the switch-transistors are OFF and a reset when they are ON. The state of the comparator, when it is latched, is memorised thanks to the bistable third stage. Lastly, two NOT gates perform the final digital shaping of the output signal. The main characteristics of the comparator are given in Table 4.
⁵ The INL is defined as the deviation, in Least-Significant-Bits (LSBs), of the output code from the ideal transfer function.
The DAC
The DAC employs a set of switches, controlled by the comparators, to select the differential reference voltage to be applied on the feedback capacitor of the residue amplifier. Given the ±500 mV input dynamic range, the reference voltages are set to ±250 mV.
Table 4: Characteristics of the comparator.
Clocking frequency    40 MHz
Sensitivity           50 µV
Power supply          1.6 V
Power consumption     2.1 mW
The gain-2 residue amplifier
The residue amplifier, displayed in Fig. 8, is a differential amplifier with capacitive feedback. Capacitive, rather than resistive, feedback is used for better component matching, which is crucial for the accuracy of the amplification and, therefore, the linearity of the ADC. A capacitance of 800 fF is sufficient to minimise both the thermal noise (kT/C) and the component mismatch (∼1/√C). On the other hand, the design requires a small die surface and low supply current for dynamic performance. To match both the sampling (C_S) and feedback (C_F) capacitance to 800 fF, with an accuracy better than 0.1%, an array of four 1600 fF MIM capacitor unit cells is drawn in common-centroid layout, while dummy switches are added to counterbalance parasitic capacitors introduced by the reset switches. Lastly, the timing sequence, controlling the sample, hold/amplification and reset phases of each stage, has been carefully defined to prevent charge disruption.
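As an order-of-magnitude check of the kT/C argument, the sketch below evaluates the thermal noise sampled on a single 800 fF capacitor and compares it with the LSB of a 12-bit conversion over a 1 V (±500 mV) range. The temperature of 300 K is an assumption, and the single-capacitor estimate ignores the details of the amplifier topology.

```python
import math

K_BOLTZMANN = 1.380649e-23   # J/K

def ktc_noise_vrms(capacitance_f: float, temperature_k: float = 300.0) -> float:
    """RMS thermal noise voltage sampled on a capacitor: sqrt(kT/C)."""
    return math.sqrt(K_BOLTZMANN * temperature_k / capacitance_f)

noise_v = ktc_noise_vrms(800e-15)   # 800 fF sampling capacitor
lsb_v = 1.0 / 2 ** 12               # 12 bits over a 1 V full-scale range
noise_in_lsb = noise_v / lsb_v

print(f"kT/C noise: {noise_v * 1e6:.0f} uV rms")
print(f"LSB       : {lsb_v * 1e6:.0f} uV")
print(f"ratio     : {noise_in_lsb:.2f} LSB")
```

With these assumptions the sampled thermal noise is about 72 µV rms, a fraction of the 244 µV LSB.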
Digital block
Each ADC delivers twelve 2-bit words at the 40 MHz clock rate. The result of the complete conversion is therefore a 24-bit word with 12 redundant bits. The digital block (Fig. 10) synchronises the outputs using shift registers to compensate for the delay due to the position of each stage in the pipeline. The total latency of this process is equal to eight clock cycles (200 ns) and it is fixed for every power cycle. Digital summation (with carry-save) of the most-significant bit of each stage with the least-significant bit of the previous stage is then performed to produce the final 12-bit code, every 25 ns for the fast channels and every 1.2 µs for the slow channel. The second function of the digital block is to select the alternative gain output between the high-gain and the low-gain channel data (see Section 4.3). This selection is made according to the output of the low-gain channel; if it is equal to or higher (lower) than 600 ADC counts, then the low-gain (high-gain) channel data are delivered.
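The timing figures and the gain selection rule described above can be written down compactly. The following sketch is a behavioural illustration with hypothetical names. It follows the rule as stated in the text (threshold of 600 ADC counts on the low-gain output, with an optional forced low-gain readout) and is not a description of the actual digital logic.

```python
CLOCK_PERIOD_NS = 25.0            # 40 MHz LHC clock
PIPELINE_LATENCY_CYCLES = 8       # latency of the digital correction, from the text
SLOW_SAMPLING_HZ = 833e3          # slow-channel ADC rate
GAIN_SWITCH_THRESHOLD = 600       # ADC counts on the low-gain channel

def latency_ns() -> float:
    """Latency of the synchronisation stage in nanoseconds."""
    return PIPELINE_LATENCY_CYCLES * CLOCK_PERIOD_NS

def slow_sample_period_us() -> float:
    """Time between two slow-channel codes in microseconds."""
    return 1e6 / SLOW_SAMPLING_HZ

def select_alternative_gain(low_gain_adc: int, force_low: bool = False,
                            threshold: int = GAIN_SWITCH_THRESHOLD) -> str:
    """Return which channel is sent on the alternative-gain output."""
    if force_low or low_gain_adc >= threshold:
        return "low"
    return "high"

print(f"latency: {latency_ns():.0f} ns")
print(f"slow-channel period: {slow_sample_period_us():.2f} us")
print(select_alternative_gain(450), select_alternative_gain(1200))
```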
The Floorplan
The floorplan of FATALIC (Fig. 11) is optimized for minimum surface, while preserving signal integrity. Two main regions are distinguished: the region hosting the input stage and the shapers (analog block), and the region hosting the ADCs and the digital block. The two regions are isolated by a high impedance (BFMOAT) layer to reduce the coupling. The analog power and reference voltage rails are also decoupled by embedded large capacitors. The total surface of the chip measures 6.3 mm², while the core area is limited to 2.3 mm².
Associated cards
Fig. 12 displays a fully equipped mini-drawer, hosting 12 PMTs, 12 "All-in-One" cards, each of which supports one FATALIC unit and the dedicated CIS, and the Mainboard, which controls the All-in-One cards and transmits the data to the Daughterboard. Both the All-in-One cards and the Mainboard have been designed by LPC for the purposes of FATALIC. The Daughterboard, also shown in Fig. 12, is the on-detector interface to the back-end electronics. It is divided into two independent sides, each of which establishes a 9.6 Gb/s uplink to deliver the data of six readout channels to the off-detector PPr, while a 4.8 Gb/s downlink is used to transmit the LHC clock as well as control and configuration commands.
All-in-one card
The All-in-One card supports FATALIC and the CIS. It is based on a 6-layer Printed Circuit Board (PCB) with dimensions 7.0 cm × 4.7 cm. On one side, a 7-pin connector attaches the card to the high-voltage divider, at the base of the PMT, while on the opposite side a 40-conductor ribbon cable establishes communication with the Mainboard. Finally, nine on-board potentiometers adjust the pedestal for each channel and the low voltages supplying FATALIC and other blocks.
A schematic diagram of the CIS is given in Fig. 13. The charge injection is driven by a 12-bit DAC with a maximum output of 4.095 V. The DAC charges one of the three available capacitors, 5.6 pF, 39 pF or 330 pF, for the scanning of the high-, medium- or low-gain dynamic range, respectively. The connected capacitance is defined through analog switches, controlled by two timing signals from the Mainboard. The same timing signals control the charge/discharge cycles (through appropriately adjusted resistances) in order to reproduce the PMT pulse shape. Finally, the DAC is also connected to a Howland DC current source to allow calibration of the slow channel.
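The charge that the CIS can inject with each capacitor follows, to first order, from Q = C·V. The sketch below assumes an ideal linear DAC with 4095 steps up to 4.095 V (i.e. 1 mV per code) and complete charge transfer. Both are simplifying assumptions, and the function names are hypothetical.

```python
DAC_MAX_CODE = 4095              # 12-bit DAC
DAC_FULL_SCALE_V = 4.095         # maximum DAC output from the text
CIS_CAPACITORS_PF = {"high": 5.6, "medium": 39.0, "low": 330.0}

def dac_voltage(code: int) -> float:
    """Ideal DAC output voltage for a given code (1 mV per code)."""
    return code * DAC_FULL_SCALE_V / DAC_MAX_CODE

def injected_charge_pc(code: int, gain_range: str) -> float:
    """Ideal injected charge in pC (pF times V gives pC)."""
    return CIS_CAPACITORS_PF[gain_range] * dac_voltage(code)

for gain_range in CIS_CAPACITORS_PF:
    q_max = injected_charge_pc(DAC_MAX_CODE, gain_range)
    q_step = injected_charge_pc(1, gain_range)
    print(f"{gain_range:>6} range: up to {q_max:7.1f} pC in steps of {1e3 * q_step:.1f} fC")
```

In this idealised picture the 330 pF capacitor reaches about 1.35 nC at full DAC scale, so the extended 1.2 nC input range can be scanned.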
Mainboard
The twelve All-in-One cards of one mini-drawer are controlled by the 28 cm × 10 cm Mainboard, which serialises the data for transmission to the Daughterboard and distributes clocks and commands from the Daughterboard to the FE electronics. It also distributes the 10 V low-voltage supply, by means of point-of-load regulators, to power both the All-in-One cards and the Daughterboard. The connection to the Daughterboard is established with a 400-pin FPGA Mezzanine Connector (FMC). 
Simulation and performance
The following paragraphs describe the performance of FATALIC in terms of noise, linearity and radiation tolerance, based on both simulation and experimental measurements with 24 prototype units. It is noted however that the current All-in-One cards, used to accommodate FATALIC for the purposes of these studies, are not adequate to support the slow channel. Specifically, due to the particularly large gain of this channel, the resistance range of the respective on-board potentiometer is not sufficient for the adjustment of the pedestal within the input dynamic range. As a workaround, it was decided to reduce the biasing current in the input stage, from the nominal value of 0.5 mA to 0.2 mA. This however affected the performance of the fast channels, causing dynamic amplification of the signal depending on its amplitude, with significant impact on the linearity. This effect would normally be eliminated by equipping the All-in-One cards with potentiometers of larger resistance range or by implementing slow control functionalities in FATALIC.
Noise measurements
The dominant noise introduced in the fast channels is white noise from the input stage, expected from simulation to be approximately 8 fC in the high-gain channel. On the other hand, the dominant noise in the slow channel is 1/f noise from the input stage, estimated at the level of 7 nA. This substantially exceeds the specification of 0.25 nA (see Table 1) initially defined for FATALIC, because the lowest region of the noise frequency spectrum was not properly included in the simulations used to define the design. The spectral density of the noise for the fast and slow channels is given in Fig. 14.
Experimental measurements of the noise were taken with the All-in-One cards connected to PMTs under high voltage, without signal. Fig. 15a and 15b present the mean and standard deviation of the pedestal distribution in the fast channels, obtained from a Gaussian fit. The noise, estimated from the standard deviation and using the fC/ADC conversion factors of Section 7.2, averages to (6.1 ± 0.9) fC in the high-gain channel, (26.5 ± 3.1) fC in the medium-gain channel and (260.0 ± 21.1) fC in the low-gain channel. Respective results for the case of the slow channel are presented in Fig. 15c and 15d. In this case the average noise is found to be (26.2 ± 0.7) counts which, considering the design ratio of 0.25 nA/count, corresponds to (6.6 ± 0.8) nA.
Fig. 16a, 16b and 16c present the deviation from linearity, obtained by simulation, of the analog pulse peak amplitude (in millivolts) as a function of the input charge. The non-linearity, defined as the deviation from a linear fit over the maximum channel response (1 V), does not exceed 0.3% in the input charge range up to 850 pC, while at higher charge values it increases to approximately 0.6% at 1.2 nC. In the case of the slow channel, the deviation from linearity is expected to be less than 0.2% over the entire input current range, as shown in Fig. 16d.
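The non-linearity figure of merit used above (deviation from a linear fit, relative to the maximum channel response) can be computed as in the following sketch. The data points are synthetic and purely illustrative; they are not simulation or measurement results. The fit is an ordinary least-squares line, which is an assumption about the fitting procedure.

```python
def linear_fit(x, y):
    """Ordinary least-squares straight line: returns (slope, intercept)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def max_nonlinearity_percent(charge, response, full_scale):
    """Largest deviation from the fitted line, in percent of the full-scale response."""
    slope, intercept = linear_fit(charge, response)
    worst = max(abs(r - (slope * q + intercept)) for q, r in zip(charge, response))
    return 100.0 * worst / full_scale

# Synthetic example: charges in pC, responses in mV (hypothetical values)
charges = [0.0, 200.0, 400.0, 600.0, 800.0, 1000.0, 1200.0]
ideal = [q / 1.2 for q in charges]                       # perfectly linear, 1 V at 1.2 nC
bent = [r - 5e-6 * q ** 2 for q, r in zip(charges, ideal)]  # mild compression

print(f"ideal response: {max_nonlinearity_percent(charges, ideal, 1000.0):.3f} %")
print(f"bent response : {max_nonlinearity_percent(charges, bent, 1000.0):.3f} %")
```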
Linearity measurements
Experimental studies of the linearity are carried out using the CIS. Since the linearity of the CIS has not been verified, the response in this case is obtained from the sum of the selected digitised samples, which is less sensitive to the shape of the injected pulse. Fig. 17a, 17b and 17c present the measured response as a function of the injected charge for one unit. The maximum deviation from linearity (average from the tested prototype units, relative to the maximum response) in the high-gain channel is (0.3 ± 0.1)% above 2 pC, increasing to (1.3 ± 0.2)% for lower charge values. In the medium-gain channel the deviation is (1.4 ± 0.6)%, while in the low-gain channel it is (0.4 ± 0.1)%, below 800 pC, reaching (3.5 ± 0.7)% at 1.2 nC. The fC/ADC conversion factors, summarised in Fig. 17d , are finally obtained from the slope of the linear interpolation and average to (2.46 ± 0.03) fC/ADC, (20.4 ± 0.7) fC/ADC and (211.3 ± 6.4) fC/ADC, respectively.
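With these conversion factors, the noise values measured in the noise study can be expressed back in ADC counts, and the slow-channel conversion can be checked in the same way. The sketch below only repeats the arithmetic with the central values quoted in the text; uncertainties are not propagated.

```python
FC_PER_ADC = {"high": 2.46, "medium": 20.4, "low": 211.3}     # measured conversion factors
NOISE_FC = {"high": 6.1, "medium": 26.5, "low": 260.0}        # measured fast-channel noise

SLOW_NOISE_COUNTS = 26.2       # measured slow-channel noise in counts
SLOW_NA_PER_COUNT = 0.25       # design ratio quoted in the text

# Fast-channel noise expressed in ADC counts of the respective channel
noise_counts = {gain: NOISE_FC[gain] / FC_PER_ADC[gain] for gain in FC_PER_ADC}
# Slow-channel noise expressed in nA
slow_noise_na = SLOW_NOISE_COUNTS * SLOW_NA_PER_COUNT

for gain, counts in noise_counts.items():
    print(f"{gain:>6} gain: {counts:.2f} ADC counts")
print(f"slow channel: {slow_noise_na:.2f} nA")
```

The fast-channel noise thus corresponds to roughly 2.5 ADC counts in the high-gain channel and about 1.2 to 1.3 counts in the other two, while the slow-channel value reproduces the quoted 6.6 nA after rounding.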
Radiation Tolerance
The 130 nm GF CMOS technology is recommended by ATLAS due to its high radiation tolerance in terms of Total Ionising Dose (TID). It is suitable for the development of radiation-hard chips, up to at least 100 Mrad with, however, a peak of leakage current at ∼1 Mrad. The expected radiation level, based on both Monte Carlo and in-situ measurements, is 50 krad (including conservative safety factors), well below the 1 Mrad peak. Hence, no further assessment for TID or Non Ionising Energy Loss (NIEL) is deemed necessary for FATALIC (it is necessary though for the associated cards).
Energy reconstruction
As described in Section 4, the fast channel shapers deliver an asymmetric pulse, the amplitude A of which has to be reconstructed from the set of digitised samples (typically using seven samples), collected after trigger decision. At the 40 MHz sampling rate, the second sample is expected to coincide with the pulse peak. Small time-shifts τ, of the order of a few nanoseconds, are however possible. Both A and τ are reconstructed by Optimal Filtering [16] , a weighted sum of the digitised samples with minimum sensitivity to both correlated (e.g. pile-up) and uncorrelated noise.
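The idea of reconstructing A and τ from a weighted sum of samples can be sketched as follows. This is a simplified illustration and not the filter of [16]: it assumes pedestal-subtracted samples, uncorrelated white noise of equal size on all samples (so that the minimum-variance weights are linear combinations of the pulse shape g and its derivative g′), and a hypothetical analytic pulse shape peaking on the second sample. The weights satisfy Σaᵢgᵢ = 1, Σaᵢg′ᵢ = 0 for the amplitude and Σbᵢgᵢ = 0, Σbᵢg′ᵢ = −1 for the product Aτ.

```python
import math

SAMPLE_TIMES_NS = [25.0 * i for i in range(7)]   # seven samples at 40 MHz
T_PEAK_NS = 25.0                                 # peak assumed on the second sample

def pulse(t_ns):
    """Hypothetical unit-amplitude asymmetric pulse, peaking at T_PEAK_NS."""
    if t_ns <= 0.0:
        return 0.0
    x = t_ns / T_PEAK_NS
    return (x * math.exp(1.0 - x)) ** 2

def pulse_derivative(t_ns, h=1e-3):
    """Numerical time derivative of the pulse shape (central difference)."""
    return (pulse(t_ns + h) - pulse(t_ns - h)) / (2.0 * h)

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def filter_weights():
    """Amplitude weights a and time weights b for white, uncorrelated noise."""
    g = [pulse(t) for t in SAMPLE_TIMES_NS]
    dg = [pulse_derivative(t) for t in SAMPLE_TIMES_NS]
    gg, gd, dd = dot(g, g), dot(g, dg), dot(dg, dg)
    det = gg * dd - gd * gd
    a = [(dd * gi - gd * di) / det for gi, di in zip(g, dg)]   # sum(a*g)=1, sum(a*dg)=0
    b = [(gd * gi - gg * di) / det for gi, di in zip(g, dg)]   # sum(b*g)=0, sum(b*dg)=-1
    return a, b

def reconstruct(samples):
    """Return (amplitude, time shift in ns) from pedestal-subtracted samples."""
    a, b = filter_weights()
    amplitude = dot(a, samples)
    return amplitude, dot(b, samples) / amplitude

samples = [100.0 * pulse(t) for t in SAMPLE_TIMES_NS]   # in-time pulse, amplitude 100
amplitude, tau = reconstruct(samples)
print(f"A = {amplitude:.3f} counts, tau = {tau:.3f} ns")
```

With correlated noise such as pile-up, the weights would additionally depend on the noise autocorrelation matrix, which this sketch replaces by the identity.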
Selection of digitised samples
The dynamic gain switch of FATALIC is exploited in order to ensure the best possible resolution for the acquisition of each digitised sample S_i. If the high-gain channel does not saturate, the ADC measurement is acquired from the alternative gain output. Otherwise, the switch turns to low gain, in which case the medium-gain output is preferred. If, however, the medium-gain channel also saturates, then the alternative, low-gain measurement is used. It is noted that, once the high-gain channel saturates, the switch remains at low gain for the next seven samples (low-gain block). This is imposed through the Mainboard FPGAs to allow the high-gain channel to recover from the saturation state. Once the ADC value has been acquired, the pedestal p_i of the selected channel is subtracted and the sample is normalised to the high-gain channel scale. Fig. 18 demonstrates a characteristic case in which digitised samples from different channels are selected.
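The selection rule described above can be summarised in a short sketch. It assumes that saturation corresponds to the maximum 12-bit code (4095) and that the normalisation to the high-gain scale uses the nominal gain ratios of 8 and 64. Both are assumptions made for illustration, since an actual analysis would use measured calibration factors. The low-gain block of seven samples is handled upstream and is not modelled here.

```python
ADC_SATURATION = 4095                                   # assumed saturation code
TO_HIGH_GAIN_SCALE = {"high": 1, "medium": 8, "low": 64}  # nominal gain ratios

def select_sample(medium_adc, alt_adc, alt_gain, pedestals):
    """Pick the most sensitive unsaturated measurement for one sample.

    medium_adc : code of the medium-gain channel (always delivered)
    alt_adc    : code of the alternative-gain output
    alt_gain   : "high" or "low", as declared by the output flag
    pedestals  : dict of pedestal values per gain
    Returns (selected gain, pedestal-subtracted value in high-gain units).
    """
    if alt_gain == "high":                 # high gain not saturated
        gain, adc = "high", alt_adc
    elif medium_adc < ADC_SATURATION:      # switch at low gain, medium gain preferred
        gain, adc = "medium", medium_adc
    else:                                  # medium gain saturated as well
        gain, adc = "low", alt_adc
    return gain, (adc - pedestals[gain]) * TO_HIGH_GAIN_SCALE[gain]

pedestals = {"high": 100, "medium": 100, "low": 100}    # hypothetical pedestals
print(select_sample(500, 3000, "high", pedestals))
print(select_sample(2000, 300, "low", pedestals))
print(select_sample(4095, 700, "low", pedestals))
```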
Simulated effects
Simulation studies have been carried out in order to test the impact of different experimental effects on the energy resolution, defined as the standard deviation of the (E_reco − E_true)/E_true distribution, where E refers to the Tile cell energy corresponding to one of two PMTs and the label true (reco) refers to the input (reconstructed) energy. In these studies, the Optimal Filter is calibrated with respect to the measured output pulse shape of a prototype unit, while, to simulate charge injection, this reference pulse is given random amplitudes and is subsequently sampled and digitised, as described in Section 8.1. Random noise is then added to each digitised sample, based on measurements with the same prototype unit; 1.5 ADC counts for the low- and medium-gain channels (which correspond to 8.4 fC and 32 fC, respectively), and 3.5 ADC counts for the high-gain channel (256 fC). As shown in Fig. 19, the expected resolution is kept below 2% in the input range above 2 pC, while for lower values it increases up to ∼7%.
Electronic noise and gain saturation: The reference scenario described above is first compared to the ideal case where no electronic noise and/or no low-gain block is applied. The results are presented in Fig. 19a . The impact of noise on the intrinsic 18-bit resolution is about one order of magnitude, whereas the cost of the low gain block is less than 1% over the entire input range.
Phase variations:
The arrival time of the pulse, with respect to the digitising clock, may vary by a few nanoseconds. To simulate this effect, the generated pulses are shifted by a random phase τ ∈ [−8, 8] ns. The results (Fig. 19a) show that such time-shifts are accounted for by Optimal Filtering, affecting the resolution by ∼1%.
Gain variations:
The impact of gain variations is tested by shifting the gain of each channel by 5% or 10%. Such variations do not affect the resolution (Fig. 19b ) but rather introduce a shift to the reconstructed energy, which can be recovered by calibration using the CIS.
Pulse shape variations:
Imperfections of the electronics are found to distort the output pulse shape depending on the input charge. Fig. 19c compares simulated pulses obtained with different injected charges. These variations of the pulse shape have less than 0.5% impact on the resolution (Fig. 19d) . However, they introduce a shift to the reconstructed energy (of less than 1%), which can be accounted for by applying a scale factor to the gain as a function of the input charge.
Pile-up:
Inelastic pp interactions taking place in the same (in-time pile-up) and adjacent (out-of-time pile-up) bunch crossings introduce parasitic pulses, which contaminate the signal of the actual hard scattering. In the years 2015-2017 the average number of inelastic interactions per bunch crossing was measured to be µ = 32. In the HL-LHC the nominal expected rate is µ = 140, but it is foreseen to increase to as much as µ ≈ 200. To test the performance of FATALIC in the presence of pile-up background, random in-time and out-of-time pile-up pulses are added to the signal according to the energy spectrum of minimum-bias events, obtained after full simulation of the detector for different values of µ. The impact on the resolution is prominent in the low energy range, as presented in Fig. 20a for the readout cell A13 (the most exposed to radiation from pp collisions) and cell D1 (the least exposed). It can however be reduced by calibrating the Optimal Filter against the correlation matrix of the pile-up background [16].
Two-gain scenario
In order to quantify the benefit of having three gains, the scenario of using two gains with a gain ratio of 32 is explored, based on the simulation described above. Considering digitisation with 12-bit ADCs, the effective output range in this case is 17 bits. The noise is assumed to be 1.8 ADC counts for both channels. As seen in Fig. 20b, the resolution degrades by ∼8% in the intermediate range (40 MeV-180 MeV) which, in the case of FATALIC, is recovered by the medium-gain channel.
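The effective output range of a multi-gain readout, and the reason the intermediate range suffers with only two gains, can be illustrated with a short sketch. It considers quantisation only (noise and pedestals are ignored) and assumes that a channel is usable as long as the signal stays below 4096 counts in that channel; the function names are hypothetical.

```python
import math

ADC_BITS = 12

def effective_bits(adc_bits: int, max_gain_ratio: float) -> float:
    """Effective output range in bits: ADC bits plus log2 of the largest gain ratio."""
    return adc_bits + math.log2(max_gain_ratio)

def quantisation_step(signal_high_gain_counts: float, gain_ratios) -> float:
    """Step size, in high-gain counts, of the most sensitive unsaturated channel.

    gain_ratios lists the attenuation of each channel relative to the high gain,
    e.g. [1, 8, 64] for three gains or [1, 32] for two gains.
    """
    for ratio in sorted(gain_ratios):
        if signal_high_gain_counts / ratio < 2 ** ADC_BITS:
            return ratio
    raise ValueError("signal saturates every channel")

print(f"two gains (ratio 32)    : {effective_bits(ADC_BITS, 32):.0f} bits")
print(f"three gains (1, 8, 64)  : {effective_bits(ADC_BITS, 64):.0f} bits")

# A signal just above the high-gain range, expressed in high-gain counts
signal = 10000.0
print(f"step with three gains: {quantisation_step(signal, [1, 8, 64])} high-gain counts")
print(f"step with two gains  : {quantisation_step(signal, [1, 32])} high-gain counts")
```

For a signal that just saturates the high-gain channel, the two-gain scheme falls back to a step that is four times coarser than that of the medium-gain channel, which is qualitatively where the loss of resolution appears.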
Performance with particle beams
The 24 prototype FATALIC units were tested in the reconstruction of real energy deposits of hadrons, electrons and muons, provided by the H8 secondary particle beam of CERN. Two mini-drawers, equipped with 12 FATALIC FE electronics each, were inserted into a demonstrator Tile module, providing full readout of Tile cells A1-5, BC1-5, D1 and partial readout of Tile cells A6 and D0. First, the detection chains were inter-calibrated by running Cs scans, using the slow channel. The performance of the fast channels was then probed with particle beams of different energies and compositions.
Inter-calibration of readout channels
As the Cs source traverses a Tile cell, the response of the respective PMTs exhibits a characteristic plateau (Fig. 21a) , which reflects the sequential excitation of the tiles, with local maxima (minima), generated when the source traverses scintillator tiles (steel plates). Since the energy deposited by the source is uniform, variations of the measured responses can be used to inter-calibrate the detection chains. This is performed by equalising each plateau p i to the overall mean p . Since the dependence of the PMT gain on the applied high voltage is ∼V β , where β 7 for the particular PMTs, the high voltage is corrected to Fig. 21b demonstrates the plateaus before and after equalisation. As shown, residual variations are successfully reduced to the noise level and can be used to derive correction factors for each channel's response.
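The high-voltage correction implied by the V^β dependence of the PMT gain can be sketched as follows. The expression used here follows directly from the power law and the requirement that the plateau be brought to the overall mean; whether it matches the exact formula used by the authors is an assumption. The voltage and plateau values in the example are hypothetical.

```python
BETA = 7.0   # exponent of the gain versus high-voltage power law, from the text

def corrected_high_voltage(v_old: float, plateau: float, mean_plateau: float,
                           beta: float = BETA) -> float:
    """New high voltage that scales the response from `plateau` to `mean_plateau`,
    assuming the response is proportional to V**beta."""
    return v_old * (mean_plateau / plateau) ** (1.0 / beta)

# Hypothetical PMT at 700 V whose Cs plateau is 5% above the mean
v_old = 700.0
v_new = corrected_high_voltage(v_old, plateau=1.05, mean_plateau=1.00)
predicted_plateau = 1.05 * (v_new / v_old) ** BETA

print(f"corrected high voltage: {v_new:.1f} V")
print(f"predicted plateau after correction: {predicted_plateau:.3f}")
```

Because of the steep power law, a response that is 5% too high is corrected by lowering the voltage by less than 1%.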
Measurement of particle deposits
The following paragraphs present the results of data analysis, using electron, muon and hadron beams, targeting the center of each A-cell at 20° incidence. The deposited energy is reconstructed using Optimal Filtering, as described in Section 8, and is expressed in units of input charge by applying the fC/ADC conversion factors, derived using the CIS. Unless specified otherwise, the deposited energy is estimated from the sum of the measurements in the targeted A and BC cells. Adjacent cells are also taken into account for containment. The average energy deposited by a specific beam constituent is obtained from the mean of a Gaussian line shape fitted around the respective characteristic peak.
Electrons
The reconstruction of 20 GeV, 50 GeV and 100 GeV electron signals is tested with the beam targeting Tile cells A2-A5. Since the electromagnetic shower is contained within a short distance in the Tile module, the electron energy is obtained from the respective distribution in each targeted A-cell. The results are summarised in Table 5 and displayed graphically in Fig. 22 as a function of the beam energy. Using these measurements (from 11 Tile cells), the EM scale constant is estimated to be 1.04 ± 0.1 pC/GeV, which is consistent with the nominal value of 1.05 ± 0.1 pC/GeV, derived in precise test-beam studies [15], based on more than 200 Tile cells, using electrons of different energies at 20° incidence. Finally, an example showing the total reconstructed energy distribution (A and BC cells combined) is presented in Fig. 23a, while Fig. 23b displays the two-dimensional distribution in the A/BC cell plane, where the characteristic deposits of the different beam constituents (electrons, muons and pions) can be distinguished.
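One simple way to extract a scale constant in pC/GeV from such measurements is a least-squares slope through the origin of reconstructed charge versus beam energy. The sketch below illustrates this under that assumption; it is not necessarily the procedure used for the quoted result, and the charges in the example are placeholder values, not the entries of Table 5.

```python
def em_scale_through_origin(beam_energies_gev, charges_pc):
    """Least-squares slope (pC/GeV) of charge vs. energy, forced through zero."""
    num = sum(e * q for e, q in zip(beam_energies_gev, charges_pc))
    den = sum(e * e for e in beam_energies_gev)
    return num / den


# Placeholder inputs for illustration only (not measured data).
energies_gev = [20.0, 50.0, 100.0]
charges_pc = [20.8, 52.0, 104.0]
print(em_scale_through_origin(energies_gev, charges_pc))
```

An unweighted fit of this kind gives more weight to the high-energy points; a weighted variant would divide each term by the squared uncertainty of the corresponding charge.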
Table 5: Reconstructed electron energy in the targeted A-cells. Columns: Beam, Cell A2, Cell A3, Cell A4, Cell A5.
Muons
The reconstruction of muon signals is tested with 165 GeV muon beams targeting cells A2-A5. The reconstructed energy distribution for a characteristic case is shown in Fig. 23c. The most probable values (mpv), estimated for each targeted Tile tower (A- and BC-cells), are listed in Table 6. The results are also expressed in terms of energy loss per unit distance (dE/dx), using the track length in each Tile cell (31.925 cm for A-cells and 89.391 cm for BC-cells, at 20° incidence). The overall energy loss is 14.2 ± 1.9 fC/cm, which corresponds to 15.0 ± 2.0 MeV/cm considering the EM calibration constant of 1.04 pC/GeV estimated above and the e/µ response ratio of 0.91. The result is consistent with the estimate of 15.2 MeV/cm, reported in previous test-beam studies using 180 GeV muons at projective angles.
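The conversion from fC/cm to MeV/cm can be checked with a short calculation, taking the EM scale constant as 1.04 pC/GeV (the value estimated from the electron data) and the e/µ response ratio as 0.91. The function name is illustrative.

```python
def dedx_mev_per_cm(dedx_fc_per_cm, em_scale_pc_per_gev, e_over_mu):
    """Convert a muon energy loss from fC/cm to MeV/cm.

    fC/cm divided by pC/GeV gives 1e-3 GeV/cm, which is numerically MeV/cm.
    Dividing by the e/mu response ratio corrects the EM scale for muons.
    """
    return dedx_fc_per_cm / em_scale_pc_per_gev / e_over_mu


central = dedx_mev_per_cm(14.2, 1.04, 0.91)
uncertainty = dedx_mev_per_cm(1.9, 1.04, 0.91)
print(round(central, 1), round(uncertainty, 1))  # 15.0 2.0
```

Only the uncertainty of the measured energy loss is propagated here; the uncertainties of the scale constant and of the e/µ ratio are ignored in this sketch.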
Hadrons
The hadron signal is studied with beams of 30 GeV and 180 GeV pions targeting Tile cell A3. To account for energy leakage towards adjacent Tile cells of the same module, deposits measured in Tile cells A2, BC2, A4 and BC4 are also added to the measurement. The results are summarised in Table 7, while the reconstructed energy distribution for the case of 30 GeV pions is shown in Fig. 23d. As expected, the measured energy is lower than the beam energy, since a single Tile module cannot provide full solid-angle coverage of the hadronic shower. The energy leakage towards neighbouring Tile cells of the same module is found to be approximately 11% and 18% in the
Conclusion
The above sections conclude the presentation of FATALIC and demonstrate the full potential of the 130 nm CMOS technology, realised as a FE readout electronic architecture, for the strenuous conditions of the HL-LHC. The three fast channels offer excellent performance for processing detector signals with input charge up to 1.2 nC, with a noise level of approximately 8 fC and, according to simulation, linearity better than 0.3% up to approximately 850 pC. Their performance was also probed with test-beam studies, in which FATALIC was used to measure the energy of electrons, muons and pions, reproducing the nominal estimate of the Tile EM scale constant as well as the average energy loss per unit distance of muons traversing the TileCal. The slow channel also exhibits the expected performance and was successfully used to process low-amplitude currents in calibration scans using the Cs system, in order to correct the PMT gains in the respective readout channels. At the same time, however, it exposes the limitations imposed by the CMOS technology, along with possible directions for improvement. The main limitation is the large 1/f noise of approximately 7 nA, introduced by the input stage, which does not comply with the specification (<1 nA) defined for the new Tile FE electronics. The adopted current-driven architecture does not lend itself to further reduction of the noise, which would therefore require migration to a different (bi-CMOS) technology or relocation of the slow channel outside FATALIC.
