Abstract-Medium to large channel count detectors are usually faced with a few unattractive options for data acquisition (DAQ). Small to medium-sized TPC experiments, for example, are too small to justify the expense and development time of application-specific integrated circuits (ASICs). Commercial rack-mounted electronics are too bulky and expensive for large channel counts. The combination of commercial high-speed high-density FPGAs, ADCs, and small discrete components provides another option, built entirely from off-the-shelf components, that scales to tens of thousands of channels and is only slightly larger than an ASIC-based solution. A working example of this alternative solution is presented.
I. INTRODUCTION

Most time projection chamber (TPC) projects require a significant effort devoted to the data acquisition (DAQ) system because of the constraints on size, the large number of channels, and large data throughput. Large to medium-sized TPCs (e.g., Alice [1], [2], STAR [3], EOS [4], NA49 [5]) have typically used custom application-specific integrated circuits (ASICs) at the front end and elaborate custom-built readout schemes to compress and record the enormous data rate. Some examples of successful ASICs for TPCs are as follows: the ALTRO chip for the Alice TPC [6] and the chips for the STAR TPC [3], which have worked well for these large experiments with hundreds of thousands of channels.
On the other extreme are small prototype TPCs that have multiplexed channels numbering in the hundreds (e.g., [7]). In this case, the TPC can reasonably be connected by many cables to commercially available rack-mount equipment. The focus of this work is electronics for TPCs (or any medium channel count detector) that fall below the scale of enormous but are too big for the commercially available equipment: TPCs with thousands to tens of thousands of channels. There are several reasons to select an ASIC over an off-the-shelf solution beyond the high channel density packing, such as specific radiation hardness, but for most environments, off-the-shelf components are perfectly adequate, as demonstrated by the design presented herein. Although this system was designed in the context of a TPC, the same arguments apply to any medium channel count charge-collecting detector. In addition, the discrete design of the preamp makes it easy and very cost effective to change the design to match the characteristics of a different detector. Significant improvements over the last few decades in three commercial product areas have collectively provided an alternative for meeting high channel count and space requirements. The first important advance is the reduction in size of all component packages such that the preamp can be fabricated with discrete components in a very small area. The second advance is the availability of high-speed high-density analog-to-digital converters (ADCs) from Analog Devices (AD9277) and Texas Instruments (ADS5270 series). These small (Plastic Quad Flat Pack package, 13.2 mm by 13.2 mm) chips contain up to eight ADCs and can run at sampling speeds in excess of 70 MS/s. The third and perhaps most significant development is the availability of high-speed high-density field-programmable gate arrays (FPGAs) that can control, read out, process, cache, and transmit the data from a number of ADCs in parallel.
A system exploiting these advances has been constructed for the Neutron Induced Fission Fragment Tracking Experiment (NIFFTE) project [8]. This is a 5952-channel 75-µm gap MicroMegas TPC with 2-mm hexagon pixels designed to study neutron induced fission. The particles tracked are protons to heavy fission fragments requiring a large dynamic range of at least a few thousand. The 15-cm-diameter TPC has a drift distance of only 5.4 cm or about 1 µs requiring sampling speeds around 50 MHz to properly digitize the tracks. Keeping the electronics close to the pads limits the space for the electronics to only 40-50 l. This system, the design choices, and performance are described below.
II. SYSTEM OVERVIEW
The requirements for the NIFFTE project TPC are similar to many TPCs, but some of the specifics to this application are in Table I. Fig. 1 shows a high-level sketch of the electronics, Fig. 2 is a picture of the completed boards, and Fig. 3 shows 92 EtherDAQ boards loaded on a test TPC. The system was split into two boards, one for the analog amplifiers (called the preamp) and the other containing the ADCs and all of the digital electronics (called the EtherDAQ). This partition helped with dividing the development effort and also provides flexibility in that the analog amplifiers are easily exchanged without needing to replace the more expensive digital components. Both boards are built on standard 1.57-mm-thick FR4 PCBs and the dimensions of the six-layer preamp are 49 mm by 70.5 mm and the 12-layer EtherDAQ is 49 mm by 112 mm, each about the size of a business card.
Fig. 1. Mechanical layout of the preamp and EtherDAQ board mated together and to a TPC pad plane. Only the major components on the top side are shown; power supplies and other smaller components are suppressed for clarity. The preamp has 16 additional amplifiers on the back side, and the EtherDAQ has two ADC chips and memory on the back side.
U.S. Government work not protected by U.S. copyright.
Each board has two multi-pin connectors, and the EtherDAQ has a Small Form-factor Pluggable (SFP) module connector, which is typically used with multimode optical fibers using a 770 to 860 nm, near-infrared (NIR) wavelength for standard 1000BASE-SX, 1.25-Gb/s Ethernet communications. It is over this connection that the high-rate data from the ADCs is sent to the event builders. The top left connector on the EtherDAQ connects to a digital bus from which the 24 V power, clock, trigger, and other digital signals are received and transmitted. In the TPC application, all of the ADC clocks must be synchronized, and the delivery of that clock is the primary design reason for this connector. The middle connector connects the balanced pair preamp output to the input of the ADCs on the EtherDAQ board, and the bottom left connector transmits the charge from pads on the pad plane to the input of the charge amplifiers.
In keeping with the design goal of using off-the-shelf components, the readout uses standard Gigabit Ethernet with one fiber going to each card. This provides tremendous flexibility in configuring the readout of the data, requires only commercial components, and provides electrical isolation. The topology of the network is easily tailored for a specific application and the event builders/data collectors are placed in the network as needed to optimize the data collection. Once the data leaves the EtherDAQ card, there are no special readout boards needed, just computers and switches with standard Ethernet ports.
Optical fiber was selected for this application over CAT 5/6 for ease of handling the 192 connections in the small space. The fiber is lighter weight and less stiff than CAT5/6, and the cost difference is not that significant with respect to the budget of the overall project. The 24-port Dell 6224F switch was selected as it can be stacked (electrically) and managed easily as one unit, which is useful when managing 192 ports. The switch costs about $75 per port, and the SFP modules (FTLF8519P2BCL) are about $35 from Finisar. The cost incurred from using optical fibers can be reduced by moving to a CAT5/6 cabling solution. SFP modules supporting RJ45 are available and can be plugged interchangeably into the EtherDAQ board. The D-Link DGS-712 has been tested successfully with the board.
The power, clock, and triggers for the EtherDAQ boards are generated in the custom chassis called the power and clock distribution unit (PCDU). This unit also provides galvanic isolation of all signals and maintains the isolation provided by the power supplies and the fiber data link. The clock and triggers enter one point in this chassis and are fanned out and converted to LVDS for transmission to the cards. The power for both the EtherDAQ and preamps is provided by Agilent power supplies (N8755A, 6651A), which are also connected to the PCDU. The PCDU fans out the power, controls the power supplies to the correct voltages, and can shut down the system in the case of any failure.
III. PREAMP BOARD
A design decision was made to keep the analog section simple, which was enabled by the relatively fast ADCs and the powerful FPGAs that can do the signal shaping digitally. The schematic of one channel is shown in Fig. 4 ; it is a charge-sensitive, JFET cascode front end with a high slew rate op amp, followed by a differential buffer needed to drive the ADCs. This simple design is discussed elsewhere [9] and does not have any explicit shaping elements and has passive, resistive reset. In addition, the preamp does not have gain adjustment (other than selecting, during construction, the feedback capacitor on the transimpedance section, and/or the feedback resistors in the differential driver) and gain is set by adjusting the TPC bias voltage on the MicroMegas, which changes the gas gain within the TPC. The advantage of this design is the size, cost, and speed of development; a single channel measures only 38 mm by 4 mm on one side of a PCB (channels are placed back to back on both sides). A possible issue for other TPC designs is that one needs to have enough range in the gas gain of the TPC to accommodate any gain changes required.
The preamp development was short, taking only a few person-months to design, lay out, and test. The discrete component design allowed for rapid debugging and only a few PCB runs to get the final version. The production costs were also reasonable. A run of 200 cards (each with 32 amplifiers) costs only about $5 a channel, including the PCB, components, and assembly.
The preamp has two connectors. The input is an 80-pin 0.5-mm pitch, part number 80PS-JMDSS-G-1-TF from JST. Preamp inputs are connected to the pads receiving electrons in the TPC and location bits are formed by selective shorting of dedicated pins to ground to allow the cards to automatically have the information about what pads they are connected to. The output connector connects directly to the EtherDAQ card. This is a 140-pin, 0.5-mm pitch connector made by Hirose, part number is FX11LB-140P-SV. The first eight pins are simply the pass through of the location bits from the input connector. The outputs are all differential pairs and the common mode voltage for each pair is set by the ADCs. No power is transmitted between the preamp board and the digital board.
IV. DIGITAL BOARD (ETHERDAQ)
The digital board takes the analog signals from the preamps and processes them all the way to an Ethernet packet that is transmitted from an on-board SFP module. This ambitious design has a number of advantages: it has a small footprint, it uses off-the-shelf components, and its output is a standard raw Ethernet packet that is transported on cheap commercial equipment, limiting the custom hardware needed. The total cost to purchase parts, produce the PCB, and load the EtherDAQ cards was about $60 a channel in quantities of a few hundred boards.
A block diagram of the system is shown in Fig. 5. Each analog preamp signal is connected through a passive filter to a channel of one of four 12-bit Texas Instruments ADS5272 chips that each contain eight independent ADCs. Each channel output in the ADC is serialized and transmitted using dual data rate LVDS outputs, where the serial clock is six times the sampling frequency. The Virtex 5 (XC5VLX110T-1FFG1136C) FPGA digitally shapes the signal with a semi-Gaussian and differentiation filter and uses the shaped signal and a threshold to determine a trigger for each channel. Every triggered channel is captured (selectable length waveform or compressed time/energy) and stored on an internal FIFO buffer. The channel data is monitored by a state machine and transferred at a clock speed of 125 MHz to 128 MB of mobile SDRAM (two Micron MT48H16M32LF, a high-speed CMOS, internally configured as a quad-bank DRAM with a 32-bit synchronous interface) connected to the FPGA. This provides the necessary buffer for the Ethernet latency. Another state machine in the FPGA reads the SDRAM memory and sends data packets to the SFP module as they are ready.
A 25-MHz master clock is generated by an SRS CG635 clock generator and is connected to the PCDU that fans out the clock with a low skew 1-to-16 fan-out buffer (ICS83115) and then to LVDS drivers connected to twisted pair. These synchronized clocks on twisted pair are connected to the digital connector of the EtherDAQ and ensure synchronization of all the cards at 25 MHz. An on-board low-jitter phase-locked-loop (ICS844021) is used to recover the clock and generate the Gigabit Ethernet quality 125 MHz needed by the FPGA. The FPGA takes care of dividing the clock internally to reach the desired sampling rate on the ADCs. Four sampling rates are available through Ethernet programming: 62.5, 50, 41.67, and 31.25 MHz.
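As a quick consistency check on the clocking numbers, the following sketch derives the four sampling rates from the 125-MHz FPGA clock and the matching ADC serial clock, using the stated relation that the serial clock is six times the sampling frequency (12 bits per sample on a dual data rate link). The effective division ratios are inferred here only because they reproduce the quoted rates; they are an assumption and are not taken from the firmware.

```python
from fractions import Fraction

FPGA_CLOCK_MHZ = Fraction(125)   # recovered by the on-board PLL
ADC_BITS = 12                    # ADS5272 resolution

# Assumed effective division ratios (hypothetical): chosen only because
# they reproduce the four sampling rates quoted in the text.
ASSUMED_DIVIDERS = [Fraction(2), Fraction(5, 2), Fraction(3), Fraction(4)]


def sampling_rates_mhz():
    """Sampling rates obtained by dividing the 125-MHz clock."""
    return [FPGA_CLOCK_MHZ / d for d in ASSUMED_DIVIDERS]


def serial_clock_mhz(fs_mhz):
    """12 bits per sample at two bits per clock cycle gives 6 x fs."""
    return fs_mhz * ADC_BITS / 2


for fs in sampling_rates_mhz():
    print(f"fs = {float(fs):6.2f} MHz, serial clock = {float(serial_clock_mhz(fs)):6.1f} MHz")
```

At the highest rate of 62.5 MHz, the serial clock comes out at 375 MHz.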
A. Power Supply
The power consumed by one EtherDAQ board is about 12 W, or 375 mW/channel, and includes the full processing of data at 62.5 MS/s from analog to Ethernet. The power consumption breaks down into about 1 W for each of the four ADC chips, about 1 W for the Ethernet SFP module, about 6 W for the FPGA, and the remaining watts in power supplies and auxiliary circuits. The multiple low-voltage, high-current requirements of the integrated circuits necessitate a high-voltage power distribution to avoid excessive distribution currents and the corresponding losses. This is accomplished with a 24-V distribution using on-board Linear Technology LT3501 switching regulators. The switching regulator was selected to efficiently make the large voltage drop required, but careful attention to layout and to the manufacturer's recommendations was required to keep the noise levels low. The ADC noise levels were below the manufacturer's specifications, but preamp channels nearest the regulators did see higher noise values, although well within acceptable levels. The inductors inherent in this kind of design are not a problem for the NIFFTE TPC but would limit the use of the card if a strong magnetic field is required for the TPC.
B. ADC
Due to the high frequency of the serial data from the ADC (up to 390 MHz) and the high density of channels, the routing delays to the FPGA and within the FPGA are difficult to predict accurately and vary across the various ADC devices. Each ADC device is routed to a unique bank in order to at least maintain timing delay consistency among the channels of the same ADC. The ADC provides a single serial clock and a single frame clock for all eight channels. Each data channel is dual data rate and is expected to be aligned with the serial clock for data recovery. However, due to routing delays both on the PCB and inside the FPGA, the serial data of each channel arrives at a slightly different time, as shown in Fig. 6. It is imperative to adjust the phase of the single ADC serial clock feeding the dual edge registers in the FPGA receiving fabric to ensure that the setup and hold times are satisfied for the serial data of all channels.
FPGA Verilog code was devised to perform a one-time self-adjustment of the phase at boot-up to find the optimal relation between serial clock and serial data using the programmable digital phase adjustment feature of the Virtex 5 Digital Clock Manager (DCM) units, as shown in the flowchart in Fig. 7. The ADC is programmed to output a carefully chosen test pattern, 010101110011b, that allows setup and hold violations on the dual data rate input flip-flops to be detected. The phase of the serial clock is swept, and the interval of phase values over which the pattern is properly retrieved is detected. The center of the valid interval is then chosen as the optimal clock phase for sampling operation. The interval detection is smoothed by requiring 256 successful back-to-back attempts at detecting the pattern. If no phase value allows the pattern to be retrieved, the system reports an error through the housekeeping packet. The Xilinx Virtex 5 FPGA allows a thermally and voltage compensated phase adjustment granularity of 1/256 of the clock period at the frequencies of interest.
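The boot-up phase search can be illustrated with a short software model. This is only a sketch of the procedure described above, not the Verilog used on the card: `read_word` is a hypothetical callback standing in for "set the DCM phase and capture one 12-bit word", and picking the widest valid run without treating the sweep as circular is an assumption made for simplicity.

```python
from typing import Callable, Optional

PHASE_STEPS = 256            # phase granularity of 1/256 of the clock period
REQUIRED_SUCCESSES = 256     # back-to-back pattern matches required per phase
TEST_PATTERN = 0b010101110011


def phase_is_valid(read_word: Callable[[int], int], phase: int) -> bool:
    """True only if the test pattern is recovered on every attempt."""
    return all(read_word(phase) == TEST_PATTERN for _ in range(REQUIRED_SUCCESSES))


def find_optimal_phase(read_word: Callable[[int], int]) -> Optional[int]:
    """Sweep all phase steps and return the center of the widest valid run.

    Returns None when no phase recovers the pattern, which corresponds to
    the error reported through the housekeeping packet.
    """
    valid = [phase_is_valid(read_word, p) for p in range(PHASE_STEPS)]
    best_start, best_len = 0, 0
    run_start = None
    for p, ok in enumerate(valid + [False]):   # sentinel closes a trailing run
        if ok and run_start is None:
            run_start = p
        elif not ok and run_start is not None:
            if p - run_start > best_len:
                best_start, best_len = run_start, p - run_start
            run_start = None
    if best_len == 0:
        return None
    return best_start + (best_len - 1) // 2


def fake_link(phase: int) -> int:
    """Hypothetical link whose data eye is open for phase steps 100..140."""
    return TEST_PATTERN if 100 <= phase <= 140 else 0


print(find_optimal_phase(fake_link))
```

With the simulated eye spanning steps 100 to 140, the search settles on step 120, the middle of the window.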
C. Timing and Triggering
The EtherDAQ has a digital bus connector that is used to distribute the clock to many cards to keep them in phase and also provides fast signals including a global trigger, neighbor triggers, busy (trigger hold off), and trigger out.
There are four types of triggers, all of which will cause the FPGA to record data from the ADC and store it in internal memory and later transport over the Ethernet link. The first is a software trigger that is caused by a special packet sent over the Ethernet that causes a trigger. This is used to get a baseline waveform to measure, for example, the channel noise. The software trigger is also useful for diagnostics to see if a channel is responding correctly. The next trigger is a simple global hardware trigger that is delivered as a 3.3 V TTL signal on the digital bus connector. This is used to trigger the whole TPC in the case one has an external trigger source. We use this again mostly for diagnostics to make sure that all channels trigger together and the time counters are correct. A related trigger is the neighbor trigger that allows one to trigger channels on one card based on the proximity of channels on a contiguous card.
The idea behind this trigger is to record channels that might not self-trigger but might have subthreshold information of interest. It was determined for the first experimental runs that the noise levels are low enough, and this feature is not needed. Testing of this trigger shows that it operates as designed and can be used in later experimental runs. The last trigger is the most used and is a channel-by-channel self-trigger formed from the shaped signal, which is discussed below. The self-trigger creates zero suppressed data by testing if each channel is individually above threshold.
Additional fast signals are also available. A trigger out signal is generated for any internal trigger formed. This is not currently used but provides a fast signal for synchronizing other fast electronics with the TPC triggers. The busy or trigger hold-off is a TTL input that inhibits all triggers except the software trigger. This is used to reject data that occurs when the beam is off. The beam-off data is significant in rate (due to radioactive sample inside) so rejecting the useless data early makes data processing more efficient.
The combination of a relatively fast ADC and large FPGA provides significant power to process the digitized preamp signal. The signal is shaped in the FPGA, removing the need to do so in the analog electronics. The shaping selected was a semi-Gaussian filter, but it could be modified to meet the requirements of a different experiment by simply reprogramming the FPGA. In the current version of the firmware, the filtered signal is used to form the trigger but is not recorded since it is easily reproduced from the raw signal that is recorded.
The filtering is done in six steps, starting with a four-sample digital differentiation to remove the preamplifier baseline offset. Semi-Gaussian filtering is then approximated by cascading five identical 3-point polynomial filters. At each stage, the result is truncated to reduce the size of the calculation. The advantage of this over a generic FIR filter is that no multiplication is required other than a factor of two, and this can be accomplished with a shift, eliminating the need for a multiplier. After the filtering, a simple discriminator is used to determine if a given channel should trigger.
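The structure of this filter chain can be sketched in software. The coefficients below are assumptions, not the firmware's: the differentiation is taken as d[n] = x[n] - x[n-4], and each 3-point stage as the kernel (1, 2, 1) followed by truncation with a right shift by two. These forms were chosen only because they match the description (one differentiation plus five identical stages, no multiplication beyond a factor of two).

```python
def differentiate(x, span=4):
    """Assumed form d[n] = x[n] - x[n - span]; cancels a constant baseline."""
    return [x[n] - x[n - span] if n >= span else 0 for n in range(len(x))]


def smooth_121(x):
    """Assumed causal 3-point kernel (1, 2, 1).

    The factor of two is a left shift and the truncation is a right
    shift by two, so no multiplier is needed.
    """
    out = []
    for n in range(len(x)):
        a = x[n]
        b = x[n - 1] if n >= 1 else 0
        c = x[n - 2] if n >= 2 else 0
        out.append((a + (b << 1) + c) >> 2)
    return out


def shape(x, stages=5):
    """One differentiation followed by `stages` identical smoothing stages."""
    y = differentiate(x)
    for _ in range(stages):
        y = smooth_121(y)
    return y


def self_trigger(x, threshold):
    """Index of the first shaped sample above threshold, or None."""
    for n, v in enumerate(shape(x)):
        if v > threshold:
            return n
    return None


# A 500-count step on a 100-count baseline (illustrative values only).
step = [100] * 20 + [600] * 40
print(self_trigger(step, threshold=50), max(shape(step)))
```

A flat baseline produces an all-zero shaped signal and therefore no trigger, while the step produces a single smooth pulse that crosses the threshold shortly after the edge.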
D. Raw Waveform and Compression Modes
The FPGA can record a waveform of a triggered channel up to 120 samples before and after the trigger. The number of samples is programmable through the Ethernet interface. In high-rate environments, the bandwidth of the Ethernet and the speed and size of the hard disk in the receiving computer limit the average rate at which signals can be recorded. For this reason, the FPGA can also extract the essential information about a signal in the form of amplitude and time. The amplitude is calculated by averaging the samples before the trigger and after the trigger, reporting just these two numbers with the trigger time. This method can fail if there is pile-up, and Fig. 8 shows how this is handled.
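The compressed mode can be modeled in a few lines. The sketch below is one possible reading of the averaging and of the pile-up handling described for Fig. 8, not the firmware algorithm: the function name, the symmetric dead zone, and the rule that an averaging window is cut short at a neighboring trigger's dead zone are all assumptions.

```python
def compressed_sample(wave, trigger, n_avg, dead, neighbors=()):
    """Return (pre_mean, post_mean, trigger) for one triggered pulse.

    wave      -- list of ADC samples
    trigger   -- sample index of this trigger
    n_avg     -- number of samples S averaged on each side
    dead      -- samples excluded on each side of a trigger (the rise)
    neighbors -- indices of other triggers close in time (pile-up)
    """
    lo = max(0, trigger - dead - n_avg)              # first pre-trigger sample
    hi = min(len(wave), trigger + dead + 1 + n_avg)  # one past last post sample
    for t in neighbors:
        if t < trigger:
            lo = max(lo, t + dead + 1)   # start after the earlier dead zone
        elif t > trigger:
            hi = min(hi, t - dead)       # stop before the later dead zone
    pre = wave[lo:max(0, trigger - dead)]
    post = wave[trigger + dead + 1:hi]

    def mean(samples):
        return sum(samples) / len(samples) if samples else None

    return mean(pre), mean(post), trigger


# Two pulses close in time: steps at samples 20 and 30 (illustrative values).
wave = [10] * 20 + [50] * 10 + [90] * 20
print(compressed_sample(wave, 20, n_avg=8, dead=2, neighbors=(30,)))
print(compressed_sample(wave, 30, n_avg=8, dead=2, neighbors=(20,)))
```

In this example the post-trigger average of the first pulse uses only the five samples that precede the dead zone of the second trigger, so both pulse heights are still recovered.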
It is possible for a number of reasons that the internal memory buffers become full and triggers cannot be recorded (e.g., excessive Ethernet traffic that takes long enough to clear that the buffers fill). In this event, the FPGA records the time that triggers are no longer accepted and the time that they are again accepted. These two times are placed in a data packet and sent in the normal data flow so the live time of the system can be calculated.
Fig. 8. How the FPGA handles trigger pile-up in the compressed mode. The figure shows two triggers that occur close in time. In a normal event, there are samples, S, before (horizontal hatch) and after (vertical hatch) a trigger that are each averaged, to determine the amplitude of the pulse and the region around the rise (deadtime, in cross hatch) is excluded. In the case shown above, the second trigger interrupts the samples that are averaged after the first trigger. The algorithm recognizes this and only uses the samples outside of the dead zones.
E. Memory Management
Most of the data is buffered in a large SDRAM before being sent over the network interface. The 128-MB SDRAM is at the heart of the overall design and is segmented in 1024 bytes long blocks. Those blocks are a direct mapping with the content of the Ethernet packet data. Event data are not necessarily aligned with block boundaries, and this has to be sorted out on the receiving computer.
The SDRAM arrangement (two 32-bit-wide SDRAMs in parallel) allows reading and writing blocks of 8 × 8 bytes. The Ethernet readout is limited to a 1-byte transfer per clock cycle, in comparison to the 8 bytes of the memory. This indicates that we can keep the readout FIFO full of data by spending only 12.5% of the time on the readout. In order to accommodate auto-refresh, we have set the multiplex FIFO to write 3 RAM bursts in a row before attempting to read data, which sets the fraction to 25% for the Ethernet, which is higher than needed.
Also important in the time sharing of the memory is the auto-refresh that occurs every 15 read slots. We therefore have about a 53% access bandwidth out of the 1 GB/s interface (burst of 8 transferred per 15 clock cycles). This roughly provides storage at about 399.7 MB/s with the 75% write allocation. It also provides a 124.4 MB/s readout with the 25% read allocation and 14/15 read sharing with auto-refresh. The storage bandwidth is slightly under the maximum real input bandwidth of 500 MB/s. The readout bandwidth is slightly under the maximum real output bandwidth of 125 MB/s.
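The bandwidth figures above follow from a few ratios, reproduced in the sketch below as a cross-check. The variable names are illustrative, and the way the factors are combined is our reading of the text rather than a description of the firmware. Under this reading the write figure evaluates to about 400 MB/s, close to the quoted 399.7 MB/s, and the read figure to 124.4 MB/s.

```python
CLOCK_MHZ = 125            # SDRAM interface clock
BUS_BYTES = 8              # two 32-bit-wide SDRAMs in parallel
BURST_WORDS = 8            # 8-byte words transferred per burst
CYCLES_PER_BURST = 15      # clock cycles taken by one burst slot
WRITE_SHARE = 0.75         # three write bursts for every read burst
READ_SHARE = 0.25
REFRESH_FACTOR = 14 / 15   # read slots shared with auto-refresh

peak_mb_s = CLOCK_MHZ * BUS_BYTES                        # 1 GB/s interface
usable_mb_s = peak_mb_s * BURST_WORDS / CYCLES_PER_BURST  # about 53% of peak
write_mb_s = usable_mb_s * WRITE_SHARE
read_mb_s = usable_mb_s * READ_SHARE * REFRESH_FACTOR

print(f"peak   {peak_mb_s:7.1f} MB/s")
print(f"usable {usable_mb_s:7.1f} MB/s ({usable_mb_s / peak_mb_s:.1%})")
print(f"write  {write_mb_s:7.1f} MB/s")
print(f"read   {read_mb_s:7.1f} MB/s")
```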
F. Control Interface and Data Readout
All data and control (with the exception of fast signals) of the EtherDAQ is accomplished over the Gigabit Ethernet interface. The board is fitted with an SFP socket and configured to operate with 1000BASE-X. The MAC used in the design is implemented in hardware internal to the Virtex 5 FPGA, and the physical layer is implemented in the RocketIO GTP transceiver also on the FPGA chip. In order to keep the design small and efficient, no embedded CPU is implemented. All of the state machines are written in Verilog, and the Ethernet packets are generated and received with state machines.
The last byte of the MAC address for each card is set by the location code passed through the preamp. This allows the location of a collection of cards to be easily identified while also separating the MAC addresses of each card.
The card operates strictly as a raw Ethernet device and does not implement TCP/IP as that is not needed for this application and would be resource intensive without a CPU. One feature of the normal TCP/IP stack that is needed is the tracking of packets to make sure they arrive since Ethernet does not guarantee lossless transport. This is implemented with a simple acknowledgment packet from the receiving computer for each packet sent by the card. The EtherDAQ state machine responsible for sending packets loops over the data and sends each packet in turn (not waiting for an acknowledgment) and continues to do this. If no acknowledgment is received, the card simply keeps sending all of the data not acknowledged over and over. As acknowledgments are received, the data is marked sent in the EtherDAQ memory, the data is no longer sent, and that space is used by another state machine that fills up the memory with data from the ADCs. Once filled, it is marked as such, and the sending state machine includes this packet in the sending process once it is reached in the loop.
The computer receiving the data checks that the data is valid and sends an acknowledgment. In the event that the computer receives identical packets, it simply acknowledges the second packet as well so the EtherDAQ knows that it was received, but the computer discards it. The packets are sequentially numbered within the EtherDAQ so the computer simply has to put the packets in numerical order to reconstruct the time order of the data. In the current operation, one consumer workstation computer running a normal (not real time) Linux operating system can operate 92 cards at 30 MB/s without problems. At higher rates, the hard drive starts to limit the operation since the computer spends some time accessing the disk to transfer data for analysis and other system operations. Increasing the disk speed with a solid state drive, RAID or otherwise, should provide much higher bandwidth if needed. Additional computers added to the system will also provide a higher throughput, but that has not been tested yet.
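The acknowledgment scheme described above can be modeled with a small simulation. This is a minimal sketch of the protocol logic only: the class and method names are hypothetical, the memory is reduced to a handful of blocks, and packet loss is scripted by hand rather than taken from a real network.

```python
class Sender:
    """Card side: loops over memory blocks, resending until acknowledged."""

    def __init__(self, n_blocks):
        self.blocks = [None] * n_blocks   # None marks a free block
        self.next_seq = 0

    def store(self, data):
        """Fill a free block; returns False when the buffer is full (busy)."""
        for i, block in enumerate(self.blocks):
            if block is None:
                self.blocks[i] = (self.next_seq, data)
                self.next_seq += 1
                return True
        return False

    def send_pass(self):
        """One loop over memory: emit every packet not yet acknowledged."""
        return [block for block in self.blocks if block is not None]

    def acknowledge(self, seq):
        """Mark the block as sent so its space can be reused."""
        for i, block in enumerate(self.blocks):
            if block is not None and block[0] == seq:
                self.blocks[i] = None


class Receiver:
    """Computer side: acknowledge every packet, keep one copy of each."""

    def __init__(self):
        self.received = {}

    def receive(self, seq, data):
        self.received.setdefault(seq, data)   # a duplicate is discarded
        return seq                            # but is still acknowledged

    def ordered(self):
        """Packets in sequence order, which is the time order of the data."""
        return [self.received[k] for k in sorted(self.received)]


sender, receiver = Sender(4), Receiver()
for word in ["a", "b", "c"]:
    sender.store(word)

# Pass 1: packet 1 is lost on the way out; the ack for packet 0 is lost.
for seq, data in sender.send_pass():
    if seq == 1:
        continue
    ack = receiver.receive(seq, data)
    if seq != 0:
        sender.acknowledge(ack)

# Pass 2: the two unacknowledged packets are resent and get through.
for seq, data in sender.send_pass():
    sender.acknowledge(receiver.receive(seq, data))

print(receiver.ordered(), sender.send_pass())
```

After the second pass the receiver holds each packet exactly once, in order, and the sender has nothing left to resend, even though one data packet and one acknowledgment were dropped.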
V. PERFORMANCE
Overall, the system has performed well both in bench tests and medium scale tests with almost 100 cards (about 3000 channels) running together. The power consumption is about 3.4 and 12 W for the preamp and digital board, respectively, which corresponds to 481 mW/channel in total. This power density is low enough that forced air cooling is sufficient and rather easily accomplished with simple off-the-shelf 2-in dc fans.
The preamp alone [9] was measured to have a noise of 70 µV rms over a 175-MHz bandwidth, providing a dynamic range of over 25 000, or 89 dB, which corresponds to a range in charge of about 0.7 fC to 2.8 pC. In the system, the preamp has a rise time of 10-20 ns at the full capacitive load of the chamber's electrode. The rms system noise is shown in Fig. 9 for each of the 32 channels of a typical card with the preamps attached and inputs floating. The distribution is not completely random and is caused by pickup from the digital board back into the preamps. Given that the overall level is small, 0.9 to 1.75 ADC counts out of 4096, no effort was made to reduce the pickup, which is probably from the switching power supplies on the digital board. With additional effort and shielding, it is possible to reduce this further. The nonlinearity of the system is acceptable and was measured by injecting charge into the preamp + EtherDAQ system for a range of signal sizes; see Table II. The charge was generated with an HP 8012A pulse generator and a 1-pF capacitor. The crosstalk between individual channels was measured in a similar manner by injecting charge into one channel but looking at the response of the neighboring channels. The system crosstalk is dominated by the preamps and is only significant for preamps that are physically adjacent and on the same side of the printed circuit board. The crosstalk for a midscale pulse is less than 0.25% on the adjacent channels, which is well within the requirement for the TPC, and since the effect does not change in time, it is removed in postprocessing of the waveform offline.
The Gigabit Ethernet connection on each card provides a large data path that can easily handle the data generated on the card. Tests using both large radioactive (random) sources and bench top work with a pulser show that the system works well up to the maximum Gigabit Ethernet bandwidth (about 120 MB/s) and caches bursts far in excess of that rate. It also gracefully reports busy for sustained rates beyond the connection limit and recovers completely. At these high data rates, and especially for multiple cards, the limiting factor tends to be the rate that a standard hard disk can record data. For this reason, one has to carefully consider the computer system recording the data near the maximum bandwidth of this DAQ system.
The system gain was measured with charge injection from a 5.6-pF capacitor and pulser and the result was 4440 electrons/ADC count. This low system gain was selected for the application as it has rather large signals. The dynamic range was measured to be about 4000 or about 72 dB for the complete system.
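The quoted gain and dynamic range can be converted to charge and decibels with a short calculation, shown below as a cross-check; the variable names are illustrative. The result, roughly 0.7 fC per ADC count and a span of nearly 3 pC over a range of 4000, is consistent with the charge range of about 0.7 fC to 2.8 pC quoted earlier.

```python
import math

ELECTRON_CHARGE_C = 1.602176634e-19   # elementary charge in coulombs
GAIN_E_PER_COUNT = 4440               # measured system gain
DYNAMIC_RANGE = 4000                  # measured for the complete system

charge_per_count_fc = GAIN_E_PER_COUNT * ELECTRON_CHARGE_C * 1e15
full_range_pc = charge_per_count_fc * DYNAMIC_RANGE / 1000
dynamic_range_db = 20 * math.log10(DYNAMIC_RANGE)

print(f"charge per ADC count: {charge_per_count_fc:.3f} fC")
print(f"charge at full range: {full_range_pc:.2f} pC")
print(f"dynamic range:        {dynamic_range_db:.1f} dB")
```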
VI. CONCLUSION
Technological advancements have now made it possible to build a compact high channel count DAQ from off-the-shelf components, and such a system is demonstrated in this paper. The flexibility to reprogram the chips and the ability to purchase components commercially make this a great option for medium-scale experiments at the few thousand channel level. The example demonstrated in this paper was built for the NIFFTE TPC and has to date been operated with almost 3000 channels. The power levels are acceptable, at about 481 mW per channel (EtherDAQ + preamp), so air cooling is sufficient. The noise levels (about 4400 electrons) are sufficient for the experiment, and the requirement for a large dynamic range (about 4000), which is more important, was met with significant margin. The data rates achieved using standard Ethernet have so far been limited only by the speed of one hard disk on one computer, and the expansion of the number of readout computers and/or hard disks could increase the bandwidth considerably. Although an ASIC may be required by some applications and can be the most compact and sophisticated solution, the significant advantage of this approach is that a medium-scale experiment can have high-performance electronics that fit in a small space without investing in the infrastructure and time required to design and build a custom ASIC.
