Abstract--A high throughput Pulse Height Analyzer system for synchrotron-based applications requiring high resolution, high processing speed and low dead time has been developed. The system comprises a 120ns 12-bit nuclear quality Analog to Digital Converter with a self-adaptive fast peak detector-stretcher and a custom-made fast histogramming memory module that records and processes the digitized data. The histogramming module is packaged in a VME/VXI compatible interface. Data is transferred through a fast optical link from the memory interface to a computer. A dedicated data acquisition program matches the hardware characteristics of the histogramming memory module. The data acquisition system allows for different data collection modes. The system can acquire data in synchronization with an external trigger or operate as a standard Pulse Height Analyzer. Acquisition can be performed on several channels simultaneously. A two-channel prototype has been demonstrated at the Stanford Synchrotron Radiation Laboratory accelerator in conjunction with a fluorescence X-ray Absorption Fine Structure Spectroscopy experiment. A detailed description of the entire system is given and experimental data are shown.
I. INTRODUCTION
In the framework of fluorescence detection techniques, the new third generation synchrotron radiation sources have photon fluxes giving signal intensities that cannot be fully exploited by existing detector systems. The XAFS (X-ray Absorption Fine Structure) program of the Actinide Chemistry Group at Lawrence Berkeley National Laboratory (LBNL) has recognized the need for an X-ray fluorescence detector system that overcomes the limitations of commercially available systems. The Measurement Science Group (MSG) at LBNL has addressed this need by building an innovative multi-pixel monolithic germanium detector and the associated pulse processing electronics.
The fluorescence detector is built into a 5cm × 6cm high purity germanium crystal. A layer of amorphous silicon is grown on one side to constitute the P+ contact. On the other side of the crystal, four N+ contacts are created by lithium diffusion. The area of each N+ contact is 0.25 mm². In this way, a four-element detector is obtained.
Four fast low noise charge sensitive preamplifiers, using a custom 4-terminal Junction Field Effect Transistor (JFET) as an input device, collect the charge generated by the radiation from each detector element. The JFETs are at ~130K and mounted in close proximity to the detector. Each preamplifier channel feeds a four-pole semi-gaussian shaper. The shapers have a user-selectable peaking time of either 250ns or 500ns and also provide a lower resolution fast channel that counts all incoming events, thereby measuring the Incoming Count Rate (ICR).
The fast Pulse Height Analyzer (PHA) then completes the pulse processing. The PHA system comprises a fast high resolution Analog to Digital Converter (ADC), a fast histogramming memory and a VME/VXI compatible interface. A prototype system was built using a VME interface. This system consists of one complete data acquisition channel and has been fully characterized. The system was then duplicated and expanded into a VXI system to address the problem of simultaneous data acquisition from multiple channels. Both systems were tested at the SSRL in conjunction with a fluorescence X-ray Absorption Fine Structure Spectroscopy experiment. The block diagram of a single acquisition channel is shown in Fig. 2. The complete 12-channel instrument is under development and the basic building block module is shown in Fig. 1.
The analog input section is built around the fast self-adaptive peak detector-stretcher described in [1]. The peak detector-stretcher includes all the necessary logic to provide pileup rejection and gating functions. The stretched signal is fed to a commercial ADC chosen to maximize throughput without compromising the differential non-linearity (DNL). This ADC is an 8Msps unit, providing 14 bits of resolution with no missing code over the full temperature range, and with an intrinsic DNL of ½ LSB. Because this DNL is not adequate for nuclear spectroscopy applications, only the 12 most significant bits are used. A 6-bit sliding scale correction [3] lowers the DNL to less than 1%. Including delays, the conversion time of the ADC is 120ns.
The electronic pulse-processing amplifier, in its fastest version, shapes the energy events coming from the detector with a 250ns peaking time fourth-order, pseudo-gaussian shaping function. Particular care was taken at this stage to make sure that the total dead time of the system (i.e., the time during which the system is busy and cannot process further events) [4] resides only in the signal shape used.
In fact, for high rate applications, it is important to know exactly where the dead time is generated and how easily it can be accounted for. In our case (see Fig. 2), a pulse can be characterized by its peaking time $T_P$, its peak-to-baseline time $T_B$ and its amplitude. The event processing time of the system is $T_D$. In the ideal case of $T_D = 0$, if another event B occurs a time $T$ after a pulse A is generated, then the new event B will be considered a valid event if, when its peak is reached, the contribution of the event A to the baseline has reached a negligible value. In our case this is true if $T \geq T_P + (T_B - T_P) = T_B$. In all other cases the pulse B is contaminated by the pulse A, the two are said to be in a "pileup" condition, and the pulse B is rejected. This situation, in the ideal case, does not depend on the amplitude of the pulses but only on $T_P$ and $T_B$. When $T_D \neq 0$ and $T_D > T_B$, all pulses occurring during a time $T_D$ after A are rejected. It should also be noted that the time $T_D$ is often amplitude-dependent. If $T_D$ is made amplitude-independent and less than $T_B$, then no pulses can be processed during $T_B$. This condition is analogous to the pileup condition, although conceptually different. In our system, we made a particular effort to ensure that the processing time fell exactly under this case. As a result, the dead time is now represented by the pileup condition and can be easily accounted for using a paralyzable dead time model [4].
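For reference, the paralyzable dead time model invoked above has a standard closed form. The symbols below are not part of the original text and are introduced only for illustration: $n$ is the incoming count rate, $m$ is the recorded throughput, and $\tau$ is the effective dead time, which under the argument above is set by the pulse shape rather than by the processing electronics.

```latex
\begin{align}
  m &= n \, e^{-n\tau} \\
  \frac{dm}{dn} &= (1 - n\tau)\, e^{-n\tau} = 0
      \quad\Longrightarrow\quad n = \frac{1}{\tau} \\
  m_{\max} &= \frac{1}{e\,\tau}
\end{align}
```

In this model the throughput rises with the incoming rate, peaks when $n\tau = 1$, and then falls, so the dead time alone fixes the maximum achievable throughput.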
The next stage of the event-processing channel is the fast histogramming memory module, tailored to the requirements of the ADC and its sliding scale electronics.
III. HISTOGRAMMING MEMORY MODULE
The histogramming memory is built around a 4K × 18 dual-port static random access memory (SRAM). One port is dedicated to data accumulation while the second port is mapped to the VME/VXI address and data buses for the data transfer to the computer (see Fig. 3). After an analog to digital conversion has taken place, the digitizer board control logic sends a request to the histogramming memory module. When the histogramming memory controller acknowledges the request, it uses the ADC data as an address and increments the content of the corresponding memory location by one. The control logic for the histogramming memory is built into an Electrically Programmable Logic Device (EPLD) array, taking advantage of both the speed and the compactness of these devices. The incoming events are random in nature, as is the request generated by the digitizer module. That request is synchronized at the level of the histogramming module. A built-in state machine produces the necessary signals to control and perform the "Read", "Add One", and "Write" operations required to build the histogram. The access time of the memory we selected is 15ns. Assuming that the operations were fully synchronous, the maximum theoretical frequency of operation would be 66MHz. However, due to system considerations such as propagation delays between the different subsections of the design, and the random nature of the events, we opted for a 30MHz timing crystal. The entire operation lasts 130ns, which is less than the dead time introduced by the pulse-shaping amplifier. Therefore, the analog to digital conversion and the histogramming operation do not increase the overall event-by-event processing time of the entire system. Data collection is either gated by an external synchronization signal or is directly enabled from the computer. The external trigger can be an accelerator beam clock or a precise gate signal to time-stamp the data collection and measure count rates.
The trigger circuit generates an "Enable" signal for the histogramming operation to take place.
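The "Read", "Add One", "Write" sequence described above can be sketched in software as follows. This is only a behavioral illustration, not the EPLD logic: the class and method names are hypothetical, and the behavior when an 18-bit cell is full is not stated in the text, so saturation is assumed here. The `read_and_erase` method mimics the VME-based readout described in the text, in which each location is read and then zeroed in the same cycle.

```python
DEPTH = 4096                        # 4K memory locations (12-bit ADC codes)
WIDTH_BITS = 18                     # bits per memory location
MAX_COUNT = (1 << WIDTH_BITS) - 1   # largest count an 18-bit cell can hold


class HistogramMemory:
    """Behavioral model of one histogramming channel (hypothetical names)."""

    def __init__(self):
        self.cells = [0] * DEPTH

    def record(self, adc_code):
        """Use the ADC code as an address and increment that location."""
        if not 0 <= adc_code < DEPTH:
            raise ValueError("ADC code outside the 12-bit range")
        count = self.cells[adc_code]        # "Read"
        if count < MAX_COUNT:               # assumption: saturate when full
            count += 1                      # "Add One"
        self.cells[adc_code] = count        # "Write"

    def read_and_erase(self):
        """Read every location and zero it, one address at a time."""
        data = []
        for address in range(DEPTH):
            data.append(self.cells[address])
            self.cells[address] = 0
        return data


if __name__ == "__main__":
    memory = HistogramMemory()
    for code in (5, 5, 4095):
        memory.record(code)
    spectrum = memory.read_and_erase()
    print(spectrum[5], spectrum[4095])
```

Keeping the increment as an explicit read, modify, write sequence mirrors the three operations that the state machine has to fit within the time available per event.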
The histogramming memory module is packaged in a VME/VXI interface board. The VME-based system is a single channel PHA whereas the VXI-based system accommodates four channels per board. In both systems, the interface operations are performed on the second port of the SRAM. In the VME-based system the "Read" and "Erase" (i.e., writing zeros in all the memory locations) operations are performed simultaneously with the data transfer. Thus, while a memory location is addressed its content is read out and then erased during the same cycle. In the VXI-based system all four channels work simultaneously and share the trigger logic and the necessary hardware to perform the "Read" and "Erase" operations.
In the VXI-based system, the erasing of the memory takes place separately from the readout and simultaneously on all channels. In addition to the histogram data, a 32-bit counter records the duration of the acquisition. Four counters are available to record additional information. In the case of the XAFS experiment, these four counters record data from ion chambers (voltage to frequency converters), providing information about absorption in the sample.
In both systems, the transfer to the computer is controlled by either the synchronization signal in the form of an interrupt signal sent to the VME/VXI controller, or by the computer accessing the memory at regular time intervals.
IV. VME/VXI COMPUTER INTERFACE
A VME/VXI computer interface was chosen for this instrument because of its widespread use in the synchrotron radiation accelerator environment and its high data transfer rate. The VXI extension has been implemented because of its compatibility with VME systems and its expandability. The VXI standard defines a well-organized memory map for devices as well as a common ID register, a device type register, a status/control register and a base address register. Adding write-mode registers that correspond to the previously read-only registers allows flexible board addressing with simple VME controllers.
The VME version of the instrument was developed and used as a prototype. This single channel PHA was fully demonstrated and used for data collection. However, the XAFS instrument in its full configuration requires twelve data acquisition channels. Four histogramming memory modules were integrated on a VXI board and three boards are used to build the system. Each board is identified by a unique register space base address defined by an eight-position switch. All the histogramming memories are directly mapped to the VME/VXI buses in terms of addressing and can be accessed in direct memory access (DMA) mode for increased speed. The DMA readout requires only 160ns per cycle, resulting in a readout time of less than 2.7ms per board (four channels). All the histograms can be erased at once through a common write operation. The write cycle has the same characteristics as the read cycle, which leads to a total erase time of 0.7ms per board. Because the "Read" operation of the VME-based system also erases the data, only one mode of operation can be implemented for that system at the hardware level. However, since a different approach was taken for the VXI-based system, several data acquisition modes have been implemented.
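The per-board figures quoted above follow from simple arithmetic, sketched here under two assumptions that the text implies but does not state outright: one 160ns bus cycle per memory location, and a single pass over the 4096 addresses to erase all four channels at once. The variable names are illustrative.

```python
CYCLE_S = 160e-9          # DMA read or write cycle, from the text
DEPTH = 4096              # locations per histogram (4K memory)
CHANNELS_PER_BOARD = 4    # histogramming memories per VXI board

# Readout visits every location of every channel on the board.
readout_per_board_s = CHANNELS_PER_BOARD * DEPTH * CYCLE_S

# Erase is a common write, assumed to clear all four channels in one pass.
erase_per_board_s = DEPTH * CYCLE_S

if __name__ == "__main__":
    print(f"readout per board: {readout_per_board_s * 1e3:.2f} ms")
    print(f"erase per board:   {erase_per_board_s * 1e3:.2f} ms")
```

Under these assumptions the readout comes to about 2.6ms and the erase to about 0.66ms, consistent with the quoted "less than 2.7ms" and the rounded 0.7ms.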
V. DATA ACQUISITION MODES
The data acquisition operation is driven by trigger logic. As previously mentioned, the trigger signal can be either external (e.g., an accelerator clock signal) or an internal line, depending on the value of a control bit set through a register. This configuration allows the VXI instrument to operate in three different modes, which were implemented in a Windows 95/98/2000 application for a commercial VME-to-PCI controller.
The first mode, called 'standard trigger mode', is driven by the external trigger so that the histogram operation is performed only when the external line is set. When the line is released, a VME/VXI interrupt is propagated to the PC, initiating data readout. The histogramming memories are zeroed after each interrupt. The number of channels used for the acquisition determines the minimum idle time for the external trigger line. For example, using the numbers given previously on the VME/VXI bus speed, twelve channels create a minimum idle time of 8ms. The erase time takes an additional 0.7ms per board, bringing the total idle time to 10ms.
The second mode, called 'accumulated trigger mode', is driven by the external trigger and works in a manner similar to the previous mode. However, in this case the erase operation does not take place for each trigger. The minimum idle time is then only 8ms. The histograms can be erased at any time in 0.7ms per board or 2.1ms for the twelve channels.
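The minimum idle times of the two triggered modes can be reproduced with a short calculation. The sketch below assumes that boards are read out, and erased, one after another, and uses the rounded per-board figures from the text (2.7ms readout, 0.7ms erase); the function name and its parameters are hypothetical.

```python
READOUT_PER_BOARD_S = 2.7e-3   # upper bound quoted in the text
ERASE_PER_BOARD_S = 0.7e-3     # erase time per board quoted in the text


def min_idle_time_s(boards, erase_each_trigger):
    """Minimum time the external trigger line must stay idle.

    erase_each_trigger=True corresponds to the 'standard trigger mode',
    False to the 'accumulated trigger mode'.
    """
    idle = boards * READOUT_PER_BOARD_S
    if erase_each_trigger:
        idle += boards * ERASE_PER_BOARD_S
    return idle


if __name__ == "__main__":
    for label, erase in (("standard", True), ("accumulated", False)):
        idle_ms = min_idle_time_s(3, erase) * 1e3
        print(f"{label} mode, 3 boards: {idle_ms:.1f} ms")
```

For three boards (twelve channels) this gives about 10.2ms and 8.1ms, which the text rounds to 10ms and 8ms.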
For both the 'standard' and the 'accumulated' trigger modes, the counter measuring the acquisition time and the user-defined counters are gated by the signal enabling the acquisition, with a precision of 250ns.
The third mode is called 'live mode' and operates in such a way that the histograms are continuously updated. The data is transferred to the PC at a fixed refresh rate set by the acquisition software. In this mode, any external trigger is ignored, as the histogramming memory module is permanently accepting new counts. However, the elapsed acquisition time and the contents of the user-defined counters are still available. The histograms can be reset at any time by user command. Since both ports of the memory are used in this mode, there could be an arbitration problem if both ports accessed one particular location simultaneously. Occasionally, an event could occur with an energy corresponding to a memory location currently being accessed by the VXI interface. Such a conflict would result in the loss of the data. However, the probability of such an event is very low, since the incoming events are random and the duration of the Read or Write pulse is only 33ns on the data acquisition side, compared to the 160ns of the data transfer operation for a particular address.
For each mode of operation, the data acquisition software displays data in real time and provides basic zooming and cursor positioning. In addition, the software provides tools to save data to files with sequential filenames. A simple polynomial calibration, up to the ninth degree, is also provided.
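A polynomial calibration of this kind maps a histogram channel number to a physical quantity such as energy. The sketch below is a generic illustration using Horner's rule, not the acquisition software's actual routine; the function name, the coefficient ordering (lowest order first) and the example coefficients are all assumptions.

```python
MAX_DEGREE = 9   # highest polynomial degree mentioned in the text


def calibrate(channel, coefficients):
    """Evaluate c0 + c1*x + c2*x**2 + ... at x = channel (Horner's rule).

    `coefficients` lists c0 first; at most MAX_DEGREE + 1 values are allowed.
    """
    if not coefficients:
        raise ValueError("at least one coefficient is required")
    if len(coefficients) > MAX_DEGREE + 1:
        raise ValueError("polynomial degree exceeds the supported maximum")
    value = 0.0
    for coefficient in reversed(coefficients):
        value = value * channel + coefficient
    return value


if __name__ == "__main__":
    # Hypothetical linear calibration: offset 0.5, gain 0.01 per channel.
    print(calibrate(100, [0.5, 0.01]))
```

Horner's rule evaluates a degree-k polynomial with k multiplications and k additions, which keeps a per-channel calibration of a 4096-bin spectrum inexpensive even at the ninth degree.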
VI. CONCLUSION
Several tests have been performed on a two-channel system and on a four-channel system, both in a laboratory and in an accelerator environment. Figure 5 shows the total throughput rate for the whole system (solid dots) as a function of incoming count rate (ICR) for one channel of electronics. The shaping function was, as mentioned above, a fourth-order, pseudo-gaussian with 250ns peaking time. The figure also shows the best fit to the experimental points assuming a paralyzable dead time model (continuous line) [4]. The dead time obtained from the fit is 300ns, in agreement with the expected analytical result. Figure 6 shows the same curve compared to the system's throughput obtained by using the 500ns setting. The same graph includes a curve showing the throughput rate of commercially available systems.
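As a rough numerical illustration of what a 300ns dead time implies, the sketch below evaluates the standard paralyzable form, throughput = ICR × exp(−ICR × τ), taking the fitted value at face value. It does not reproduce the measured points of Figure 5, and the function and variable names are hypothetical.

```python
import math

DEAD_TIME_S = 300e-9   # dead time obtained from the fit, from the text


def paralyzable_throughput(icr, dead_time):
    """Recorded rate for a paralyzable system at incoming rate `icr` (1/s)."""
    return icr * math.exp(-icr * dead_time)


# In this model the throughput peaks where icr * dead_time == 1.
peak_icr = 1.0 / DEAD_TIME_S
peak_throughput = paralyzable_throughput(peak_icr, DEAD_TIME_S)

if __name__ == "__main__":
    for icr in (1e5, 1e6, peak_icr, 5e6):
        rate = paralyzable_throughput(icr, DEAD_TIME_S)
        print(f"ICR {icr:10.3e} /s -> throughput {rate:10.3e} /s")
```

With these numbers the model peaks at an incoming rate of about 3.3 million counts per second, where the throughput is about 1.2 million counts per second; beyond that point additional incoming events reduce the recorded rate.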
VII. ACKNOWLEDGMENTS

