Abstract-We have developed a high-speed system for collecting x-ray fluorescence microprobe data, based on ASICs developed at BNL and high-speed processors developed by CSIRO. The system can collect fluorescence data in a continuous raster scan mode, and present elemental images in real time using Ryan's Dynamic Analysis algorithm. We will present results from a 32-element prototype array illustrating the concept. The final instrument will have 384 elements arranged in a square array around a central hole.
I. INTRODUCTION
The developments described here represent the combination of two separate efforts, one based at BNL relating to detector development and the other based at CSIRO concerning fast processing of detector signals using FPGA-based parallel computation. The BNL effort has resulted in a pair of Application Specific Integrated Circuits (ASICs) designed to allow the construction of large (hundreds of channels) detector arrays having high energy resolution, together with suitable silicon planar high-resistivity diode arrays. The CSIRO effort has taken technology originally developed for machine vision applications and adapted it to the fast processing of detector event streams. The resulting combination represents a significant advance in detector systems for x-ray fluorescence microprobes, particularly those based on modern synchrotron radiation sources. In the following we review these developments and describe our proof-of-principle experiment, which demonstrates the advantages of the system.
II. SYSTEM ARCHITECTURE
Figure 1 shows a block diagram of the microprobe system. It consists of six main blocks: a diode detector array, a low-noise charge-sensitive amplifier and shaping amplifier ASIC, a peak-detect/derandomize ASIC, a fast ADC, an X-Y scanning stage and a digital processor. The following sections describe each of these blocks in detail.

III. DETECTOR AND READOUT ELECTRONICS
The detector arrays are planar monolithic low-leakage silicon diode arrays manufactured in-house at BNL. A range of array sizes has been fabricated, all based on a 1 mm x 1 mm pad size. The arrays are fabricated on 100 mm diameter, 0.4 mm thick n-type wafers having resistivity greater than 5000 Ohm-cm. Pad counts of 32, 96 and 384 are available. The final goal of the development is a 384-channel version, but for this test we used the 32-element version. This allowed us to fully test all steps of the process, since the custom ASICs designed for the system are each 32-channel devices. Figure 2 shows the detector and the readout ASIC mounted on a PCB and wire-bonded together.
The low-noise ASIC used to read out the detectors is one of a family designed at BNL for a range of applications [1]. The one chosen for these experiments was a variant which made all 32 shaped outputs available. Others in the family had lower noise, but were designed for counting applications and did not make all analog signals available simultaneously. We expect to submit a modified design of the lower-noise variant, with the analog output pads added, later this year. Although the device used was not optimal, it was more than adequate for this first test, having an equivalent noise charge (ENC) of 30 electrons RMS. The digital counting sections of the ASIC were not used. Amplifier gain and shaping time were set using a microprocessor system described earlier [2].
The second ASIC provides multichannel peak detection and derandomization for the 32 analog pulse trains produced by the detector [3]. It can correctly capture up to eight simultaneous events on any of its 32 inputs. It also has various timing modes, of which this work uses one, the time-over-threshold (ToT) mode. We use the ToT information to provide pulse pileup rejection. Reference [3] describes the operation of the ASIC in detail.
The output of this ASIC consists of two analog waveforms, one giving the pulse peak height and the other the ToT encoded as an analog voltage. Both signals are digitized by fast (10 MHz) 12-bit analog-to-digital converters (ADCs). Because the ASIC derandomizes the values it acquires, the digitization can be performed synchronously, which allows the use of low-power synchronous ADCs.
The two ADC values and an address identifying which detector produced the signal constitute a detector 'event'. These events are combined with synchronous X-Y position information from the scanning stage and passed to the digital processing electronics.
Figure 2. Photograph of the 32-element diode array and its readout ASIC, wire-bonded together. The visible side of the detector has 1 x 1 mm p-implants plus a set of guard rings around the whole array. These pads are bonded by long wirebonds directly to the ASIC inputs. This minimizes any parasitic capacitances at these sensitive nodes. The ASIC outputs are in turn bonded to the PCB.
IV. DIGITAL PROCESSING ELECTRONICS
Digital processing is provided by a CSIRO-designed Hymod module. It combines highly parallel FPGA-based processing logic, which executes the mathematical manipulations required to convert the pulse-height and position information into quantitative elemental maps, with a conventional PowerPC CPU and its associated memories and interfaces. The whole module is a Eurocard-format printed circuit board measuring 220 mm x 100 mm.
The processing sequence is outlined in Figure 3. The pulse height and ToT values are used to establish the validity of the event, in particular rejecting pileup. Valid pulse heights are converted into photon energy; each detector has its own energy calibration look-up table. The Dynamic Analysis (DA) algorithm [4] is then applied to build a vector of elemental contributions, one for each image, at each XY position. Values are accumulated for each element via a global weighting matrix which assigns a weight for each element based on the photon energy. Each matrix row contains the probabilities that a photon of a given energy arose from fluorescence of a particular atomic species, or from a detector artifact generated by such a photon, for example an escape peak or a scattering event. The weights may be positive or negative. Summing the weights of many photons into the accumulator vector provides an estimate of the chemical concentration of each species. This algorithm greatly reduces problems due to peak overlap, which are particularly intractable when the sample is complex, as for geological or mineral samples, and it can be applied with fast stage scanning and a small dwell time per pixel. Although post-processing of per-pixel spectra using more conventional fitting methods can also help with this problem, such methods cannot be applied in real time. Dynamic Analysis operates on a photon-by-photon basis, producing statistically limited chemical maps in real time.
This algorithm is realized in hardware, not software. The FPGA is programmed using a high-level language developed at CSIRO called 3PL [5]. This language provides an environment familiar to programmers, as distinct from lower-level tools such as VHDL, which are more familiar to hardware engineers. Since this application is essentially a custom computer, developing it as a program is much more comfortable than developing it as a series of interconnected gates and registers.
3PL has constructs allowing independent parallel threads or pipelined sequential operations, or mixtures of the two, and allowed the implementation of this system in a matter of weeks. The 3PL code is compiled to generate EDIF, which is in turn converted to a map of interconnections within the gate array by the FPGA vendor's tools. This map is then loaded into the Xilinx Virtex-II gate array.
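As an illustration only, the photon-by-photon processing sequence described in this section (validity test, per-detector energy look-up, accumulation of a weight-matrix row at the current XY position) can be sketched in software. This is a minimal Python sketch under stated assumptions, not the 3PL hardware implementation: the function name `process_events`, the per-height ToT limit `tot_max` used as the pileup test, the coarse 8-bin energy axis and the example weights are all hypothetical and chosen only to keep the example small.

```python
N_DETECTORS = 32     # prototype array size quoted in the text
N_ADC = 4096         # number of codes from a 12-bit ADC
N_ENERGY_BINS = 8    # assumption: deliberately coarse, for illustration
N_ELEMENTS = 3       # assumption: number of elemental images


def process_events(events, energy_lut, da_matrix, tot_max, nx, ny):
    """Accumulate one weight-matrix row per valid photon.

    events     : iterable of (detector, height, tot, ix, iy) integers
    energy_lut : energy_lut[detector][height] -> energy bin (per-detector calibration)
    da_matrix  : da_matrix[energy_bin] -> list of N_ELEMENTS weights (may be negative)
    tot_max    : hypothetical pileup limit, largest accepted ToT for each height
    """
    images = [[[0.0] * N_ELEMENTS for _ in range(nx)] for _ in range(ny)]
    for det, height, tot, ix, iy in events:
        if tot > tot_max[height]:
            continue  # treated as pileup under the assumed ToT test
        if not (0 <= ix < nx and 0 <= iy < ny):
            continue  # stage position lies outside the mapped area
        row = da_matrix[energy_lut[det][height]]
        pixel = images[iy][ix]
        for k, weight in enumerate(row):
            pixel[k] += weight
    return images


# Hypothetical calibration: the same linear ADC-to-bin mapping for every detector.
energy_lut = [[code * N_ENERGY_BINS // N_ADC for code in range(N_ADC)]
              for _ in range(N_DETECTORS)]

# Hypothetical weights: bin 2 is mostly element 0 but overlaps element 1.
da_matrix = [[0.0] * N_ELEMENTS for _ in range(N_ENERGY_BINS)]
da_matrix[2] = [1.0, -0.25, 0.0]
da_matrix[5] = [0.0, 1.0, 0.0]

tot_max = [100] * N_ADC

events = [
    (0, 1100, 50, 1, 0),   # valid, falls in energy bin 2
    (7, 2600, 60, 1, 0),   # valid, falls in energy bin 5
    (3, 1100, 150, 1, 0),  # rejected: ToT above the assumed limit
    (5, 2600, 40, 9, 0),   # skipped: outside the 2 x 2 map
]
images = process_events(events, energy_lut, da_matrix, tot_max, nx=2, ny=2)
print(images[0][1])
```

In this sketch the negative weight in bin 2 subtracts the assumed overlap from element 1, which is the mechanism by which a weighting matrix of this kind can reduce peak-overlap errors. The real matrix, energy binning and pileup criterion are those of references [3] and [4], not the values used here.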
V. SCANNING STAGE
The generation of a two-dimensional chemical map of the sample requires that the sample be scanned in X and Y through a tightly focused x-ray beam. In our case, we used the x-ray microprobe facility on beamline X27A at BNL's National Synchrotron Light Source (NSLS). The beamline generates a focused spot of 6 x 8 micrometers. Although this station is fully equipped with detectors and scanning stages, we opted to mount a small additional stage on top of the one at the beamline, giving us full control of its operation without modifying standard beamline hardware or software. We chose a pair of simple motorized optical slides driven by DC encoder micrometers. In addition, we mounted independent linear incremental encoders directly onto the slide carriage to provide a precision position readout for the DA processing. The encoders provided TTL quadrature output signals with a 2 micron period. These signals were passed to the digital processor gate array, where a small logic block converted the pulses to a continuously updated position. These positions were then used as indices into the 2-D chemical map arrays which are the end result of the measurement. The DC encoder micrometers were able to scan at a maximum speed of 0.4 mm/s.
VI. RESULTS
Figure 4 shows the setup at beamline X27A. The focused x-ray beam comes in from the center right. The detector is the small box with the vacuum hose attached, at center left, positioned at 90° to the beam and at a distance of 25 mm from the beam spot on the target. The sample is mounted on the scanning stage at center, at an angle of 45° to the beam, and rastered through the beam using dwell times per XY position between 24 and 50 ms; the minimum dwell in these tests was set by the stage's maximum slew speed.
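The encoder handling described in Section V, in which quadrature pulses are converted to a continuously updated position that then indexes the map arrays, can be sketched as follows. This is a hedged Python illustration rather than the gate-array logic itself. It assumes 4x decoding (one count per edge, so 0.5 micron per count for the 2 micron period), an arbitrary choice of which phase order counts as forward, and a hypothetical pixel pitch; the names `QuadratureCounter` and `pixel_index` are invented for the example.

```python
# Assumed forward sequence of (A, B) states: 00 -> 01 -> 11 -> 10 -> 00
_FORWARD = {(0b00, 0b01), (0b01, 0b11), (0b11, 0b10), (0b10, 0b00)}


class QuadratureCounter:
    """Track position from successive samples of the A and B encoder lines."""

    def __init__(self, period_um=2.0):
        self.count = 0
        self.state = 0b00
        self.um_per_count = period_um / 4.0  # assumes 4x (every-edge) decoding

    def update(self, a, b):
        new = (a << 1) | b
        if new == self.state:
            return  # no edge since the last sample
        if (self.state, new) in _FORWARD:
            self.count += 1
        elif (new, self.state) in _FORWARD:
            self.count -= 1
        # Otherwise both lines changed at once: an illegal step, not counted.
        self.state = new

    @property
    def position_um(self):
        return self.count * self.um_per_count


def pixel_index(position_um, pixel_um):
    """Map a position to a map-array index for a hypothetical pixel pitch."""
    return int(position_um // pixel_um)


counter = QuadratureCounter()
forward_cycle = [(0, 1), (1, 1), (1, 0), (0, 0)]
for a, b in forward_cycle * 2:        # two full encoder periods forward
    counter.update(a, b)
forward_position = counter.position_um

reverse_cycle = [(1, 0), (1, 1), (0, 1), (0, 0)]
for a, b in reverse_cycle:            # one full period backward
    counter.update(a, b)
final_position = counter.position_um

print(forward_position, final_position, pixel_index(forward_position, 2.0))
```

Because every accepted detector event is tagged with the position held in such a counter at that moment, the map index follows the actual stage motion rather than a nominal commanded position, which is what allows the stage to scan continuously.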
Several samples of geological interest were scanned in the 48 hours available for the test. We show here one of those samples, containing a zoned pyrite vein surrounded by quartz and carbonates from the ore zone of the Emperor Mine, a gold-silver-telluride deposit in Fiji. The data were collected over 5.5 hours using a 50 ms dwell per pixel. Images corresponding to 12 elements, plus elastic and Compton scattering, were projected from the data. Figure 4 shows four of these images, showing the pyrite vein (high Fe) containing zonation in As and Cu (and Pb and Au, not shown) and a carbonate region (elevated minor Mn). The left of the image is partly obscured by an aluminum sample holder containing trace Cu. The detector system and processor were not particularly taxed by these samples: the data rates encountered reached only 0.9 Mc/s, which allowed all events to be logged to disk for additional off-line analysis.
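The figures quoted above imply some simple scales, worked through in the short Python calculation below. It is only an estimate from the stated numbers. It assumes that the full 5.5 hours was spent dwelling on pixels (no scan overheads), that the peak rate of 0.9 Mc/s was sustained for a whole 50 ms dwell, and that the stage ran at its 0.4 mm/s maximum during a 24 ms dwell. None of the derived values is reported in the text.

```python
# Quantities taken from the text.
scan_time_h = 5.5          # acquisition time for the sample shown
dwell_ms = 50.0            # dwell per pixel for that scan
min_dwell_ms = 24.0        # shortest dwell used in the tests
max_speed_mm_s = 0.4       # maximum stage speed
peak_rate_cps = 0.9e6      # highest event rate encountered

# Upper bound on pixel count, assuming no time lost to line turnarounds.
n_pixels = scan_time_h * 3600.0 * 1000.0 / dwell_ms

# Distance covered in one minimum dwell if the stage ran at full speed.
travel_um = max_speed_mm_s * 1000.0 * (min_dwell_ms / 1000.0)

# Events in one pixel if the peak rate were sustained for a full dwell.
events_per_pixel = peak_rate_cps * (dwell_ms / 1000.0)

print(round(n_pixels), round(travel_um, 1), round(events_per_pixel))
```

Under these assumptions the scan holds at most about 396,000 pixels, the stage covers roughly 9.6 micrometers in the shortest dwell, which is comparable to the 6 x 8 micrometer beam spot, and a pixel at the peak rate collects about 45,000 events.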
VII. CONCLUSIONS
The data acquisition approach used here places little constraint on the dwell time per XY position. The maximum scanning speed is largely set by the performance of the sample stage. This approach paves the way to routine imaging with millisecond dwell times and high pixel counts.
