Fast centroid-finding electronics are being developed for a range of position-sensitive gas proportional detectors. Each cathode strip feeds a preamplifier, a shaper and a free-running ADC. Increased total count rate is achieved by dividing the detector into several segments with parallel processing that introduces no common dead time. Each segment has central-channel finding logic and event listing realized in an FPGA, followed by a DSP that performs the centroid calculation and histogramming.
I. INTRODUCTION
The curved proportional detector for X-ray diffraction studies at NSLS [1] is based upon gas multiplication by a curved blade. To assure a resolution of order 100 µm, the profile of the induced cathode charge distribution is sampled by cathode strips on a pitch of 5 mm.
Each readout channel of the centroid-finding electronics consists of a preamplifier, a shaper and an analog-to-digital converter (ADC). The number of channels involved in the computation and the interpolation algorithm both influence the resolution and the differential nonlinearity of the centroid localization [2, 3, 4]. One recent example can be found in [4]; excellent differential linearity is obtained using 5 channels digitized with 12 bits of precision.
High-count rate capability of the entire instrument can only be achieved if both the detector and readout electronics are designed appropriately. The new system will achieve this capability by using a small avalanche size in the detector and by taking advantage of recent advances in FPGAs and DSPs.
In order to maximize the counting rate, it is necessary to: 1) limit the number of ADC bits, 2) reduce the number of involved channels, 3) opt for a simple and fast interpolation algorithm, 4) minimize the common dead time.
Here, we have chosen to include preprocessing logic for central-channel finding and event listing, and to read 3 channels digitized with 8 bits of precision. The DSP is used to mitigate the differential non-linearity (DNL) due to ADC quantization errors and the reduced number of channels as part of the centroid finding algorithm. 
II. ELECTRONICS
The system overview is presented in Fig. 1. Its principal functions are the measurement of the charge induced on each cathode subdivision by charge preamplifiers and the subsequent calculation of the centroid of that distribution.
A. Charge preamplifier + bipolar shaper
The custom-designed surface-mount daughter-board contains 32 charge preamplifier circuits on a module of 76 × 38 mm². It is mounted directly on the detector cathode printed circuit board. A separate board holds 32 standard BNL bipolar amplifiers in hybrid technology (number IO533) with 250 ns peaking time.
B. Free-running ADC
The prototype printed circuit board contains 16 analog-to-digital converters (ADCs). Each ADC is a 10-bit AD9200 with parallel output (8 bits of information are used). The ADCs are free-running with a common 20 MHz clock. The advantage of this approach, compared to the usual triggering on the anode signal, is the possibility of treating multiple hits, i.e., the system has no common dead time.
The ADC offset subtraction and gain correction are achieved using a separate 12-bit digital-to-analog converter (MAX533) to adjust the top and bottom reference voltages for each channel. This is necessary for good differential and integral linearity of the position estimation.
The digitized shaper signals of 5 consecutive channels, in the neighborhood of an event nearest channel 3, are shown in Fig. 2. At the sampling time nearest the maxima of the 3 shaper signals nearest an event, the 3 corresponding ADC values are the most accurate available measurements of the most significant charges, Q_{k-1}, Q_k and Q_{k+1}, induced on cathodes k-1, k and k+1 by an avalanche on the anode. The role of the rest of the readout system is to extract these values in real time from the continuous stream of ADC values and to use them for position measurement. The rms error in peak sampling, introduced by the discrete time intervals, is derived in the appendix as Eq. (1), valid for small T_s, where T_s is the ADC sampling period and K is the coefficient of the 2nd-order term of a polynomial expansion of the shaper pulse about the peak. For these shapers and a 20 MHz ADC sampling frequency, the rms peak sampling error is <2%, small enough to have a negligible effect on the energy resolution. Since the sampling errors of all involved signals are fully correlated, they have no effect on the centroid calculation.
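The scaling of the peak-sampling error with T_s can be illustrated numerically. The sketch below is not the appendix derivation: it assumes a pulse normalized to unit peak height that behaves as 1 - K·t² near the peak, and a sampling instant uniformly distributed within ±T_s/2 of the peak. The function names and the example value of K are hypothetical.

```python
import math

def rms_peak_sampling_error(k_coeff, t_s):
    """Analytic rms of the fractional error K*t**2 for t uniform on [-t_s/2, +t_s/2].

    Assumption: pulse normalized to 1 at the peak, parabolic top.
    E[t**4] = t_s**4 / 80 for this distribution, hence the 4*sqrt(5) factor.
    """
    return k_coeff * t_s ** 2 / (4.0 * math.sqrt(5.0))

def rms_peak_sampling_error_numeric(k_coeff, t_s, steps=10_000):
    """Midpoint-rule evaluation of the same rms, used as a cross-check."""
    total = 0.0
    for i in range(steps):
        t = -t_s / 2.0 + (i + 0.5) * t_s / steps
        total += (k_coeff * t * t) ** 2
    return math.sqrt(total / steps)

# Illustrative values only: K in 1/ns^2 (hypothetical), T_s = 50 ns (20 MHz clock).
K_EXAMPLE = 8.0e-6
print(rms_peak_sampling_error(K_EXAMPLE, 50.0))
```

Under these assumptions the error grows as T_s², so halving the sampling period reduces the rms error by a factor of four.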
C. Central channel finding FPGA
The digital outputs of all 16 channels of one segment are connected to a Field Programmable Gate Array (FPGA). The central channel is found by first forming all the adjacent 3-way sums of the digital ADC outputs. A channel k is qualified as the central channel at sampling instant t_i if the following conditions are met:
1) The sum S_k(t_i) of the sampled bipolar shaper signals ADC_ch_k(t_i), ADC_ch_{k-1}(t_i) and ADC_ch_{k+1}(t_i) must exceed a threshold in order to discriminate against the noise background.
2) The sum S_k(t_i) must exceed S_{k+1}(t_i) and must not be exceeded by S_{k-1}(t_i). In addition to partially qualifying channel k, this disqualifies channels k-1 and k+1 from being central channels.
3) The sum S_k(t_i) must exceed S_k(t_{i+1}). This identifies the sampling instant nearest the peaks of the analog bipolar signals involved, in the sense that their sum has peaked.
When all of the above conditions are satisfied, the ADC data on channels k, k-1 and k+1, at time t_i, are stored in a buffer as Q_k, Q_{k-1} and Q_{k+1}. The central-channel sensing electronics for all 3 of these channels are then disabled until the shaper pulse dissipates (16 samples in this case), providing post-pileup rejection. Sixteen such buffers (one for each possible central channel) are scanned sequentially for new data. Any new data, along with the number of the central channel, are subsequently written to an on-chip First-In First-Out memory (FIFO). Each element of data (32 bits) listed in the FIFO represents one event and contains all the information needed by the DSP to perform the centroid computation. The FIFO can store up to 15 events. The FPGA logic is fully pipelined and can give one output for every clock cycle.
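The three qualification conditions and the 16-sample lockout can be summarized in a short software model. This is only a behavioral sketch, not the FPGA implementation: the function name, the data layout (one list of ADC values per sampling instant) and the treatment of the two edge channels (never qualified, with their 3-way sums taken as zero) are assumptions made for illustration.

```python
def find_central_channels(samples, threshold, lockout=16):
    """Return a list of events (i, k, Q_{k-1}, Q_k, Q_{k+1}).

    samples[i][ch] is the ADC value of channel ch at sampling instant i.
    Assumption: edge channels are never central; their sums are set to 0.
    """
    n_ch = len(samples[0])
    blocked_until = [0] * n_ch      # first instant at which a channel is enabled again
    events = []

    def three_way_sums(row):
        return [row[k - 1] + row[k] + row[k + 1] if 0 < k < n_ch - 1 else 0
                for k in range(n_ch)]

    for i in range(len(samples) - 1):
        s_now = three_way_sums(samples[i])
        s_next = three_way_sums(samples[i + 1])
        for k in range(1, n_ch - 1):
            if i < blocked_until[k]:
                continue                               # local dead time still active
            if s_now[k] <= threshold:
                continue                               # condition 1: above threshold
            if not (s_now[k] > s_now[k + 1] and s_now[k] >= s_now[k - 1]):
                continue                               # condition 2: maximum across channels
            if not s_now[k] > s_next[k]:
                continue                               # condition 3: the sum has peaked in time
            row = samples[i]
            events.append((i, k, row[k - 1], row[k], row[k + 1]))
            for j in (k - 1, k, k + 1):
                blocked_until[j] = i + 1 + lockout     # disable the 3 channels involved
    return events
```

Because the lockout is applied only to the three channels involved, events elsewhere in the segment are still accepted at the same sampling instant, which mirrors the absence of a common dead time.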
The FIFO not-empty flag is polled by the DSP. If data is available, it is read by the DSP and the centroid calculation processing is begun. 
D. DSP, histogramming and readout
A Digital Signal Processor (DSP) is a processor optimized to execute fast multiply-accumulate instructions. For this project we have chosen the Texas Instruments TMS320C6201/C6701. In a 5 ns clock cycle, it can execute up to eight 32-bit instructions. The DSP features 128 Kbytes of internal static memory, accessible in one clock cycle, and can access 16 Mbytes of dynamic memory on the board.
The DSP program is written in assembler and fully optimized. It extracts the 3 charge signals and the central channel k from the single 32-bit FIFO output and executes the centroid-finding algorithm on 3 strips. Histograms of the sum of the 3 charges (corresponding to the total charge of the involved cathodes) and of the charge of each individual ADC (when it is chosen as the central channel) are made for diagnostic purposes. The ADC quantization error is mitigated by the addition of 8 random non-significant bits to the original values. Subsequent calculations are carried out with 32-bit precision. The execution time is less than 100 clock cycles and depends on the chosen algorithm. The event rate is therefore limited to around 2 × 10^6 events/sec/DSP. Histogramming is performed directly in the DSP internal memory. Each histogram update consumes 10 clock cycles and does not significantly affect the maximum count rate. A GPIB interface, realized using the National Instruments TNT4882 chip, permits the read-out of the histogram results by a host computer.
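The extraction of the three charges and the central channel from one 32-bit FIFO word can be sketched as follows. The text does not give the bit layout of the event word, so the layout used here (channel number in the top byte, then Q_{k-1}, Q_k and Q_{k+1}) and both function names are purely hypothetical.

```python
def pack_event(channel, q_left, q_center, q_right):
    """Build a 32-bit event word (hypothetical layout: channel | Q_{k-1} | Q_k | Q_{k+1})."""
    for value in (channel, q_left, q_center, q_right):
        if not 0 <= value <= 0xFF:
            raise ValueError("each field must fit in 8 bits")
    return (channel << 24) | (q_left << 16) | (q_center << 8) | q_right

def unpack_event(word):
    """Split a 32-bit event word into (channel, Q_{k-1}, Q_k, Q_{k+1})."""
    return ((word >> 24) & 0xFF,
            (word >> 16) & 0xFF,
            (word >> 8) & 0xFF,
            word & 0xFF)

word = pack_event(5, 30, 100, 50)
channel, q_left, q_center, q_right = unpack_event(word)
```

Three 8-bit charges occupy 24 bits, which leaves 8 bits for the channel number; this is why a 3-strip algorithm fits one event into a single 32-bit word.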
E. Printed Circuit Board of one segment
Special care has been taken in the layout of the printed circuit board because a high-speed DSP system coexists on the same board with the ADCs and the DACs. Analog and digital sections of the board are physically separated as much as possible, all analog signals are shielded in a dedicated layer between two ground planes, and each ADC and DAC (driving the ADC's top and bottom reference voltages) has a separately filtered power supply connection (LC filter 47 µH × 33 µF). A photograph of the printed circuit board (8 layers) holding 16 ADCs, the FPGA, the DSP and a GPIB interface is shown in Fig. 5.
III. POSITION ACCURACY
The intrinsic spatial resolution of the detector depends on the physical processes in the gas, but resolution can also be limited by the electronic noise in the charge measurement. The position error due to electronic noise is given by [3]:
$$\sigma_x = K_m \, w \, \frac{\sigma_q}{Q_{tot}} \qquad (2)$$
where w is the strip pitch, σ_q is the rms error in the charge measurement per strip, Q_tot is the total charge integrated by the readout electronics in the measurement time, and K_m is a constant (between 1.5 and 3 [3, 8]) that depends on the centroid-finding algorithm. Eq. (2) indicates that electronics optimized for low noise [9] is necessary in order to allow operation in the low gas amplification mode (small avalanche), a fundamental condition for obtaining high count rates with any gas detector.
Eq. (2) demonstrates that for a detector system characterized by w = 5 mm, σ_q = 1.5 × 10^3 e⁻ and K_m = 2, only a small total cathode charge, of order Q_tot = 0.05 pC, is required to obtain an electronic noise precision of less than 50 µm.
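This estimate can be checked with a few lines of arithmetic. The sketch assumes that Eq. (2) has the form σ_x = K_m · w · σ_q / Q_tot, which is consistent with the symbol definitions and the numbers quoted above; the function name is hypothetical.

```python
ELEMENTARY_CHARGE_C = 1.602176634e-19   # coulombs per electron

def position_noise_um(w_mm, sigma_q_electrons, q_tot_pc, k_m):
    """Electronic-noise position error in micrometres.

    Assumed form of Eq. (2): sigma_x = K_m * w * sigma_q / Q_tot.
    """
    q_tot_electrons = q_tot_pc * 1e-12 / ELEMENTARY_CHARGE_C
    return k_m * (w_mm * 1000.0) * sigma_q_electrons / q_tot_electrons

# Values quoted in the text: w = 5 mm, sigma_q = 1500 e-, Q_tot = 0.05 pC, K_m = 2.
print(position_noise_um(5.0, 1.5e3, 0.05, 2.0))   # about 48 um
```

With these inputs the result is about 48 µm, in agreement with the "less than 50 µm" statement.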
IV. DIFFERENTIAL NON-LINEARITY (DNL) CONSIDERATIONS
The limited number of ADC bits and the reduced number of channels used to sample the induced charge, as required for a high count rate, degrade the differential linearity of the detector.
In order to define the influence of the ADC quantization error, and of the centroid-finding algorithm, on the DNL prior to the building of the hardware prototype, we have performed computer simulations using a simplified mathematical model of the detector and associated electronics. We have also stored on a PC hard disk experimental data generated by a uniformly irradiated detector using an 8-bit digital oscilloscope (LeCroy 9360). We have then processed the same set of data with different algorithms. After analyzing and comparing the results of the simulation and of the measurements, we have opted to use only three 8-bit samples of the cathode charge for the design of the prototype encoding electronics.
Centroid-finding algorithms slightly different from those currently used [2, 5, 6] are necessary in order to compensate for this high-count rate compromise. The algorithms must be easily implemented by digital signal processing and they must take into account the ADC quantization errors.
Correction of the quantization effects on DNL
In accord with [7], and independently of the algorithm used, our simulations and measurements confirmed strong differential nonlinearity due to the quantization effects of the 8-bit ADCs when using conventional algorithms. The DNL degradation also depends on the number of interpolated pixels between two readout channels and is near ±10% for 16 pixels/ch.
We have demonstrated that this problem can be solved by digital signal processing simply by adding 5 to 8 random, non-significant bits to the original 8-bit ADC value and subtracting the mean value. All subsequent operations are carried out with 32-bit precision. We found that it is sufficient to add the random bits only on the two outer channels.
We can use eq. (2) to estimate the degradation of the spatial resolution due to quantization of the signals and the addition of the random non-significant bits:
In equation (3) we have supposed that the mean total charge (3-channel sum) is equal to the full-scale output of one channel, as in Fig. 6. Thus, for a detector with w = 5 mm and K_m = 2, the position degradation is typically a negligible 16 µm (rms).
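One reading of this estimate that reproduces the quoted 16 µm is sketched below. It is an assumption, not a transcription of equation (3): the per-strip charge error is taken as the quadrature sum of the ADC quantization error and the added random bits, each modeled as a uniform distribution one LSB wide (variance 1/12 LSB²), and the total charge is taken as the 8-bit full scale of 2^8 LSB. The function name is hypothetical.

```python
import math

def quantization_position_error_um(w_mm, k_m, adc_bits=8):
    """Position error (um) from quantization plus added random bits.

    Assumptions: sigma_x = K_m * w * sigma_q / Q_tot, with
    sigma_q = sqrt(1/12 + 1/12) LSB and Q_tot = 2**adc_bits LSB.
    """
    sigma_q_lsb = math.sqrt(1.0 / 12.0 + 1.0 / 12.0)
    q_tot_lsb = 2 ** adc_bits
    return k_m * (w_mm * 1000.0) * sigma_q_lsb / q_tot_lsb

print(quantization_position_error_um(5.0, 2.0))   # about 15.9 um
```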
In our system, the pseudo-random numbers used for this technique are created by extraction of different bit-zones from an internal counter register. As these operations are executed in parallel with the main course of the centroid-finding program (using available 'C6201 resources), the procedure does not slow down the counting rate.
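The addition of random non-significant bits can be modeled in software as shown below. This is a minimal sketch: Python's `random` module stands in for the counter-based pseudo-random source of the DSP, the function name is hypothetical, and the mean of the added bits is approximated by half their range (2^(n-1)), which leaves a residual offset of half a fine unit.

```python
import random

def add_random_bits(adc_value, n_bits, rng):
    """Extend an ADC value by n_bits random low-order bits and remove their mean.

    The result is expressed in units of 1/2**n_bits LSB.
    Assumption: the subtracted mean is approximated by 2**(n_bits - 1).
    """
    extended = (adc_value << n_bits) + rng.getrandbits(n_bits)
    return extended - (1 << (n_bits - 1))

rng = random.Random(1)                     # fixed seed for a reproducible example
dithered = [add_random_bits(100, 8, rng) for _ in range(5)]
```

Each dithered value stays within half an LSB of the original code, so the charge measurement is smeared uniformly over its quantization interval instead of being concentrated on discrete levels.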
Centroid-Finding Algorithm
For this system, we have considered only centroid-computation algorithms that use the charge information from 3 adjacent cathode strips (so that all the information necessary for one event fits in 32 bits of data).
A. Center-of-gravity algorithm on 3 strips
Let the charge collected on the central channel k be Q_k, and the charges to the left and to the right be Q_{k-1} and Q_{k+1}. The position is then given by [5]:
Using the 'C6201 digital signal processor, the execution time is 85 clock cycles. At the connection between two strips, the differential linearity is affected by the choice of the central channel. The correction factor C_3 [4, 8] minimizes this effect by forcing (4) to be continuous at cathode boundaries. The correction factor changes with the anode-cathode distance d for a given detector. For a detector with the standard geometry [11] (where w/d = 0.8), C_3 is equal to 1.51. For the curved blade detector [1], with zigzag cathodes (w/d = 2.12), C_3 is equal to 1.13. It is important to note that this factor is also a factor of the algorithm-related noise parameter, K_m, used in equations (2) and (3). For this method K_m is equal to 1.65 [3, 8].
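A 3-strip center-of-gravity computation with a correction factor can be sketched as follows. The form used here, x = w · (k + C_3 · (Q_{k+1} - Q_{k-1}) / (Q_{k-1} + Q_k + Q_{k+1})), is the commonly used expression and is assumed rather than taken from the text; the function name and the example charges are hypothetical.

```python
def cog_position(k, q_left, q_center, q_right, pitch, c3):
    """Corrected 3-strip center-of-gravity position, in the units of `pitch`.

    Assumed form: x = pitch * (k + c3 * (Q_{k+1} - Q_{k-1}) / (Q_{k-1} + Q_k + Q_{k+1})).
    """
    total = q_left + q_center + q_right
    if total <= 0:
        raise ValueError("total charge must be positive")
    return pitch * (k + c3 * (q_right - q_left) / total)

# Example with the curved-detector value C_3 = 1.13 and a 5 mm pitch.
x_mm = cog_position(4, 20, 100, 40, 5.0, 1.13)
```

In this form C_3 simply scales the interpolated offset, which is how it can be tuned so that the positions computed from either side of a strip boundary coincide.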
B. Gaussian fit
For this algorithm [6] , the position is given by: 
This method has excellent differential linearity at the points midway between two strips because the position is still correctly determined even if the location of the central channel is miscomputed. However, as the actual distribution of the cathode charge is not a Gaussian function, a systematic error between the true position and the center of the Gaussian function appears at points near the middle of the strip. This method is best suited to a narrower charge distribution over the cathodes (for example with capacitive charge division [12] or with zigzag cathodes), since the Gaussian function is then a closer approximation of the true distribution of the cathode charge. For this method, the factor K_m used in equation (2) is equal to 1.89 [8].
The implementation of eq. (5) calls for the use of a floating-point processor. When using the fixed-point 'C6201 processor, it is necessary to implement the logarithmic function as a look-up table in the DSP internal data memory (accessible in one clock cycle) in order to avoid severe delays in program execution. The restricted memory space limits the random bit addition to 5 bits (the ln table is accessed directly with a 13-bit word formed by 8 bits of the ADC value and 5 random bits). To avoid severe quantization effects, the 32-bit values in the table correspond to 10^5 · ln(address).
The execution time, including all test histograms, is 104 clock cycles.
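The table-based evaluation of the logarithms can be illustrated with a short sketch. The three-point Gaussian estimator used here, x = w · (k + ½ · ln(Q_{k+1}/Q_{k-1}) / ln(Q_k² / (Q_{k-1} · Q_{k+1}))), is the standard form and is assumed rather than taken from the text; the table follows the 13-bit addressing and 10^5 scaling described above, while the names and the zero placeholder at address 0 are hypothetical.

```python
import math

LN_SCALE = 100_000        # table entries hold round(1e5 * ln(address))
ADDRESS_BITS = 13         # 8 ADC bits + 5 random bits
# Address 0 has no logarithm; a placeholder keeps the table directly indexable.
LN_TABLE = [0] + [round(LN_SCALE * math.log(a)) for a in range(1, 1 << ADDRESS_BITS)]

def gaussian_fit_position(k, q_left, q_center, q_right, pitch):
    """Three-point Gaussian centroid using only integer table look-ups for ln().

    Charges are 13-bit integers (ADC value extended by 5 random bits).
    """
    for q in (q_left, q_center, q_right):
        if not 0 < q < (1 << ADDRESS_BITS):
            raise ValueError("charges must be non-zero 13-bit values")
    ln_l, ln_c, ln_r = LN_TABLE[q_left], LN_TABLE[q_center], LN_TABLE[q_right]
    numerator = ln_r - ln_l                  # scaled ln(Q_{k+1} / Q_{k-1})
    denominator = 2 * ln_c - ln_l - ln_r     # scaled ln(Q_k^2 / (Q_{k-1} * Q_{k+1}))
    if denominator <= 0:
        raise ValueError("central charge is not a maximum")
    return pitch * (k + 0.5 * numerator / denominator)
```

The common scale factor cancels in the ratio, so the table scaling only sets the precision of the integer logarithms and does not appear in the result.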
C. Look-up table
With this technique, the position is given by a simple 24-bit address access to an external memory using the 3 ADC values as the address. This memory has been previously loaded with results for all the possible ADC data resulting from an event. Processing of three 8-bit ADC values requires 16 Mbytes of memory. For the 'C6201 processor, a single access to the on-board dynamic memory takes only 7 to 42 clock cycles (depending on the address used to access the look-up table), significantly faster than any calculation. A uniformly illuminated detector requires on average nearly 42 clock cycles per access, whereas illumination with a single collimated beam requires closer to 7 cycles per access.
Any function can be realized without compromising the speed of the data acquisition. However, it is not clear how to reduce the ADC quantization effects with a look-up table.
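The address formation and the memory requirement can be made explicit with a small sketch. The order in which the three ADC values are concatenated is not stated in the text and is an assumption here, as are the names used.

```python
ADC_BITS = 8
LUT_ENTRIES = 1 << (3 * ADC_BITS)     # 2**24 = 16,777,216 addressable entries

def lut_address(q_left, q_center, q_right):
    """Concatenate three 8-bit ADC values into one 24-bit look-up address.

    Assumed field order: Q_{k-1} in the high byte, Q_k in the middle, Q_{k+1} low.
    """
    for q in (q_left, q_center, q_right):
        if not 0 <= q < (1 << ADC_BITS):
            raise ValueError("ADC values must fit in 8 bits")
    return (q_left << 16) | (q_center << 8) | q_right

address = lut_address(30, 100, 50)
```

With one byte stored per entry, the 2^24 entries correspond to the 16 Mbytes quoted above. Because the address space is already filled by three 8-bit values, there is no room to append random low-order bits to the charges, which is one way to see why the quantization correction is hard to combine with this method.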
V. COUNTING RATE EFFECTS
Generally, the readout electronics adds a common dead time, associated either with the length of the delay line (for delay-line position sensing) or with the anode signal processing time (for systems that use the anode signal to trigger the charge measurement on the cathodes).
The presented system has no global (common) dead time, and the local dead time, important for pileup rejection, is applied only locally for 16 samples (the bipolar signal support time of 1 µs is shown in Fig. 2). This local dead time is also comparable to the evacuation time of the positive ions in this detector.
Since the digital central-channel finding electronics imposes no common dead time and is capable of storing one event into the FIFO each clock cycle (20 MHz rate), the only global limitation on the counting rate is set by the algorithm executed by the digital signal processor. Using the 'C6201 processor, every algorithm on 3 charge samples can be realized in less than 1 µs, which gives a counting rate per segment in excess of 10^6 events/sec.
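The DSP-limited rate follows directly from the cycle counts quoted for each algorithm and the 5 ns clock cycle of the processor. The helper below is a simple illustration; its name is hypothetical, and it assumes that the DSP processes events back to back with no other overhead.

```python
def max_event_rate(cycles_per_event, clock_period_ns=5.0):
    """Upper bound on events per second for one DSP, ignoring any overhead."""
    return 1e9 / (cycles_per_event * clock_period_ns)

# Cycle counts quoted in the text for the two computed algorithms.
for name, cycles in (("center of gravity", 85), ("Gaussian fit", 104)):
    print(name, round(max_event_rate(cycles)))
```

Both algorithms complete in well under 1 µs (425 ns and 520 ns respectively), so each stays above the 10^6 events/sec per segment stated above.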
VI. TEST RESULTS
We present test results of the prototype electronics with a standard 10 × 1 cm² proportional detector (cathode pitch 2.67 mm, anode-cathode gap 3.3 mm [10]) and with the curved proportional detector [1].
Although the signal from the anode is not processed, Fig. 6 shows that the free-running ADC approach has little impact on energy resolution, and that excellent energy spectra can be obtained simply by digital addition of the 3 cathode charges.
The response of the 10 × 1 cm² detector to uniform illumination by 5.4 keV X-rays is presented in Fig. 7. The center-of-gravity algorithm creates differential non-linearity problems between two channels (if the charges on 2 channels are almost equal, the choice of the central channel changes the calculated position discretely). The figure also shows a small residue of the systematic ADC quantization error in the center of the strip (less than ±2%).
The position/resolution measurement using the 10 × 1 cm² detector is given in Fig. 8. A beam of X-rays collimated to 25 µm was moved in steps of 1 mm. A position resolution of σ_x = 43 µm was measured with an anode charge of 0.25 pC (anode bias voltage = 1540 V; for this detector, the induced charge on 3 cathode strips is around 40% of the total anode charge). For a total anode charge of 0.15 pC (anode bias voltage = 1486 V), the resolution measured in the center of a cathode is 44 µm. As both results are close to the intrinsic limit of the gas, we estimate that the electronic noise does not contribute significantly to the overall resolution. Using an independent C_3 factor for each central channel can solve the problem, but this can be impractical.
For the experiments where this type of "double-peaks" is not acceptable, the Gaussian fit method can be used.
The position/resolution measurement using the Gaussian fit algorithm is presented in Fig. 9. It confirms that this method does not create "double peaks" at the connection of two channels, but clearly indicates an irregular distance between different peaks. The result in Fig. 10 is consistent with the observed peak shift in the positions measured in Fig. 9 (±15% or ±400 µm). Since the peak displacement is systematic and does not fluctuate with time, it can be corrected in post-processing, with an image treatment applied to the accumulated histogram.
The uniform irradiation response (UIR) for the curved proportional detector using the center-of-gravity algorithm is presented in Fig. 11. The DNL is clearly affected by the choice of the central channel. Nevertheless, as shown in Fig. 12, position measurement of a collimated beam shows a resolution of 59 µm in the center of the readout channel and 65 µm near the interchannel connection.
This measurement was performed with a total anode charge of 0.2 pC (anode bias voltage = 2400 V; the anode charge was measured with 1 µs delay line shaping). For this measurement, the total charge collected on 3 cathode strips, with 250 ns peaking time bipolar shaping (Fig. 2), was 0.11 pC.
The UIR of the curved detector using the Gaussian fit (Fig. 13) shows important systematic errors (which can be corrected for each detector) and also the main benefit of the method: no sharp discontinuities in the UIR, a favorable condition for good position resolution (Fig. 14).
The degradation of the resolution of the peaks away from the center of the graphs in Figs. 12 and 14 is the result of parallax errors caused by the parallel translation of the
