Executive Summary
Superconducting Tunnel Junction (STJ) detectors are an emerging technology that offers the potential for detecting "soft" x-rays (200-1000 eV) with an energy resolution of order 15 eV, which is a factor of 5-10 better than can be accomplished using the semiconductor detectors that are today's standard. Further, they can accomplish this at counting rates that are more than 10 times faster at the specified resolutions. Thus, in principle, STJ detectors could be an enabling technology for new classes of experiments at the nation's synchrotron radiation facilities by extending to the soft x-ray regime the X-ray Absorption Spectroscopy (XAS) class of experiments that has been so scientifically productive in the 2000-20,000 eV energy range.
A primary issue in developing these detectors is that they are quite tiny, of order 200 microns square, so that an array of approximately 1000 is required to make effective use of the available x-ray fluxes. This implies a need for a large number of channels of processing electronics, which must be low cost yet of the highest possible quality so as not to destroy the STJ detectors' intrinsic energy resolution. The goal of this Phase I work was to take a first step on that path by designing and testing a compact, low cost, computer controlled preamplifier, which is the critical link in the processing chain.
In our work we met a large number of our goals. We designed, modeled and produced a pair of designs that were compatible with our target goal of providing full processing electronics for less than $250/channel. Laboratory tests showed that their noise performance was capable of achieving the 15 eV energy resolution goal. In tests at a synchrotron we showed the viability of our concept for locating and setting the detectors' operating point voltages under computer control by showing that we could scan a detector's current vs voltage (I-V) curve digitally. Working on only a single detector, we also showed that we could achieve the same energy resolution that it displayed using existing, state of the art electronics that cost over $3,000 per channel, have to be tuned by hand, and are too bulky for large array application.
The major weakness in our work was due to difficulty in obtaining sufficient synchrotron beam time to carry out more exhaustive tests with multiple detectors and both of our designs. This was because, even though the existing STJ array has only a small number of elements, the science it produces is already of such high quality that only top ranked proposals are given beam time, which excluded instrumentation projects. The only time we could get was donated by our national laboratory collaborator after he had finished his scheduled experiment. With only 2 hours, we were not able to collect enough data evaluating the preamplifiers' performance over a range of detectors and operating conditions to convince reviewers that we were ready to move on to Phase II.
placed quite close to the sample on a high power beamline. Second, by collecting data in parallel, the whole array achieves enormous total output counting rates (i.e. 20 Mcps for the proposed 1000 element array). Total achieved solid angles will be respectable: a current 36 element detector achieves 1.2 x 1.2 mm² total area, which, at 8 mm from the sample, achieves a solid angle Ω/4π ≈ 10⁻³. This significantly exceeds, by a factor of 10-100, what can be obtained from an x-ray grating, and, in addition, STJs have much higher efficiencies η, which, including windows, approach 0.5, compared to 0.1 for gratings. The 1000 element array would have an area of 40 mm² (about the same as a single 7 mm diameter HPGe detector) and a solid angle of Ω/4π ≈ 0.03. This would give the STJ array a total efficiency ηΩ/4π of about 0.3, which is equivalent to a 30 element HPGe detector (but with much higher values of both energy resolution and output count rate capability) and would exceed the best gratings by factors of 100 to 1000.
Quite recent work with a 36 element STJ detector array [3] has produced some very nice science and clearly shown the promise of the proposed 1000 element detector for working with even more dilute solutions, increasing throughput, or facilitating mapping applications. The obvious technical issues that need to be addressed before this vision becomes reality, however, are those referred to above, namely cost and ease of use.
Finally, it is worth noting that there is a trade-off in STJ design between detector size and attainable energy resolution: the bigger the detector, the poorer the resolution, primarily due to the detectors' increasing capacitance with size. Thus 200 x 200 µm STJs have 4 times larger capacitance than 100 x 100 µm STJs and can typically achieve energy resolutions near 20 eV at the 523 eV O-K line, compared to 10 eV with the smaller detectors. However, since this resolution is still adequate to resolve the lines of interest, the large STJs are often preferred because of their 4 times larger solid acceptance angle and resultant throughput. This relaxes the requirements on the electronics somewhat: electronics that can only achieve energy resolutions of about 20 eV will still be adequate for the larger detectors that will be used in the majority of cases. Of course, being able to instrument the smaller detectors with 10 eV resolution would be a bonus.
Considering cost: the new detector technology will clearly not be widely adopted unless it fits within the constraints of the typical beamline construction budget. The most common size HPGe detector arrays that are currently being delivered lie between 19 and 30 elements (XIA knows this because we supply the electronics for over 95% of these systems). These detectors cost between about $340K and $500K, including their electronics. These numbers thus set a target window that is acceptable to the community. Now, for an STJ detector array, the costs of the cryostat and detector are essentially independent of array size. Automated cryostats are already commercially available, [1] for about $200K, that can handle the proposed array size and are extremely convenient to use. They feature push button operation to achieve the milli-Kelvin temperatures required for STJ operation. The arrays, on the other hand, are produced lithographically, so that their major costs are for mask sets and production processing. Figure 2 shows a design under development at LBNL and LLNL. Clearly, larger arrays require additional connections to the outside world, but this is not an expensive technology. In private discussions, Dr. Friedrich has estimated that $50K would be a reasonable price to expect to pay for a 1000 element STJ array, including commercial installation in the cryostat.
Taking the upper target limit of $500K for the full detector system, this means that complete electronics must be provided, for $250K sales price or less, that achieve the full resolution and count rate performance that the STJs are capable of. This $250 per channel cost will clearly be a major technical challenge, given that the parts cost alone in the STJ preamplifiers presently in use is over $500 each.
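The per-channel figure follows from simple subtraction. The short Python sketch below reproduces that arithmetic using only the prices quoted above; the function and constant names are illustrative and not part of any XIA software.

```python
# Budget figures quoted in the text (US dollars).
SYSTEM_TARGET = 500_000   # upper end of the HPGe array price window
CRYOSTAT = 200_000        # automated cryostat
STJ_ARRAY = 50_000        # estimated 1000 element array, installed
N_CHANNELS = 1000


def electronics_budget(system_target, cryostat, array, n_channels):
    """Return (total, per-channel) budget left for the processing electronics."""
    total = system_target - cryostat - array
    return total, total / n_channels


total, per_channel = electronics_budget(SYSTEM_TARGET, CRYOSTAT, STJ_ARRAY, N_CHANNELS)
print(f"electronics budget: ${total:,} total, ${per_channel:.0f} per channel")
```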
Considering "ease-of-use": the new detector technology will also not be widely adopted unless it is packaged in a manner that allows scientists who are not detector specialists (i.e. the chemists, materials scientists, and biologists who constitute the majority of beamline users) to routinely use STJ detectors without significant amounts of special training. These scientists are already comfortable working with HPGe detectors through software interfaces that allow them to acquire test spectra, set regions of interest (ROIs), and record fluorescent counts within these windows as a function of input x-ray monochromator energy. Thus the new STJ detectors should be instrumented so that they are at least equivalently easy to use. Ideally, the user interface would be identical for both detectors.
STJ detectors, however, are fundamentally more complex to operate than HPGe or Si(Li) detectors, which merely need to be cooled down before voltage can be applied and counting started. Following cooldown (which the cryostat manufacturers have already automated), it is necessary to trace out the individual STJ detectors' I-V curves and pick an operating point (or at least verify that a previously selected operating point is still valid). Once operating, the detectors' gains must be matched (since we cannot expect the user to set ROIs 1000 times manually). After this, operation can continue exactly as with HPGe detectors, excepting that the operating points must be continuously monitored for stability against long term drift effects. In an ideal STJ detector, these startup functions would be automated and "under the hood" so that the user need only wait for a "System Ready" light to appear before proceeding.
This, then, is the second major technical challenge, to design the STJ processing electronics to allow the three functions of operating point selection, gain matching, and operating point stabilization to be fully automated under computer control, so that the most the user has to do is to initiate these processes by pushing on-screen buttons labeled "Pick Op Point" and "Match Gains".
Significance:
The STJ array detector that is being proposed will be a superb instrument, particularly when compared to the 30 element HPGe detectors it will replace in the soft x-ray regime. For essentially the same price it will provide 20-30 Mcps with 20 eV FWHM, compared to either 6 Mcps (200 Kcps/channel) with 250 eV FWHM or 60 Kcps (2 Kcps/channel) at 85 eV FWHM, depending upon whether the HPGe array is optimized for output count rate or for energy resolution.
These raw numbers, however, only tell a fraction of the story, since the true value of a detector system lies in its ability to extract signal from background noise, that is, to accurately measure the strength of a particular fluorescence line in the presence of "background" counts coming from other sources, including other x-ray lines and inelastic scattering. A simple way to estimate this ability is via a formula presented by Heald and Stern for a detector's "effective" counting rate R_E, [4] which compares the rate at which the signal of interest actually gains statistical accuracy to that obtained when counting the same signal rate R_S with no interfering background rate R_B:

$$R_E = \frac{R_S}{1 + 2R_B/R_S} \qquad (1)$$

To estimate the effect of this degradation, let us assume that, for an STJ with 20 eV FWHM (which easily separates fluorescence lines below 1 keV), the background to signal ratio is unity (a ratio that can obviously be made arbitrarily large by studying ever more dilute samples). In this case the effective counting rate is one third of the true signal count rate. Now if, instead, the signal were being measured with an HPGe detector with 85 eV FWHM, R_B/R_S = 4.25 and the effective counting rate is only 10.5% of the true signal counting rate. Given that the STJ array has an output counting rate capability that is 333 times greater, the STJ's effective counting rate is about 1,000 times larger than that of the HPGe array operated in high resolution mode, considered in terms of its ability to extract signal from background. The HPGe detector's high counting rate mode cannot usually be used effectively here because its 250 eV FWHM is so large that overlap with other commonly present strong fluorescence lines (e.g. C and O) would increase the background-to-signal ratio enormously.
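The comparison above can be checked numerically. The following Python sketch evaluates Equation 1 for the two cases discussed; it takes the HPGe background-to-signal ratio of 4.25 to be the ratio of the two resolutions (85 eV / 20 eV), which is an assumption about how that figure was obtained, and the function name is illustrative.

```python
def effective_rate_fraction(background_to_signal):
    """Heald-Stern effective rate R_E / R_S for a given R_B / R_S (Equation 1)."""
    return 1.0 / (1.0 + 2.0 * background_to_signal)


# STJ at 20 eV FWHM: background-to-signal ratio assumed to be unity.
stj_fraction = effective_rate_fraction(1.0)

# HPGe at 85 eV FWHM: background assumed to scale with the resolution window.
hpge_fraction = effective_rate_fraction(85.0 / 20.0)

# Output count rate capability: 20 Mcps (STJ array) vs 60 Kcps (HPGe, high resolution mode).
rate_ratio = 20e6 / 60e3

advantage = rate_ratio * stj_fraction / hpge_fraction
print(f"STJ fraction  = {stj_fraction:.3f}")
print(f"HPGe fraction = {hpge_fraction:.3f}")
print(f"effective counting rate advantage = {advantage:.0f}x")
```

With these inputs the advantage comes out slightly above 1,000, in line with the estimate in the text.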
This increase by 1,000 means that the advent of soft x-ray fluorescence detectors with 10-20 eV energy resolution and high count rate capability would vastly extend the range of problems that could be addressed by the XAS and XANES techniques, since the resultant increase in effective counting rate could either allow the study of samples that are hundreds of times more dilute than those that can be looked at using presently available HPGe detectors or allow the same samples to be studied hundreds of times faster. The latter capability can either be used to reduce radiation damage to delicate samples or to implement mapping measurements where, for example, chemical shifts are measured laterally over a complex sample of interest.
Technical Approach
While a common response to a requirement for electronics for systems with thousands of elements is to start developing an ASIC processing chip, our analysis of the specific needs of STJ detectors suggests that a solution using compact discrete components would be superior.
In the first place, STJ processing requires ∆E/E ratios of 0.01 or better from large capacitance detectors, which is typically beyond the capabilities of large scale ASICs for a variety of technical reasons. In particular, circuits typically require relatively high power (e.g. high current to the front end FET) to achieve the lowest noise levels and this precludes packaging many of them on the same chip due to thermal loading restrictions. Further, as we will show, each STJ detector needs to have its operating point set separately, which requires it to have an individually controlled bias supply that can be stably set at the micro-Volt level. Finally, in order to achieve the best spectral quality at high counting rates, both excellent pileup inspection and baseline correction will be required and currently these functions are best implemented using discrete digital signal processors. It is also worth noting that the term "discrete" is often something of a misnomer these days, considering the highly integrated state of such common digital processing components as analog-to-digital converters (ADCs), digital-to-analog converters (DACs), field-programmable gate arrays (FPGAs), and digital signal processors (DSPs) that we will be employing.
Our technical approach will therefore be to develop STJ processing electronics using high density discrete devices. In particular, we will integrate front end preamplifier and back end digital signal processing functions to eliminate much of the circuitry required in existing preamplifiers and thereby meet required density and cost targets. We will also take advantage of recent developments in multiple unit packaging for Op-Amps, DACs, and ADCs to multiplex processing from several STJs into a small number of digital signal processing chains. Thus, previewing our proposed Phase I design (Section 3, Figure 3), 16 STJ preamplifiers, controlled by two serial DACs with 16 outputs, will feed into four 4-to-1 multiplexers, which, in turn, will feed into four Quad-ADCs digitizing at 64 Mcps (16 Mcps/preamplifier). The four serial outputs of the Quad-ADCs will feed into a single field programmable gate array (FPGA) where the 16 preamplifier signals can either be de-multiplexed and then digitally processed using the same algorithms XIA has perfected in its commercial DXP spectrometers or else processed in multiplexed format, a technology XIA is already starting to develop for other applications. Four of these complete signal chains will be placed on a 3U PXI card, together with spectrum memory and a DSP to match gains, monitor operating points and provide feedback control through the front end DACs. The 80 MB/s PXI backplane will allow 4K spectra to be read from all 1024 pixels in about 0.1 sec (10 Hz frame rate). If faster rates are desired, the spectra can be presummed by the FPGAs to any desired degree after the channels are gain matched. The DSP controlled DACs allow the data processing chains to be automatically set and monitored, as required by ease-of-use considerations.
In our Phase II effort we are proposing to develop two processor card designs using this technology. The first design is a small 3U, PXI based processor card (dimensions 10 cm by 15 cm, or about 4.5 cm²/circuit) to handle 32 individual STJ detectors for a price of about $10K ($312 per channel). This design will be ideal for supporting smaller research arrays of up to 128 elements, which would only require 4 cards in a compact 8-
In Section 3 we will discuss the Phase I work we accomplished to show the feasibility of our approach. In Section 4 we will describe these designs in further detail and lay out the R&D steps we will take to develop them.
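The sampling and readout figures quoted above can be reproduced with a few lines of Python. This is a rough sketch only: it assumes that "4K" means 4096 bins and that each bin is stored as a 2-byte word, neither of which is stated in the text, and the constant names are illustrative.

```python
ADC_RATE = 64e6              # samples per second on each Quad-ADC input
PREAMPS_PER_ADC_INPUT = 4    # one 4-to-1 multiplexer per ADC input
N_PIXELS = 1024
BINS_PER_SPECTRUM = 4096     # assumed meaning of "4K spectra"
BYTES_PER_BIN = 2            # assumed word size (not given in the text)
BACKPLANE_BYTES_PER_S = 80e6 # 80 MB/s PXI backplane

# Each preamplifier sees one quarter of the interleaved samples.
per_preamp_rate = ADC_RATE / PREAMPS_PER_ADC_INPUT

# Time to move one full set of spectra across the backplane.
frame_bytes = N_PIXELS * BINS_PER_SPECTRUM * BYTES_PER_BIN
readout_time_s = frame_bytes / BACKPLANE_BYTES_PER_S
frame_rate_hz = 1.0 / readout_time_s

print(f"per-preamplifier sampling rate: {per_preamp_rate / 1e6:.0f} Msamples/s")
print(f"frame size: {frame_bytes / 1e6:.1f} MB, readout {readout_time_s:.3f} s, "
      f"{frame_rate_hz:.1f} frames/s")
```

Under these assumptions the readout time is close to the 0.1 s quoted; a wider word size would lengthen it proportionally.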
Anticipated Public Benefits
Easily anticipated public benefits fall into three categories: scientific return on the Nation's investment in national synchrotron radiation facilities; improved analytical facilities in the semiconductor industry; and new business creation, including overseas sales.
Scientific return: The United States presently has an estimated cumulative hardware investment of over 2 billion dollars in its synchrotron radiation facilities, including the APS, ALS, NSLS and SSRL, and is continuing to upgrade these facilities based on their extremely high scientific productivity. Investments to date have principally focused on the sources themselves, as well as the beamlines needed to deliver radiation to experimenters. It has long been recognized that detector development has been significantly underfunded, both because a large amount of early work could be done with existing detectors and because many different detectors are required to address the full range of possible experiments, a situation that makes it difficult to assemble a large enough pool of potential users for any particular detector to either lobby successfully for funding or to create a viable commercial market. The single group contradicting this model is protein crystallographers, who are numerous and, as a result, have been well supplied with the special area detectors that they require. The Workshop noted above was convened [1] to analyze this problem and to identify areas in which new detector developments would have a major impact by opening new fields of research.
Probably the next largest user group is the one doing various forms of XAS and XANES, particularly including EXAFS, since these methods provide local structural and chemical information about dilute species. As noted above, however, their researches have been detector limited to x-ray energies primarily above 1-2 KeV. The development of the proposed STJ detector arrays would allow the same work to be carried out both for low Z elements and for elements whose L and M lines fall below 1 KeV, which will greatly increase the scientific productivity of these groups and thus the scientific return on the investment in sources.
Semiconductor industry: As the semiconductor industry has moved to ever smaller feature sizes, it has become increasingly difficult to study process defects because the scale of those defects has shrunk accordingly, reducing signal to noise. Electron microscopy, one of the major tools for defect study, has responded by lowering electron accelerating voltages, so that the electrons will be primarily absorbed in surface defects and not generate huge backgrounds from the Si wafer lying below the defects. The downside of this trend is that defect K lines can no longer be excited, so that doing fluorescence spectroscopy for chemical identification requires L and M line spectroscopy below 1 KeV. Thus these electron probe instruments would also benefit greatly from the development of an energy dispersive detector that could work effectively in this energy regime and replace the single energy diffraction grating or bent crystal spectrometers presently in use. Because modern semiconductor fabrication facilities are expensive even compared to synchrotron radiation facilities, any source of defects that threatens their productivity is a serious economic concern. The proposed detector, then, would help the industry continue on the Moore's Law path that has proven so economically beneficial both to the Nation and to the World.
New Business: The proposed product that would result from successful Phase III development would be electronics packages to instrument STJ detector arrays, whose market can be estimated as follows. The present market for HPGe detectors is relatively stable, at about 10 instruments/year, which go either to new synchrotron facilities (typically abroad) or new projects (e.g. nanotechnology research). A medium sized facility typically has 4-5 of these detectors. If the new instruments achieve the same success, we can estimate that approximately 100 instruments will be bought over the next 5 years, which would lead to $25 million in total sales of processing electronics, of which approximately 75% would be overseas. Equal sales can be projected to the combination of R&D groups at universities and national labs and to semiconductor QC groups, for a total of $50 million in new business growth.
Technical Objectives - Phase II

Overall technical objective
Our overall technical objective, as stated at the end of Section 1, is to develop STJ processing electronics that have the following characteristics: 1) scalable to 1000 channels at about $250/channel sale price; 2) achieve at least 20 eV energy resolution at 20 Kcps/STJ (and preferably 10 eV); 3) completely hide the peculiarities of STJ detector operation from the detector user; and, 4) interface simply to existing data collection software.
In Section 1 we briefly described our design vision. In the following Section we will analyze the design in detail and describe the issues that must be addressed to realize it.
Technical challenges
In this section we will begin with a brief introduction to STJ detectors and then follow with an analysis of the requirements they impose on our electronics design.
STJ Basics
Superconducting tunnel junction detectors are typically constructed as an Al-Al₂O₃-Al tunnel junction structure clad on one side with a thick, large-gap x-ray absorber and on the other with a second large-gap superconductor to keep charges from diffusing out of the structure (see Figure 3). X-rays break Cooper pairs, forming excess electrons that drift into the potential well next to the oxide barrier and then tunnel across the barrier, with a time constant τ_tun, leading to a current in an outside circuit if the barrier is biased. On either side of the barrier the excess charges can decay back into Cooper pairs with a time constant τ_rec. One of the peculiarities of STJs is that the charges change polarity as they cross the barrier and so can tunnel across it multiple times, contributing to the outside current each time they do so. This enhances the outside current by <n> = τ_rec/τ_tun. If the superconducting gap is ∆, then the energy resolution that can be obtained becomes:
where ε is the energy required to break the Cooper pair (ε ≈ 1.7∆), and the Fano Factor F ≈ 2. For a typical ε value of 1 meV, one might therefore expect to obtain ∆E = 3.8 eV for 1 KeV x-rays if preamplifier noise is not limiting. Low noise preamplification is therefore critical. Figure 4 shows an STJ output pulse from some early LLNL work [5] that has also been expanded 5X to show its risetime behavior and enlarged 10X to show some artifacts from its particular preamplifier. The risetime is set here by the preamplifier, as the intrinsic pulse risetime is given by the time for charges to drift to the barrier and is typically 10-20 ns, depending upon such STJ design parameters as the absorber thickness. The decay time is set by the charge recombination time τ_rec noted above. It is important to remember that an STJ is a current source device with a peak current of order 100 nA per KeV of x-ray energy. [5] This current pulse could be processed in two ways: either by integrating it using a classic charge sensitive preamp, or by converting it to a voltage pulse with a transimpedance amplifier and filtering that signal directly. For high rate work the latter approach is preferred, since it takes over 20 µs to integrate a pulse with a 3 µs decay time to the 0.1% accuracy level the detector is intrinsically capable of. Filtering the output voltage pulse can lead to acceptable energy resolutions at much higher throughputs (see Figure 1).
As Figure 4 also shows, designing a high quality STJ preamplifier entails more than just considering noise performance. This particular preamplifier had two defects that degrade energy resolution: a high frequency oscillation (about 600 kHz) and post-pulse undershoot (about 1.5%). The oscillations (which may either be intrinsic to the design or a pickup problem) cannot be eliminated completely by the spectrometer's energy filter, a problem whose magnitude will increase as filter times approach the 1.67 µs period of the oscillation. The undershoot (probably an AC coupling issue) becomes especially problematical at high data rates, as consecutive pulses ride on each other's tails, reducing their amplitudes. Assuring that our preamplifier design does not display these, or similar, defects will be a principal goal of our Phase I design work.
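The 20 µs integration time quoted above can be recovered from the decay time alone. The Python sketch below assumes the current pulse decays as a single exponential, which is a simplification of the real pulse shape; the function name is illustrative.

```python
import math


def integration_time(decay_time_s, fractional_error):
    """Time for an exponential tail exp(-t / tau) to fall to the given fraction.

    Integrating for this long collects all but `fractional_error` of the charge.
    """
    return decay_time_s * math.log(1.0 / fractional_error)


t_needed = integration_time(decay_time_s=3e-6, fractional_error=1e-3)
print(f"integration time for 0.1% accuracy: {t_needed * 1e6:.1f} us")
```

For a 3 µs decay time and a 0.1% target this gives roughly seven decay times, or a little over 20 µs, which limits a charge-integrating approach to comparatively low count rates.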
Fiske modes
Another peculiarity of STJ detectors is that their oxide barriers, being an insulator with superconducting walls, also act like very high frequency resonators. These resonances, which are called Fiske modes, interfere with the tunneling process and degrade energy resolution if the device's operating point happens to coincide with a Fiske mode. [6] Figure 5 shows an STJ I-V curve of current versus applied bias. The mean exponential increase (decrease) of I vs V is interrupted by the sharp Fiske mode resonances. As the Figure shows, the amplitudes of particular Fiske modes may be increased or decreased by applying an external magnetic field, but they cannot all be eliminated at once.
The voltage separation between Fiske modes decreases with increasing device size, becoming too small for use in detectors much above 200 x 200 µm. Increasing the bias voltage has two effects: increasing sensitivity, but also decreasing the device's dynamic resistance. The latter effect essentially increases the preamplifier noise gain, degrading the Signal/Noise (S/N) ratio. Thus a preferable operating point might be selected as shown, at a point on the curve where S/N is maximized and nicely placed between two well defined modes (see [6], Fig. 1). A typical inter-mode separation is only of order 10-20 µV, meaning that the bias voltage has to be maintained with 2-3 µV stability! This point raises three of our most significant design requirements: 1) to select such a point automatically (recalling our ease-of-use criterion); 2) to automatically assure that operation does not drift more than 2-3 µV from this point over time; and 3) to assure that operation does not dynamically depart more than a few µV from the operating point each time a pulse is processed.
Preamp design requirements
The existing design
The primary work developing STJ detectors for Synchrotron Radiation experimentation has been carried out by Dr. Stephan Friedrich of LLNL, using variants of a preamplifier he originally designed as a graduate student at Yale University. [7] The present version of this preamplifier is shown in Figure 6 . The design has four core components: a manual set point, a 1 MΩ transimpedance gain circuit whose front end is a discrete FET coupled to the hybrid, space qualified A250 Op-amp, a slow feedback loop to control the FET's operating point (because the A250's inputs operate at approximately 3 V, not 0 V), and (not shown), an instrumentation amplifier to measure the STJ bias voltage. The design uses 10 Op-amps and its parts cost exceeds $500, of which the major item is the A250, at about $400. It clearly cannot simply be replicated, if only for physical size and cost issues. However, it also does not meet our ease-of-use requirements, since it can only be adjusted manually, and, as noted in our discussion of Figure 4 , may have other issues which degrade resolution.
New preamp design requirements
We can therefore list the requirements that our new design will have to satisfy:
1) Size: less than about 2.5 cm², leaving 1.5 cm² per channel for the rest of the design.
2) Set point-1: must be DAC controlled to sub-µV accuracy.
3) Set point-2: must be capable of being both determined and monitored remotely so that it can be held to an accuracy of about 2 µV.
4) Gain-Bandwidth: must be high enough at signal frequencies to amplify signal currents into voltage pulses without the input terminal moving more than 2-3 µV from the bias point.
5) Noise: after digital filtering, must be comparable to STJ inherent noise at 1 KeV.
6) Parts cost: must be compatible with the $250/channel final sale price.
Of these requirements, Numbers 1, 5 and 6 are fairly obvious. Number 4 assures that the set point will not dynamically move into a Fiske mode during signal processing. Numbers 2 and 3 allow us to control and monitor the set point via computer control, which in turn will allow us to develop control algorithms to automate detector setup and hide its details from the user.
Digital processing issues
XIA has years of experience processing pulses of the sort shown in Figure 4 to get the best energy resolution. The energy resolution desired here is equivalent to processing a fast pulse from a gamma-ray detector, where our processors commonly achieve energy resolutions in the 0.1% range (e.g. 1.8 KeV at 1.33 MeV, where 1.7 KeV is irreducible Fano noise and the remaining 0.1 KeV comes from about 600 eV of electronic noise added in quadrature). We have developed robust methods for detecting and rejecting pileup, carrying out deadtime corrections, and, perhaps most important, applying precise baseline corrections, even in cases where the input count rate (ICR) has exceeded ICR_max, the point of maximum output count rate OCR_max, where ICR_max = 1/deadtime and OCR_max = ICR_max/e. In Figure 1, for example, the shape of the OCR versus ICR curves, particularly as they suddenly lose resolution at high count rates, is typical of a spectrometer that can no longer capture baseline samples at high data rates. XIA has developed a patented baseline capture method that is able to solve this problem even when ICR is four times the point of maximum throughput. [8]
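The two numerical relations in this paragraph can be illustrated briefly. The Python sketch below uses the standard paralyzable dead-time model, OCR = ICR·exp(-ICR·deadtime), which is assumed here because it reproduces the stated relations ICR_max = 1/deadtime and OCR_max = ICR_max/e; the 10 µs deadtime is a hypothetical value chosen only for illustration. It also shows the quadrature sum behind the 1.8 KeV example.

```python
import math


def output_count_rate(icr, deadtime):
    """Paralyzable dead-time model (assumed): OCR = ICR * exp(-ICR * deadtime)."""
    return icr * math.exp(-icr * deadtime)


def quadrature_sum(*widths):
    """Combine independent noise contributions (same units) in quadrature."""
    return math.sqrt(sum(w * w for w in widths))


DEADTIME = 10e-6                      # seconds; hypothetical, for illustration only

icr_max = 1.0 / DEADTIME              # input rate giving maximum throughput
ocr_max = output_count_rate(icr_max, DEADTIME)
ocr_at_4x = output_count_rate(4.0 * icr_max, DEADTIME)

print(f"ICR_max = {icr_max:.0f} cps, OCR_max = {ocr_max:.0f} cps")
print(f"OCR at 4 x ICR_max = {ocr_at_4x:.0f} cps")

# 1.7 KeV Fano noise plus 0.6 KeV electronic noise, added in quadrature.
print(f"combined FWHM = {quadrature_sum(1.7, 0.6):.2f} KeV")
```

At four times ICR_max the modeled throughput has fallen to about a fifth of its maximum, which indicates how far past the peak the baseline capture method is said to remain usable.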
The digital challenge
The challenge in developing the proposed digital signal processor, therefore, will not be to develop any novel digital processing algorithms. Rather, the problem will be to develop a way to carry out our standard methods while remaining within the constraints set by the limits of $250 and 1 cm² per channel. Luckily, these pulses are somewhat slower than standard x-ray pulses and we will consider ways to turn this to our advantage in meeting our goals.
Digital system requirements
Thus the digital system requirements are:
1) Meet channel cost and space requirements.
2) Provide feedback to the preamplifier operating set point DAC.
3) Include channel-by-channel gain matching.
4) Allow at least one data set to be collected while a previous one is being read out.
5) Meet the ∆E/E target of 2% at 1 KeV (i.e. 20 eV at 1 KeV) at 20 Kcps OCR.

Figure 7 shows our proposed design, which will place 32 complete signal processing channels on a single 10 cm x 15 cm 3U PXI card or 64 on a single 25 cm by 15 cm 6U card. Because multiplexing can be used to process more than one analog channel for each digital channel, only a single preamplifier is shown. The STJ is represented by its equivalent circuit, which includes a current source I_D, a detector capacitance C_D, and a shunting dynamic resistance R_D. C_D is about 10 nF, and I_D produces a peak current of 100 nA for a 1 KeV absorbed x-ray photon. The STJ will normally be biased at V_B of less than 500 µV, for which a bias current I_B of order 20 nA will flow. As noted earlier, the value of the dynamic resistance changes with bias voltage. At a typical operating point R_D is about 1 KΩ.
Proposed Design Concept (Phase I)
Circuit topology and function
The basic circuit function is as follows. The transimpedance preamplifier holds the STJ bias voltage V B close to the value V s set by the filtered output of the Offset DAC and causes currents produced by x-ray absorption in the STJ to flow through the feedback resistor R f . As a 1 KeV photon generates about 100 nA, this translates into a 100 mV output pulse whose FWHM noise should be less than 2 mV (i.e. 20 eV FWHM) after filtering. This noise level must be achieved by a selection of low noise parts, careful design, and appropriate bandwidth limiting.
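The voltage figures in this paragraph follow from Ohm's law across the feedback resistor. The Python sketch below assumes R_f = 1 MΩ, which is the value implied by 100 nA producing 100 mV but is not stated explicitly for the new design, and it assumes Gaussian noise when converting FWHM to rms. All names are illustrative.

```python
import math

R_F = 1.0e6                     # feedback resistor in ohms (assumed; implied by 100 nA -> 100 mV)
PEAK_CURRENT_PER_KEV = 100e-9   # amperes per KeV of x-ray energy, from the text
FWHM_PER_SIGMA = 2.0 * math.sqrt(2.0 * math.log(2.0))  # about 2.355 for a Gaussian


def pulse_height_volts(energy_ev):
    """Peak output voltage of the transimpedance stage for a photon of the given energy."""
    return PEAK_CURRENT_PER_KEV * (energy_ev / 1000.0) * R_F


volts_per_ev = pulse_height_volts(1000.0) / 1000.0

# A 20 eV FWHM resolution target expressed at the preamplifier output.
noise_fwhm_volts = 20.0 * volts_per_ev
noise_rms_volts = noise_fwhm_volts / FWHM_PER_SIGMA

print(f"1 KeV pulse height: {pulse_height_volts(1000.0) * 1e3:.0f} mV")
print(f"allowed noise: {noise_fwhm_volts * 1e3:.1f} mV FWHM, {noise_rms_volts * 1e3:.2f} mV rms")
```

Expressed as rms, the 2 mV FWHM budget corresponds to somewhat under 1 mV of filtered output noise.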
The preamplifier output is then passed through a variable gain amplifier and into an analog multiplexer where it is multiplexed with three other preamplifier signals and fed into one input of a Quad-ADC (4-ADC), which outputs sequentially interleaved samples of the four preamplifier outputs. The other three inputs to the ADC are treated similarly, so that, in total, it passes digitized signals from 16 channels into a single FPGA for digital processing. These circuits are replicated 4 times to process 64 channels.
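The interleaving described above can be undone with simple index arithmetic. The sketch below assumes a strict round-robin order, so that sample k of one ADC output belongs to multiplexer input k mod 4; the function names are hypothetical and the 64 MHz figure is the ADC rate given in the digital processor design.

```python
from typing import List, Sequence


def deinterleave(samples: Sequence[int], n_channels: int = 4) -> List[List[int]]:
    """Split a round-robin interleaved sample stream into per-channel lists.

    Assumes sample k belongs to channel k % n_channels.
    """
    return [list(samples[ch::n_channels]) for ch in range(n_channels)]


def per_channel_rate_hz(adc_rate_hz: float, n_channels: int = 4) -> float:
    """Effective digitization rate seen by each multiplexed channel."""
    return adc_rate_hz / n_channels


if __name__ == "__main__":
    print(deinterleave([0, 1, 2, 3, 4, 5, 6, 7]))
    print(per_channel_rate_hz(64e6))
```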
A digital signal processor (DSP) is used both to adjust channel gains (allowing them to be matched under computer control) and to control the set-point DAC. Set-point control is implemented as follows. Because our circuit is DC coupled, measurements of the signal level when no output pulses are present (baseline values) are a direct measurement of the STJ bias current I B that flows to it through the feedback resistor R f . Thus, in set-up mode, we can directly measure the I-V curve by measuring I B vs the DAC set voltage V s . Then, in run-mode, we can monitor I B and adjust V s as necessary to maintain it.
The DAC will be a 14-16 bit, serially controlled, 8 or 16 output device that outputs a heavily filtered fraction of a high stability reference voltage so that it does not add any noise to the STJ signals. The bias voltage V B will typically differ from V s by less than 1 µV, plus the offset voltage V off , which will be of order 200 µV. This offset voltage causes no difficulties since we are measuring the same values (I B vs V s ) both when we trace out the I-V curve and when we set the operating point.
Even though V s is derived from a precision voltage source, it can still be expected to drift with time and/or temperature, and will need to be monitored and maintained automatically, as we discussed above for ease-of-use considerations. Since our preamplifier is DC coupled, this can be done by slowly oscillating V s up and down between the limits L and U (Figure 8) and making sure that the inflection point I in the I-V curve lies between V OP and L, as shown. Because we never infringe on the lower Fiske mode, this monitoring measurement can be carried out by the DSP continuously during data collection and invisibly to the detector user. One of our Phase II tasks will be to test this concept by seeing how far one may depart from the minimum point without degrading energy resolution.
Digital processor design
The digital processor design is conceptually simple. It consists of four identical processor blocks, each handling 16 STJ channels. The four blocks share a common PXI interface for data readout and setting processor parameters and a single DSP, which is used to generate I-V curves, track the STJ operating points, and match gains between STJ channels, as described above. The outputs of 4 multiplexers will be connected to a single Quad AD9228 64 MHz, 12 bit ADC, providing an effective 16 MHz digitization rate for each individual STJ signal channel. We know from experience with our commercial µDXP digital processors that this digitization rate will be [...] bits at 8 MHz. The ADC's four serial LVDS outputs will connect to a single large FPGA, where the digitized signals will be de-multiplexed and processed using the same algorithms that XIA normally uses to achieve 0.1% energy resolution with HPGe gamma-ray detectors. Figure 9 sketches this process, though with a lower digitization rate than will actually be employed.
First, pulses are detected when a fast, short time constant filter applied to the input pulse crosses a threshold, generating a "Detected" digital trigger pulse. This trigger causes three running sums (Σ 1 , Σ g , and Σ 2 ) to be captured at the locations relative to the pulse that are indicated in the figure. The FPGA then calculates the weighted sum
$$E = C_1 \Sigma_1 + C_g \Sigma_g + C_2 \Sigma_2$$
where C 1 , C g , and C 2 are the filter weights, and bins the computed energy E to the spectrum memory. At this point, if the STJ channel gains are well matched, energies from multiple channels can be binned into a single memory, or the different channels can be kept separate. The circuit provides a second spectrum memory so that one set of spectra can be stored for readout while the next set is being collected. This "ping-pong" buffering allows the spectrometer to be used in a high speed mapping mode that is limited only by the PXI readout speed of about 80 MB/sec.
Phase I research results
Phase I research goals
The primary questions that we planned to address, as a proof of principle, in Phase I were: 1) Can we produce a preamplifier in 2 cm² of area for $20 or less that is DAC controlled, DC coupled, and achieves approximately 20 eV energy resolution with 200 µm x 200 µm STJs? 2) Can we use this preamplifier to design a digital spectrometer for large STJ arrays that can be profitably sold for $250/channel while maintaining the aforesaid 20 eV energy resolution and allowing digital I-V curve measurement and automatic operating point setting and monitoring? 3) Can we multiplex several preamplifier signals into a single ADC without degrading energy resolution?
In summary, the answers to Questions 1 and 2 were both "Yes". Further, because we found, in answering Question 2, that we could meet the cost target without multiplexing, Question 3 became moot, at least as far as proof of principle was concerned. We are therefore deferring Question 3 to Phase II as an approach to further lowering our cost/channel, since it is no longer a necessary ingredient to overall success.
Our description of our Phase I research is divided into four sections: 4.1) preamplifier design, construction, and bench testing; 4.2) preamplifier testing at Lawrence Berkeley National Laboratory (LBNL) using a 200 µm x 200 µm STJ array made available to us by Dr. Stephan Friedrich;
channel 6U PXI card.
Preamplifier design, construction, and bench testing
The basic preamplifier design is a transimpedance amplifier, which uses a pair of cascaded Op-amps or an Op-amp plus a low noise FET transistor to force the detector current through the feedback resistor R f . We are using two basic tricks to reduce the number of Op-amps in the original design. First, we will attempt to replace the A250, its external FET, and its required bias circuit with a single, state of the art, FET input Op-amp (the AD8067) that can operate with its inputs biased near to zero volts. The second is to DC couple the preamp to the digital processor chain so that we can monitor and control the bias voltage V B through the digital chain, rather than with additional analog electronics.
Noise modeling
We begin by creating a simple noise model to guide our design. This model is shown in Figure  10 , where the STJ detector is represented by C d in parallel with R d (the detector's dynamic resistance), drawing the bias current I b and having a complex impedance Z d . The detector is coupled to the inverting node of an amplifier (either Op-amp or FET plus Op-Amp) whose output is fed back through the feedback impedance Z f consisting of a feedback resistor R f and capacitor C f .
In this model, V n and I n are the input voltage and current noise densities of the gain stage, whose gain A is taken to be large enough to allow the usual approximations. The only STJ noise is assumed to be the shot noise of its bias current I b . The input-referred current noise density I T is then given by:
$$I_T^2 = \frac{4kT}{R_f} + I_n^2 + 2 q I_b + V_n^2 \left( \frac{1}{R_d^2} + \omega^2 C_d^2 \right)$$
where k is Boltzmann's constant, T is the temperature, q is the electron charge, and we can explicitly see the noise effects of R f , R d , C d , V n , I n and I b . Multiplying by the bandwidth BW, we obtain the RMS noise squared. If we compute I T , divide by the current S that the STJ produces per eV of photon energy, and multiply by 2.35 to convert from RMS to FWHM, we obtain the noise resolution in eV as:
$$\Delta E_{FWHM} = \frac{2.35 \, I_T \sqrt{BW}}{S}$$
Typical values of these terms are as follows:
R f : Since sqrt(4kT) = 0.127 nA/√(Hz/Ω), if R f is 1 MΩ and BW is 1 MHz, this term will contribute about 3 eV in noise.
I n : Again for BW = 1 MHz, if I n is 0.1 pA/√Hz, the resultant noise will be 2.35 eV FWHM. This is not surprising, since fluctuations in the amplifier input current are indistinguishable from signal fluctuations. This term sets limits on the type of Op-amps we can use if we directly couple the STJ to an Op-amp input.
I b : This is the shot noise of the STJ bias current. It is not a noise source we can affect by our design; it can only be reduced by changes in STJ fabrication.
V n : The amplifier voltage noise appears as a current noise through the presence of the STJ detector itself, via R d and C d . We note that, as the frequency ω increases, this term will always eventually dominate. If we assume a very high quality amplifier, so that V n is 1 nV/√Hz (noting that the very best we might achieve is about 0.5 nV/√Hz), then, of the two terms:
R d : If R d is 1 kΩ, this term contributes 23.5 eV FWHM for a 1 MHz BW. If, however, the dynamic resistance is 10 kΩ, then the term is only 2.35 eV. This point emphasizes the importance of being able to precisely set the operating point, since the dynamic resistance is the inverse of the I-V curve slope and can vary dramatically over 10's of µV between neighboring Fiske modes.
C d : If C d is 1 nF, then this term's contribution matches that of R d at ν = 1/(2π) MHz when R d is 1 kΩ, or at the even lower frequency of ν = 1/(20π) MHz when R d is 10 kΩ. Thus the detector capacitance has the capability of being the major noise source, no matter how well we do with reducing V n . Again, this parameter is out of our control as electronics designers, as it depends solely upon the STJ design parameters (area and oxide thickness). This point emphasizes the trade-off in going from smaller to larger STJs: going from 100 µm x 100 µm to 200 µm x 200 µm devices increases C d by a factor of four and increases its noise contribution by the same amount.
In the preceding discussion it is important to note that BW is the bandwidth of the signal after digital filtering, not the bandwidth of the control loop. This allows us to keep the control bandwidth large enough to maintain the preamplifier's ability to hold the operating point dynamically, when x-ray generated currents are flowing, and then filter afterwards to optimize noise performance.
Summarizing the results of our noise analysis: we want to keep R f large, I n below 0.1 pA/ √Hz, and V n as small as possible, preferably below 1 nV/√Hz. However, for larger STJ devices, the noise may be ultimately limited by C d , no matter how well we do.
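The term-by-term estimates in this noise analysis can be reproduced numerically. The sketch below is illustrative only. It assumes a temperature of 293 K (chosen so that sqrt(4kT) comes out near the quoted 0.127 nA/√(Hz/Ω)) and a signal of 100 nA per keV. It treats each term as a white current-noise density over the bandwidth BW and uses the standard shot-noise density sqrt(2qI). The function names are hypothetical. The bias current of 20 nA is the value quoted earlier in the design description, not a value taken from this section.

```python
import math

K_BOLTZMANN = 1.380649e-23      # J/K
Q_ELECTRON = 1.602176634e-19    # C
FWHM_PER_RMS = 2.35
AMPS_PER_EV = 100e-9 / 1000.0   # assumed STJ signal: 100 nA per keV


def fwhm_ev(density_a_per_rt_hz: float, bandwidth_hz: float) -> float:
    """Convert a white current-noise density (A/sqrt(Hz)) into eV FWHM."""
    rms_amps = density_a_per_rt_hz * math.sqrt(bandwidth_hz)
    return FWHM_PER_RMS * rms_amps / AMPS_PER_EV


def noise_terms_ev(r_f, r_d, v_n, i_n, i_b, bandwidth_hz, temp_k=293.0):
    """Return the individual white-noise contributions in eV FWHM."""
    return {
        "R_f thermal": fwhm_ev(math.sqrt(4 * K_BOLTZMANN * temp_k / r_f), bandwidth_hz),
        "I_n": fwhm_ev(i_n, bandwidth_hz),
        "I_b shot": fwhm_ev(math.sqrt(2 * Q_ELECTRON * i_b), bandwidth_hz),
        "V_n / R_d": fwhm_ev(v_n / r_d, bandwidth_hz),
    }


def cd_crossover_hz(r_d: float, c_d: float) -> float:
    """Frequency at which V_n*omega*C_d equals V_n/R_d, i.e. omega = 1/(R_d*C_d)."""
    return 1.0 / (2 * math.pi * r_d * c_d)


if __name__ == "__main__":
    terms = noise_terms_ev(r_f=1e6, r_d=1e3, v_n=1e-9, i_n=0.1e-12,
                           i_b=20e-9, bandwidth_hz=1e6)
    for name, value in terms.items():
        print(f"{name}: {value:.2f} eV FWHM")
    print("C_d crossover (Hz):", cd_crossover_hz(1e3, 1e-9))
```

Because the V n / R d term scales inversely with R d , the same function shows directly how much is gained by choosing an operating point with a higher dynamic resistance.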
Amplifier design
Several amplifier designs were developed, based on the best parts we could locate. These fell into two categories: FET input circuits and Op-amp input circuits. Our goal was to see the best we could do in each category, since there are significant cost differences between the two approaches, the FET design being expected to be both the better performer and the higher priced. We also explored the use of two different FETs: 1) the InterFET dual IFN146, which is an InterFET replacement for the older K146, which was the state of the art but is no longer produced; and 2) the Linear Integrated Systems single FET LSK170A or dual LSK389A. The differences between the parts are as follows. The InterFET parts have better noise performance but are discrete parts, do not have guaranteed g m values, require large 20-30 mA currents to achieve best performance, and are expensive. The Linear Integrated Systems parts are surface mount (tiny), have guaranteed g m values, need only 3 mA to achieve best performance, and are cheap, but their noise characteristics are not as good. Designing circuits to optimize performance for the two cases is not complete, since our goal was to show proof of principle performance, not to achieve the best possible performance. With further effort in Phase II we expect to be able to improve upon the resolutions reported below, which are already adequate for our design goals.
Figure 11 shows a typical circuit after adjustment to get the best energy resolution. The preamp input is to the InterFET IFN146, whose output is amplified by a Maxim MAX4104 Op-amp. Feedback is through a 1 MΩ resistor shunted by 0.5 pF. The FET's operating point is supplied and stabilized by the dual OPA2277U Op-amp on the left. Because the OPA2277U cannot supply the full 17 mA current required by the IFN146 to maintain its bias point near 0 V, 14 mA are supplied directly from +12 V through Rload3 and Rload4 with filtering. The remaining 3 mA of control current come from the OPA2277U. The first stage simply buffers the gate voltage into the second stage input, where it is compared to the DAC output voltage, which is supplied by a very heavily filtered Linear Technology LTC2050HV. The output of the OPA2277U is supplied to the IFN146 through the filter pair Rload1 and Rload2. The effective load resistance to the IFN146 is about 500 Ω.
Other circuits were built using an Analog Devices AD4899 in place of the MAX4104, and the LSK170A single and LSK389A dual FETs in place of the IFN146, which also allows us to replace the OPA2277U with a cheaper precision Op-amp with less output current capability.
Costing
In the table below we present parts costs for two of the designs: the best performing design using the IFN146, and a second design, not yet optimized, using the cheaper LSK389A FET.
Total: $11.55
As the table shows, there is significant financial incentive to try to make the lower cost LSK389A FETs work (or the even cheaper single LSK170As at $0.80). We will give this topic further study in Phase II. However, as we will show below, the expensive design provides good performance with an STJ detector and still allows us to have a total parts cost of less than our target of $40 per channel, which, as we will discuss in Section 4.14, is necessary in order to profitably sell the final systems at a per-channel price of about $250.
Amplifier construction
We constructed the amplifiers on small, 2 cm x 4.5 cm daughter cards that then plugged into a motherboard designed to replace one of Dr. Friedrich's Eurocard preamplifier cards, as shown in Figure 6. Several typical preamplifiers are shown in Figure 12 below and the mother card is shown in Figure 13. The mother card has power supply filtering and the DAC circuit, which is controlled using a PIC microcontroller attached to an RS-232 interface.
Note that no attempt has been made yet to place the preamplifier parts into the required 2 cm², since there is no point in trying that before the design has stabilized. In fact, the daughter cards have somewhat generic designs with spaces for alternate parts and we did not work to obtain the smallest parts. For example, most Op-amps come in the tiny SOT-6 package shown in Preamp B. We also note the size benefit of using the surface mount LSK170A over the discrete IFN146 FET. Finally, these are essentially single sided boards, so the backside space is also available. We do not foresee any difficulty in placing any of the designs into the required 2 cm².
On the mother card, the RS-232 interface allows access to the on-board PIC processor to control the DAC, and the LEMO connector is for test inputs. Other than adding additional filtering to the power supplies (see the added capacitors), the mother card worked just as designed, allowing us to make measurements in our home laboratory and also giving direct access to STJ elements in both Dr. Friedrich's array detector at LBNL and in his test setups in his Cryogenic Detector Laboratory at LLNL.
Amplifier tests at XIA
To test the circuits in house, we used a BNC Model BH-1 Tail Pulser to create pulses with a 0.02 µs rise time, a 5 µs fall time, and a 3.5 mV amplitude (corresponding to a 3 keV x-ray), which we connected to the preamplifiers through a "model" STJ consisting of a 10 kΩ resistor shunted by a 1 nF capacitor. We then measured the output pulses and analyzed them using an XIA Pixie-4 spectrometer. The best energy resolution was found using a digital trapezoidal filter with an 8 µs peaking time and a 2.6 µs gap time. We also took FFTs of noise traces, both to document our progress as we adjusted the various circuit parameters to optimize performance and to identify and remove external noise sources. The figures below show typical plots from early in the process on the left hand side and, on the right hand side, from the InterFET IFN146 circuit when it was close to optimized.
As Figure 14 shows, we were able to improve signal quality remarkably by eliminating noise sources both within the circuit and without. Outside noise sources included power supply noise and pickup in a long signal cable between the pulse generator and the circuit. Internal noise came primarily from the FET current bias loop and was removed by splitting the current into control and supply branches (to reduce the Op-Amp's load) and filtering both heavily. As shown in Figure 15 , we were able to reduce high frequency noise from -67.5 to -82.5 dB and, from Figure 16 , low frequency noise from -88.8 to -98.1 dB. These figures are not meant to show best results but rather to illustrate the methods we used in the development process.
Similarly, Figure 17 shows a Pixie-4 spectrum of pulser data taken during our work with the
Preamplifier testing on a 200 µm x 200 µm STJ at LBNL
In addition to providing access to STJs in his Cryogenic Detector Laboratory at LLNL, Dr. Friedrich graciously also allowed us to use a few hours of his experimental beam time on Beamline 4.0.2 at the Advanced Light Source at LBNL. Here he has installed a 32 element STJ array with 200 µm x 200 µm pixels that has been in use for about 2 years. [3] Since our Eurocard motherboard is an exact replacement for Dr. Friedrich's standard preamplifier, we were able to make direct comparisons between one of our designs and the standard preamplifier using the same Fe 2 O 3 sample excited by 800 eV photons.
First Dr. Friedrich checked the bias settings on the detector elements, showing us how to scan I-V curves manually using a digital "persistence" oscilloscope to trace out the curve and then set the operating point at its standard location between two Fiske modes at 285 µV. He then collected spectra from a BN sample using detector elements A2 and B1. Both seemed to have acceptable energy resolution for our work, so we proceeded with detector A2. Then, using an Fe 2 O 3 sample, we collected the data shown in Figure 18A. When Dr. Friedrich later saw this data, with its 27 eV resolution at the O-K line, he realized that, being distracted by our conversation, he must not have set the operating point optimally, and he provided us with Figure 18B, taken earlier in the day but on the preceding cryostat cycle, showing a more typical 15.9 eV energy resolution for that detector. That value is typical for these detector elements and, lying between 10 and 20 eV, is more than good enough to resolve the neighboring peaks, as shown, and carry out productive science.
It is also worth noting that the requirements of operating in a synchrotron environment are not favorable for achieving the best energy resolution. Beyond the sheer electrical noise of the storage ring environment, two additional noise sources appear that are easily removed in the laboratory. First, because the detectors have to be very close to the samples and have very thin windows, they see IR radiation from the sample that increases noise. Second, the same requirement makes it very difficult to apply good magnetic shielding, so that the detectors trap flux on cool down, which lowers their dynamic resistance at the operating point and increases noise accordingly, as shown by the R d term in our noise model. The energy resolution values obtained on these detectors in these measurements, therefore, should not be compared to the best possible values Dr. Friedrich has been able to obtain, but rather to the requirements of the application, where they are more than adequate to resolve the lines of interest.
We then replaced detector A2's preamplifier with our Eurocard carrying a preamplifier daughter card built up using an SK147 FET, which is obsolete, but which has been shown to perform identically to its replacement part, the best quality, but expensive, InterFET IFN146 dual FET. We first attempted to take an I-V curve by scanning the DAC offset voltage in steps under RS-232 control. While the resultant I-V curve had the correct general shape, we were not able to observe any Fiske modes. After some hurried debugging, we found that connecting our computer to the preamplifier card cage using the RS-232 cable created a ground loop that introduced excessive noise and washed out the Fiske modes. Thus, while this procedure in principle demonstrated our ability to measure the I-V curve under computer control (Phase I Objective #2), the ground loop rendered the data useless for the purpose at the moment. We therefore settled, given the limited time available, for estimating the zero offset voltage by sweeping the I-V curve from a large negative voltage to a large positive one and finding zero using the curve's inversion symmetry. From "zero" we then set the bias to Dr. Friedrich's standard value of 285 µV and collected the Fe 2 O 3 spectrum shown in Figure 19. The energy resolution, at 36 eV, is clearly much degraded compared to Figure 18B. This result might be due to noise in our design, excess noise pickup similar to the RS-232 problem, or incorrect bias setting. Since we only had 15 minutes before the beam time would end, possibility 3 was the only one we had time to test. So we arbitrarily increased the bias point by 20 µV and collected the fine spectrum shown in Figure 20 which, at a stroke, showed that the problem was neither excess noise nor a problem with our preamplifier design, but "only" one of bias setting.
The achieved energy resolution nearly matches that of the standard preamplifier result in Figure 18B and improves by 18 eV over our first try in Figure 19. The utility of the better resolution can particularly be seen in the weaker line lying between the O-K and Fe-L lines, which is starting to resolve into two components: an Fe-L I line at about 615 eV and an O-K line arising from photons absorbed in the backside of the STJ and having energy about 10% higher than front side events. As we will show in the next section, the Figure 20 energy resolution is probably limited by the operating point (which was selected nearly blindly) and not by our preamplifier, since Figure 18B shows the detector can do better and our Section 4.4 results show that our preamplifier can achieve at least 12 eV energy resolution.
The comparisons between Figures 18A and 18B on the one hand and between Figures 19 and 20 on the other, therefore demonstrate first, the critical need to set STJ operating points correctly to achieve their best energy resolution, and, second, by extension, the pressing need to develop techniques to accurately find these points automatically and reproducibly if we hope to make these detectors available for routine synchrotron radiation research.
These results make three important points. First, our preamplifier design using the expensive InterFET dual IFN146 FET, even if not completely optimized, is already fully capable of replacing Dr. Friedrich's standard STJ preamplifiers without loss of resolution or performance on a 200 µm x 200 µm STJ soft x-ray detector (Phase I Objective #1). We achieved this success while reducing parts cost by over a factor of 10 (to $20 from over $300) and required area by a factor of 50 (to 2.5 cm² from 150 cm²). Second, it is critical to be able to optimize the bias operating point if one expects to get the best performance out of these detectors. In our case, a change of only 20 µV improved resolution from 37 eV to 18.5 eV. Unless one can walk along the I-V curve taking spectra at 5 to 10 µV intervals, there is no way to be sure that one has picked the best operating point. This will be a major benefit of our digitally controlled design, since we will be able to automate the process of finding the "sweet spot" on the I-V curve that produces the best energy resolution and carry it out simultaneously for all 1,000 elements in the array. Finally, the very fact that we could move the operating point precisely 20 µV to find a good operating point shows that our digital offset system works well enough to meet the needs of operating point monitoring (Phase I Objective #2).
Preamplifier testing on a 100 µm x 100 µm STJ at LLNL
At LLNL we investigated a 100 µm x 100 µm STJ device with Dr. Friedrich's assistance. First Dr. Friedrich generated an I-V curve using his persistence oscilloscope, selected an operating point, and then collected a spectrum looking at an Al 2 O 3 sample excited by secondary radiation emitted from a tungsten target irradiated by electrons at about 6 kVp. Data rates were very low compared to the synchrotron and the O line did not produce very many counts. After about 30 minutes the best low energy line with which to determine resolution was the C-K line at 277 eV, which had a FWHM of about 10 eV, but, as the peak had only 100 or so counts, this value is probably only accurate to 2 eV.
We then attached our XIA STJ preamplifier using an SK147 FET. This is the obsolete equivalent of the IFN146, which is the part used in our costing estimates. Dr. Friedrich has seen no significant difference between the two parts in his preamplifiers. Using a PIC control program developed after our LBNL experience, we set our on-board DAC controller to scan back and forth between two preset voltages, taking about 120 seconds to scan a voltage range of ± 500 µV, or ± 8,000 DAC steps. Next we disconnected our RS-232 cable from the spectrometer and re-installed the electric shielding surrounding the preamplifier card cage. Then we collected data by measuring our preamplifier's output voltage (bias current through the 1 MΩ feedback resistor) and the DAC set point voltage on a digital recording scope. By locating the traces' end points we could then assign the recorded values to the DAC control steps. Figure 21 shows the resultant I-V curve with very highly resolved Fiske modes. We note that the I-V curve is offset from the DAC's zero output voltage by about 80 µV, the preamplifier's offset voltage. This plot demonstrates that we are able to collect high quality I-V curves.
To further demonstrate the power of this automatic scanning technique, we selected a flat region at 4,000 DAC steps between two Fiske modes as a proposed operating point. In order to refine our selection, we then set up a "fine" local scan of ± 25 µV (i.e. ± 650 DAC steps) in this region so that we could "zoom" in for a closer view of the local Fiske mode structure. Figure 22 shows the result, which allowed us to find an operating point at 4128 DAC steps that was nicely centered in the 16 µV wide region between the two nearest Fiske modes. This figure also emphasizes the need, stated above, for the capability of setting the operating point with sub-µV precision and stability.
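The selection step described above, centering the operating point between the two nearest Fiske modes, can be sketched in a few lines. This is an illustration under stated assumptions, not the procedure actually used: the Fiske mode positions in the usage example are hypothetical, the function names are invented, and the 1 MΩ feedback resistor is taken from the preamplifier description.

```python
from typing import Sequence

R_F_OHMS = 1.0e6   # feedback resistor, as stated for the preamplifier


def bias_current_na(baseline_volts: float) -> float:
    """Bias current (nA) implied by a DC-coupled baseline output voltage."""
    return baseline_volts / R_F_OHMS * 1e9


def centered_operating_point(fiske_steps: Sequence[int], candidate_step: int) -> int:
    """Midpoint (in DAC steps) of the gap between the Fiske modes bracketing the candidate."""
    below = [s for s in fiske_steps if s < candidate_step]
    above = [s for s in fiske_steps if s > candidate_step]
    if not below or not above:
        raise ValueError("candidate is not bracketed by two Fiske modes")
    return (max(below) + min(above)) // 2


if __name__ == "__main__":
    # Hypothetical Fiske mode positions, in DAC steps, read off a fine scan.
    modes = [3600, 3990, 4250, 4700]
    print(centered_operating_point(modes, candidate_step=4100))
    print(bias_current_na(0.020))
```

In an automated system the mode positions would themselves be extracted from the digitized I-V scan, for example from the points where its slope changes abruptly. That step is omitted here.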
Having set the operating point, we collected a spectrum from the same Al 2 O 3 target used above and obtained an energy resolution of slightly better than 12 eV at the C-K line, compared to the value of 10 eV obtained using Dr. Friedrich's preamplifier. The statistics of this measurement were equally poor and it is not clear that this difference is meaningful. The value is good enough, however, to definitively show that the value of 18.6 eV that we achieved at the O-K line on the synchrotron beamline was limited by the detector and not by our preamplifier.
Finally, we used Dr. Friedrich's standard HP 3561A spectrum analyzer to compare our preamplifier noise spectrum to that of the standard preamplifier. The two spectra differed in two ways. First, our low frequency noise was about 10 dB higher, but not in any range that overlaps the digital filters' power spectrum significantly. Second, at high frequencies, in the 10-100 kHz band, our preamplifier showed a significant amount of some kind of regularly spaced noise, possibly pickup from the digital clock on the motherboard. While this is clearly not affecting our performance very much, we will investigate and remove it in Phase II.
Digital design and costing data
4.5.1. Processing signals from a single STJ channel
The technology to process single channel STJ data is already well developed at XIA, as indicated by our use of one of our Pixie-4 gamma-ray spectrometers to take the data reported in the preceding experimental sections. The processing method is schematically indicated in Figure 21. The processing is carried out in a field programmable gate array (FPGA) that accepts a digital replica of the preamplifier signal ("Input pulse") provided by the analog-to-digital converter (ADC). This signal is processed with a short shaping time digital filter ("Fast filter") whose excursion above threshold signals the detection ("Detected") of an input pulse. The detected pulse causes three running sums (Σ 1 , Σ g , Σ 2 ) to be captured from the energy filter, which is just a running summation unit with a delay line. In the Pixie-4, we then compute the pulse energy by:
$$E = C_1 (\Sigma_1 - B_1) + C_g (\Sigma_g - B_g) + C_2 (\Sigma_2 - B_2)$$
where we subtract a baseline value (B 1 , B g , B 2 ) from each of these sums (the baselines are just averaged values of the individual sums captured when no pulses are present) and form a weighted sum of the corrected values with weights C 1 , C g , C 2 . The weights are computed based on the pulse decay time and correct for ballistic deficit losses due to the pulse's finite rise time.
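The baseline-corrected weighted sum described above can be written compactly in code. The sketch below shows only the structure of the calculation. The weights in the usage example are placeholders, since the text does not give the formula that derives them from the pulse decay time, and all names are hypothetical.

```python
from statistics import mean
from typing import Sequence, Tuple

Sums = Tuple[float, float, float]   # captured (sum_1, sum_g, sum_2)


def average_baselines(no_pulse_captures: Sequence[Sums]) -> Sums:
    """Average each running sum over captures taken when no pulse is present."""
    col_1, col_g, col_2 = zip(*no_pulse_captures)
    return (mean(col_1), mean(col_g), mean(col_2))


def pulse_energy(sums: Sums, baselines: Sums, weights: Sums) -> float:
    """Weighted sum of the baseline-corrected running sums."""
    return sum(w * (s - b) for w, s, b in zip(weights, sums, baselines))


if __name__ == "__main__":
    baselines = average_baselines([(10, 4, 10), (12, 6, 12)])
    # Placeholder weights; real weights depend on the pulse decay time.
    print(pulse_energy((11, 5, 111), baselines, (-1.0, 0.0, 1.0)))
```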
Extension to a large number of channels
There are two basic approaches to extending this processing approach to large numbers of channels and we will examine both in our Phase II research. The simplest method is to just implement as many copies of the processing circuitry as there are channels. This is the approach we used to develop our successful Pixie-16 gamma spectroscopy card, where 4 FPGAs each process 4 independent data channels. In the four years since we completed that design, FPGAs have gotten significantly larger for lower prices. Therefore, one feasible approach that meets our cost targets in building a 64 channel processing card is to simply have 4 FPGAs, each processing 16 independent STJ channels. While this is not necessarily the cheapest approach, the major fraction of the design work is complete, making it very low risk. We will therefore implement our early designs this way so that we can immediately have working spectrometers while we focus on solving other engineering problems such as solving the noise issues associated with integrating low noise preamplifiers onto digital processing cards.
The second approach, which we first described in our Phase I proposal, would be to implement signal multiplexing. The basis of this approach is that FPGA processors can easily run at 80 MHz, while STJ signals are sufficiently slow that sampling at 10 or 20 MHz is adequate to achieve good energy resolution. Noting that the basic filter quantities of interest are running sums and noting that a standard running sum of sample values {v i ; I = -N,0} (the sum is over past values)can be computed from the previous running sum and the values v 1 and v -N according to the iteration formula:
Thus one only needs a pair of accumulators and a FIFO that is N deep to carry out the operation digitally. Now suppose that one wants to do the same thing, but for j signals (e.g. eight so that we can process 8 channels at 10 MHz using an 80 MHz processor. Then Equation 6 becomes:
If the ADC is supplying a data stream containing multiplexed $v_{i,j}$ values, then we can use essentially the same circuitry as that used to implement Equation 6, except that: 1) the FIFO has to be j*N values deep; and 2) the accumulators have to cycle sequentially over the j running sums. The first requirement is trivial. The second is simple to implement with a FIFO that is j deep. The steps of the operation are shown below and are pipelined so that they operate at the full 80 MHz process rate.
Pipeline steps: Read; Write $\Sigma_j$ to Running Sum FIFO.
The tricky part of the implementation, then, is not the basic filtering operations, but setting up the bookkeeping required by the fast trigger and sum captures, which happen asynchronously with respect to the basic processing and randomly with respect to each other. On the other hand, the FPGAs constitute a significant fraction of the design cost and replacing 4 chips at $89 each with a single one reduces the system cost by $4/channel (i.e. about 10%) directly, while also reducing power supply costs and simplifying the board layout.
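The multiplexed running-sum update described above can be sketched in software as follows. This is only an illustrative model under stated assumptions, not the FPGA firmware: the function name `multiplexed_running_sums` and its parameters are hypothetical, the input is assumed to be an interleaved stream ordered channel 0, channel 1, and so on, and `window` is simply the number of samples summed per channel.

```python
from collections import deque


def multiplexed_running_sums(stream, n_channels, window):
    """Yield (channel, running_sum) for each sample of an interleaved stream.

    stream     -- iterable of samples ordered ch0, ch1, ..., ch(J-1), ch0, ...
    n_channels -- number of multiplexed channels J (hypothetical parameter name)
    window     -- number of samples per channel held in each running sum
    """
    # Data FIFO, n_channels * window deep: the value leaving it always
    # belongs to the same channel as the value entering it.
    data_fifo = deque([0] * (n_channels * window))
    # Running-sum FIFO, n_channels deep: one partial sum per channel,
    # visited once per incoming sample.
    sum_fifo = deque([0] * n_channels)

    for k, v_new in enumerate(stream):
        v_old = data_fifo.popleft()              # sample dropping out of the window
        data_fifo.append(v_new)                  # newest sample enters the window
        s = sum_fifo.popleft() + v_new - v_old   # add newest, subtract oldest
        sum_fifo.append(s)                       # write the sum back for its next turn
        yield k % n_channels, s


if __name__ == "__main__":
    # Two channels interleaved, three samples per running sum.
    samples = [1, 10, 2, 20, 3, 30, 4, 40, 5, 50]
    for channel, total in multiplexed_running_sums(samples, 2, 3):
        print(channel, total)
```

Each incoming sample costs one read and one write on each FIFO plus one add and one subtract, which is why the operation pipelines naturally at the full processing rate.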
Therefore a Phase II, Year 2 goal will be to perform the engineering required to implement multiplexed processing. In our Phase I proposal we proposed a design concept using analog multiplexers so that the multiplexing would occur upstream of the ADCs. In the meantime, however, several manufacturers have announced low cost, low power 14 bit Octal-ADCs (8-ADCs) that operate at up to 50 MHz. Given the availability of these parts, the probable difficulties in designing an analog multiplexer that would not perturb the STJs' excellent energy resolution, and the cost and PC board area savings that arise from eliminating the analog multiplexers, we have decided to handle the multiplexing digitally. Thus we would run the ADCs at 40 MHz and "decimate" in the FPGA by summing blocks of 4 ADC values to create data values $v_{i,j}$ at the desired 10 MHz input rate. This summing both effectively increases the number of ADC bits and reduces the high frequency noise that leaks through the digital filters.
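As a rough illustration of the decimate-then-multiplex scheme, the sketch below sums blocks of 4 samples per channel and then interleaves the channels into one stream. The helper names `decimate_by_sum` and `interleave` are hypothetical, and the code models only the data flow, not timing or FPGA resources.

```python
def decimate_by_sum(samples, block=4):
    """Sum consecutive blocks of `block` ADC samples (a trailing partial block is dropped)."""
    usable = len(samples) - len(samples) % block
    return [sum(samples[i:i + block]) for i in range(0, usable, block)]


def interleave(channels):
    """Merge equal-length per-channel lists into one stream: ch0, ch1, ..., ch0, ..."""
    return [v for group in zip(*channels) for v in group]


ADC_BITS = 14
MAX_CODE = 2**ADC_BITS - 1
# Four full-scale 14-bit codes sum to just under 2**16, so the decimated
# values need a 16-bit word.
assert 4 * MAX_CODE < 2**16

if __name__ == "__main__":
    # Eight channels sampled at 40 MHz, each decimated by 4 to 10 MHz,
    # then interleaved into a single 8 x 10 MHz = 80 MHz stream.
    raw = [[ch + n for n in range(8)] for ch in range(8)]   # made-up sample values
    stream = interleave([decimate_by_sum(ch) for ch in raw])
    print(stream)
```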
Proposed design: 3U, 32 channel processing card
As part of our Phase I effort, we designed two processing cards. The first is a small 3U card (dimension 10 cm by 15 cm, or about 4.5 cm²/circuit) that will be able to handle 32 STJ channels and would be appropriate for supporting smaller research arrays of up to 128 elements, which would only require 4 cards in a compact 8-slot PXI crate (8" High x 11" Wide x 16" Deep) that costs about $3K and weighs 11 kg. The second design is a larger 6U, PXI based processor card (dimension 25 cm by 15 cm, or about 5.9 cm²/circuit) to handle 64 individual STJ detectors and is intended to support large STJ arrays of 1,024 elements at SR facilities, which would require 16 cards in a heavy 18-slot PXI crate (16" High x 19" Wide x 20" Deep) that costs about $10K and weighs 30 kg. While the present designs call for the use of multiplexing on the smaller card and no multiplexing on the larger, this may change as a result of our Phase II work as we learn more about the pros and cons of the two approaches.
Figure 22 shows a schematic of the proposed 3U design. The front end of the card carries 32 preamplifiers, each using 2.0 cm² of area. To the right of the preamplifiers is the digital processing section, with 8 8-ADCs feeding into a single FPGA. The SRAM attached to the DPP is for providing additional processing memory and will be removed if possible. The FPGA shares a common bus with the DSP and the System FPGA, which has SRAM memory to store the collected spectra. The spectra are read out through the System FPGA via the PXI interface, at achievable rates of about 80 MB/sec. The CPLD contains enough logic to boot the system and allow configuration files to be downloaded at startup. The digital power supplies make the various low voltages (1.8 and 2.5 V) required by the DSP and FPGA from the 3.3 V available from the PXI crate.
Two specific issues have already been considered in the design so far, which we now discuss: achieving low noise preamplifier performance on a digital card and transparency of user control.
xMAP architecture: we propose to maximize both spectrometer capability and transparency of usage between STJ detectors and HPGe detectors by implementing the same digital topology that we used in our popular xMAP processor, which is the standard processor at all new EXAFS beamlines worldwide. In this design, the DPP FPGA computes filter sums and buffers them. They are read by the DSP, which computes corrected energy values that are then passed to the System FPGA, which acts like a memory manager and creates spectra or region of interest sums, as selected by the user. The memory is ping-pong buffered so that one set of values can be read out while the next is collected. With this design, each card is capable of handling between 2 and 4 Mcps (60 to 120 kcps/channel), which more than satisfies our total system demands. If 2K 32-bit spectra are stored, then reading out the card will require reading 4B*2K*32 = 256 KB, which, at 80 MB/sec, requires about 3.2 msec. A 4 card system (128 channels) can then be totally read out in 13 msec in full MCA mode. For shorter dwell times, mapping mode can be employed. In 13 msec, at 20 kcps/channel, a total of 32K counts will be collected, which will have reasonable spectral quality.
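The readout arithmetic above can be checked with a few lines of Python. All inputs are the figures quoted in the text; the only assumption is that "80 MB/sec" means 80 x 10^6 bytes per second (a binary megabyte would give a slightly shorter time), so the results land close to, rather than exactly on, the rounded values quoted.

```python
BYTES_PER_BIN = 4             # 32-bit spectrum bins
BINS_PER_SPECTRUM = 2 * 1024  # "2K" spectra
CHANNELS_PER_CARD = 32
CARDS = 4                     # 128 channel system
PXI_BYTES_PER_S = 80e6        # assumed decimal megabytes per second
RATE_PER_CHANNEL_CPS = 20e3   # 20 kcps/channel

card_bytes = BYTES_PER_BIN * BINS_PER_SPECTRUM * CHANNELS_PER_CARD
card_readout_s = card_bytes / PXI_BYTES_PER_S
system_readout_s = CARDS * card_readout_s
# Counts collected over all channels during one full-system readout time.
counts = RATE_PER_CHANNEL_CPS * CARDS * CHANNELS_PER_CARD * system_readout_s

print(f"{card_bytes // 1024} KB per card")
print(f"{card_readout_s * 1e3:.2f} ms per card")
print(f"{system_readout_s * 1e3:.1f} ms for {CARDS} cards")
print(f"{counts:.0f} counts collected in that time")
```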
Besides the raw performance virtues of this design, there are several important ease of use features to be considered. First, since the processor under the hood is essentially an xMAP, any facility already using xMAPs (most of them) will be able to integrate the new detector into existing software control systems with minimal changes. Further, any user already trained on an xMAP system will be able to use the new detector immediately and identically. Finally, from a builder's perspective, having nearly the same firmware design means that all the xMAP routines that we have already created, particularly for gain matching, but also for detector setup, can immediately be reused.
Low noise design: both on-card digital transients and switching power supply noise have the potential to be injected into our preamplifier front ends, destroying their ability to achieve the noise performance the STJ detectors are capable of. We propose to address this issue by the use of multiple isolating ground planes, separate linear supplies for the analog section, and design practices that do not allow any digital current return paths through the analog front end section. In particular, the input lines from the STJs (shown in purple in Figure 22) will be sandwiched between a pair of ground planes, as will the equally sensitive set-point voltages (blue) from the DACs. These latter will share the same set of planes as the analog output signals (orange) but they will have large ground traces between them for isolation. These ground planes will not extend under the digital section at all. Further, as shown, each preamplifier will have its own separate local ground plane that connects to the system ground at only a single point. The various ground planes will be tied together with a large number of distributed capacitors so that they cannot develop differential voltages with respect to one another.
We have selected 8-ADCs with differential serial outputs, which significantly reduces their switching transients. Finally, the digital power supply section, which can generate radiative noise, is set well back from the analog section and will have its inductors as close to the card connector as possible.
Proposed design: 6U, 64 channel processing card
Figure 23 shows our proposed 6U card design. The major differences between this card and the 3U card are as follows. First, we have assumed that each detector signal will be processed separately, without multiplexing, so we now have 4 DPP FPGAs for this purpose, each with 4M gates and 96 multipliers. With 64 channels, we also now have 8 8-ADCs and 8 8-DACs. We immediately notice that there is a certain amount of "fixed overhead" in the digital design. No matter the number of channels, one always needs the PXI interface, CPLD, System FPGA, DSP, memory storage SRAM and digital power supplies. Thus this board has significantly more space per preamplifier (2.9 cm²) in spite of having twice as many. The board will share the same xMAP design topology as the 3U card and the same approach to low noise will be taken. A benefit to the 6U design is that the connection to the backplane carrying STJ input signals is on the same card edge as the PXI interface, making it easier to design a card cage to interface to the STJ cryostat.
Removed variable gain amplifiers: comparing Figures 22 and 23 to our Phase I design proposal in Figure 7, the alert reader will have noticed that we have also removed the variable gain amplifiers. We did so for several reasons. First, on the proposed energy scale, they are somewhat noisy. While we use them on our regular x-ray detectors (e.g. the Saturn), we do not do so on our gamma-ray spectrometers (e.g. Pixie-4/16) where resolution is critical. Second, they are expensive, both in direct cost (about $2/channel) and in their additional DAC control (another $2/channel). Third, they are very power hungry, using more power in our designs than any part other than the ADC. Fourth, they are inherently non-linear on a fine scale, so that our x-ray gain matching algorithms have to perform several iterations, each with a time consuming spectrum collection, in order to match peak centroids to a few eV. Since the STJ peaks are much narrower, the gains will have to be still more closely matched and we were concerned about the number of iterations required to do so.
We are therefore intending to implement digital gain matching in the STJ spectrometers. Since all the detectors are produced lithographically and are fairly uniform, we expect that we will be able to choose operating points such that their inherent gains are equal within a factor of 4 larger or smaller (total range 16X). Modern FPGAs now have 18 bit x 18 bit multipliers (which were not available when we first started using variable gain amplifiers for gain adjustment). Since our ADC has 14 bits, our digital filters output energy values of up to 20 bits, and only 12 bits are required to address a 4K spectrum, we can truncate the energy values to 18 bits and multiply by a gain factor ranging from 14 bits set to 1 to 18 bits set to 1 (a factor of 16), whose largest division will be $1/2^{15}$, which is 1 part in 32K, or 0.03 eV in a 0-1000 eV spectrum. This process will be exact in a single iteration and will allow the gains of all the channels to be matched to this accuracy for setting SCA windows, summing spectra, or any other functions that may require uniform gain, such as inspecting for deviant spectra. We will take care to implement the multiplication in a way that does not introduce binning errors in the spectra.
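A minimal fixed-point sketch of the digital gain matching idea is shown below. It is an illustration under stated assumptions, not the planned firmware: the placement of the binary point (gain 1.0 stored as 1 << 16), the saturation behavior, and all names (`gain_to_fixed`, `apply_gain`, `spectrum_bin`) are hypothetical choices made for the example. It also uses plain truncation, so it does not address the binning error concern raised above.

```python
GAIN_FRAC_BITS = 16          # assumed binary point: gain 1.0 is stored as 1 << 16
GAIN_MIN = (1 << 14) - 1     # 14 bits set to 1, roughly a gain of 0.25
GAIN_MAX = (1 << 18) - 1     # 18 bits set to 1, roughly a gain of 4.0
ENERGY_BITS = 18             # energy values truncated to 18 bits
SPECTRUM_BITS = 12           # 4K-bin spectrum


def gain_to_fixed(gain):
    """Convert a real gain factor to the 18-bit integer multiplier."""
    g = round(gain * (1 << GAIN_FRAC_BITS))
    if not GAIN_MIN <= g <= GAIN_MAX:
        raise ValueError("gain outside the 16X adjustment range")
    return g


def apply_gain(energy, gain_fixed):
    """Multiply an 18-bit energy by the fixed-point gain, saturating at 18 bits."""
    if not 0 <= energy < (1 << ENERGY_BITS):
        raise ValueError("energy must fit in 18 bits")
    product = energy * gain_fixed            # at most 36 bits: one 18 x 18 multiply
    corrected = product >> GAIN_FRAC_BITS    # drop the fractional bits (truncation)
    return min(corrected, (1 << ENERGY_BITS) - 1)


def spectrum_bin(corrected):
    """Map an 18-bit corrected energy onto a 4K-bin spectrum address."""
    return corrected >> (ENERGY_BITS - SPECTRUM_BITS)


if __name__ == "__main__":
    g = gain_to_fixed(0.5)
    e = apply_gain(100000, g)
    print(g, e, spectrum_bin(e))
```

Because the correction is a single integer multiply per event, matching a channel needs only one measured peak position per channel rather than repeated spectrum collections.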
Cost estimates
From the designs presented above we can produce reasonably reliable cost estimates for the two digital board designs on a per channel basis, as shown in Table 2. Since a normal XIA markup ratio between parts cost and sales price is a factor of between 6 and 8 for large boards, this leads to a sales price estimate of $240 to $320 for the 3U card and $232 to $310 for the 6U card, both of which bracket our target sales price of $250/channel. These estimates have been carried out without any further engineering to reduce parts cost. For example, if we find we can use the cheaper FET discussed in Section 4.1.1, that could reduce the preamp cost by about $5/channel. Similarly, if we find that digital multiplexing can be implemented effectively, so that the 6U card needs only a single DPP FPGA, that would reduce its cost/channel by an additional $4. Under these conditions we would have 3U card costs of $34.98/channel (sales price of $210 to $280/channel) and 6U card costs of $29.60/channel (sales price of $180 to $240/channel). These numbers show why it will be important to keep cost issues firmly in mind during the Phase II engineering effort. However, even the conservative estimates presented above show that it is already possible to make our sales price target (admittedly at our least favorable markup) even if we are unable to make any of our further cost cutting concepts work. We also can predict, having seen it in practice over the past decade, that parts costs for identical functionality will continue to drop by more than 15%/year (Moore's Law). The new, low power octal-ADCs that have appeared since we wrote our Phase I proposal a year ago are a perfect example of this trend. Thus we can expect that even cheaper parts will be available in 2 years as we finish our Phase II R&D effort and start to produce a final commercial design.
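The price ranges quoted above follow directly from the markup arithmetic, as the short sketch below shows. The reduced parts costs and the per channel savings are taken from the text; the baseline parts costs are not quoted here (they are in Table 2) and are inferred in the code by adding the savings back, which is an assumption of this example.

```python
MARKUP_LOW, MARKUP_HIGH = 6, 8   # markup range quoted in the text


def price_range(parts_cost):
    """Return (low, high) sales price per channel for a given parts cost."""
    return parts_cost * MARKUP_LOW, parts_cost * MARKUP_HIGH


# Reduced parts cost per channel, as stated in the text.
reduced_cost = {"3U": 34.98, "6U": 29.60}
# Savings per channel: cheaper FET on both cards, single DPP FPGA on the 6U only.
savings = {"3U": 5.00, "6U": 5.00 + 4.00}
# Inferred baseline parts cost (assumption: baseline = reduced + savings).
baseline_cost = {c: round(reduced_cost[c] + savings[c], 2) for c in reduced_cost}

for card in ("3U", "6U"):
    for label, cost in (("baseline", baseline_cost[card]),
                        ("reduced", reduced_cost[card])):
        low, high = price_range(cost)
        print(f"{card} {label}: parts ${cost:.2f}/channel, "
              f"price ${low:.0f} to ${high:.0f}/channel")
```

The printed ranges agree with the figures in the text to within the rounding used there.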
Summary of Phase I Research: Success in meeting proof of principle objectives
Our Phase I research can be summarized succinctly as follows:
1) We were able to design and build a computer controllable preamplifier that at least matched the performance of Dr. Friedrich's standard preamplifier when tested on a 200 µm x 200 µm STJ of the type that will typically be used in large detector arrays for synchrotron radiation research. (Objective 1)
2) By stably setting our preamplifier's bias point with a DAC controlled by a PIC microcontroller that could scan the DAC back and forth between two set points, we were able both to collect an STJ I-V curve automatically and to zoom into a 50 µV wide local region to expand the local Fiske mode details, thus demonstrating the feasibility of our proposed method of operating point monitoring. (Objective 2)
3) Using this preamplifier design, which has the most expensive FET, and without multiplexing the digital processing, we were still easily able to design a 6U processor card that we can profitably produce and sell for our target price of $250/channel. (Objectives 3 & 4)
4) Our proposed digital processing design employs the same topology as our popular xMAP processor for HPGe hard x-ray detector arrays, meaning that it will be able to use any control software developed for the xMAP. This, in turn, will allow STJ detector arrays to be used interchangeably with the HPGe detector arrays that the community is already familiar with, in addition to relieving XIA of the burden of developing a completely new set of processing firmware and interface libraries. (Bonus development)
We have thus successfully met all of our Phase I proof of principle objectives.
The major collaboration developed in the course of this work was with Dr. Stephan Friedrich, of Lawrence Livermore National Laboratory. Dr. Friedrich has been working for several years to bring maturity to STJ detector technology so that it can reliably serve the synchrotron radiation community. He understands that, for the technology to be successful, it must be commercialized and not depend upon the resources of a single national laboratory scientist. He has therefore also been working with both cryostat vendors and companies capable of producing the STJ detectors themselves in order to create a network of vendors capable of supplying complete array detector systems. XIA's role in such a consortium would be to provide the processing electronics.
Technologies/techniques:
Low cost, tiny area preamplifiers that can be fully digitally controlled while achieving state of the art energy resolution. 
