Development of a real-time (RT) data acquisition (DAQ) system for ASDEX Upgrade was started years ago. Based on a modular front-end configuration with a serial input/output computer interface, a small number of powerful new diagnostics were commissioned. In the course of these developments we concluded that "direct to memory" DAQ concepts hold promise for future DAQ tasks, including those with a subsequent RT data processing step. A survey of the requirements of existing and future diagnostics at ASDEX Upgrade was conducted to assess the wider applicability of these DAQ techniques to the upcoming needs for new diagnostics or the refurbishment of old ones. The paper presents the results of the requirements survey and indicates in which cases the developed diagnostic concept can be applied. A series of new diagnostics implemented using these techniques is presented in detail. This covers the developed front-end modules as well as whole diagnostic configurations in hardware and software. The achieved DAQ data rates, data transport latencies, and fine-tuning steps to maintain reliable RT operation are described explicitly. An outlook is given on the definition of an RT diagnostic standard for ASDEX Upgrade based on the developments described here.
Introduction
The concept of performing DAQ directly into memory, as used at ASDEX Upgrade (AUG), was described in detail at the last meeting in Inuyama [1]. We briefly recall its main features here. The RT DAQ concept is centred on a serial input/output (SIO) computer interface, connected on the measurement front-end side to a flexible pipelined backplane supporting customizable signal-to-digital conversion modules, and on the computer side to various form factors 1 of the PCI or PCI Express bus (cf. fig. 1). To optimally support real-time data acquisition, the computer interface features FPGA logic which bundles all digital samples produced by the front-end modules at a given clock pulse, together with a timestamp, into a data frame. These frames are aggregated temporarily in a FIFO buffer and forwarded via DMA (direct memory access) from the SIO interface to the DAQ application running in the computer as a virtually continuous stream of time-stamped data blocks. Because this stream of frames is directed into a large shared memory section in the computer, all channels of a diagnostic, ordered by time step, become accessible for further processing simultaneously and with short latency. The whole sampling, transporting, and frame building process is tightly coupled to one central clock which itself is synchronized with the central experiment clock. The behaviour of the front-end modules, the pipeline, and the FPGA logic in the SIO interface is completely deterministic. On the computer side, a soft real-time OS 2 conforming to the POSIX 1003.1b real-time extension standard is required to provide an appropriate environment for the DAQ supervision process as well as for the RT data analysis and RT communication with Control.
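The frame structure described above can be illustrated with a minimal Python sketch. The exact frame layout is not specified in this paper: the 64-bit timestamp, the 16-bit little-endian samples, the channel count, and the names `pack_frame` and `unpack_block` are assumptions made for illustration only.

```python
import struct

N_CHANNELS = 64                        # assumed channel count per frame
FRAME_FMT = "<Q%dH" % N_CHANNELS       # assumed: 64-bit timestamp + 16-bit samples
FRAME_SIZE = struct.calcsize(FRAME_FMT)


def pack_frame(timestamp, samples):
    """Bundle one timestamp and one sample per channel into a frame."""
    if len(samples) != N_CHANNELS:
        raise ValueError("expected %d samples" % N_CHANNELS)
    return struct.pack(FRAME_FMT, timestamp, *samples)


def unpack_block(block):
    """Split a contiguous block of frames into (timestamp, samples) tuples."""
    frames = []
    for offset in range(0, len(block) - FRAME_SIZE + 1, FRAME_SIZE):
        fields = struct.unpack_from(FRAME_FMT, block, offset)
        frames.append((fields[0], list(fields[1:])))
    return frames


# Emulate three sampling cycles streamed into one memory block.
block = b"".join(pack_frame(1000 + 2 * i, [i] * N_CHANNELS) for i in range(3))
frames = unpack_block(block)
print(FRAME_SIZE, [t for t, _ in frames])
```

Because every frame has the same size, an analysis task can address any time step and any channel in memory by simple offset arithmetic, which is what makes the time-step-wise ordering convenient for real-time processing.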
While this development was initially motivated by the need to replace existing serial input modules, it soon turned out that, together with the front-end pipeline, it could serve as a new general DAQ concept for ASDEX Upgrade, covering real-time needs as well as the refurbishment requirements of old CAMAC diagnostics. Even the requirements of diagnostics with a large number of medium-speed channels or with cutting-edge data transmission demands seem to be reasonably addressed by the SIO concept.
Requirements for RT diagnostics at AUG
To assess whether a new standard for data acquisition is actually required, and whether the SIO concept would be appropriate for most diagnostic set-ups or rebuilds foreseeable in the next years, we conducted a survey among our diagnosticians to gather information about planned or prospective diagnostics and their technical requirements.
A primary criterion for data acquisition is the data rate achievable by a given arrangement. This depends on the bandwidth of the computer architecture involved. Since this key quantity is constantly increasing, a DAQ interface should be able to evolve with the computer architecture as flexibly as possible.
How SIO and Pipeline address the Requirements
The SIO and pipeline design is based on two main assumptions deriving from the requirements analysis and the real-time design goal:
• diagnostics come with a reasonable number of similar channels;
• data is transferred in real-time from the front-ends into computer memory.
These presumptions unify the manifold of requirements and lead to a simplified and easy to handle yet powerful system design. The pipeline backplane designed under these constraints not only serves as a universal carrier, power supply, and bus system but simultaneously as a fast data transport and a simple multiplexing concept for front-end modules. Making strong use of assumption 1, that nearly all diagnostics have a bunch of similar channels, the pipeline design restricts itself to operate all modules in the same way in lockstep mode. This causes a major simplification of the interface between the backplane and the inserted front-end modules which in the end makes it very easy to build custom modules adapting all kinds of input circuitry to the backplane.
3) This comprises unipolar or bipolar, single ended or symmetric, voltage, current, or resistivity adapting inputs with a varying sensitivity in the typical low voltage / low current range. Additional features like low pass filters, automatic offset correction, or further deterministic signal conditioning may be required.
4) In the CAMAC era the addressing scheme was CNAF[DR] meaning crate, station, address, function, data and repetition. The SIO and pipeline system addressing scheme to address channel controls is comparable: the PCI device number and the HOTLink number on the SIO card together address a particular pipeline. A module in a pipeline slot can be identified by its position, but it is also addressable using a Serial Peripheral Interface bus (SPI-bus) protocol implementing a half-duplex communication directly between application and front-end module. (See below: 'Implementation of Channel Controls')
5) This in fact is already done for the Doppler reflectometry (PRA), the soft X-Ray, the Mirnov probes, the electron cyclotron emission (ECE) diagnostics, and the fast radiation diodes (XVR).
The pipeline control card ("yellow card") controls the pipeline on the backplane and carries the de-/serializer and communication mezzanine board for the connection to the SIO card. All functional modules are connected in a row. Instead of memories they have 16-bit parallel shift registers to hold and transport the samples from the inputs along the chain. This chain of shift registers is driven by a clock directly derived from the HOTLink II clock on the SIO card which in turn is synchronized with the central experiment clock via the embedded TDC (time to digital converter, described elsewhere: [2], [3]).
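The lockstep transport along the chain of shift registers can be pictured with the following toy model in Python. It is only a sketch: the readout order, the function name `read_out_pipeline`, and the one-register-per-module simplification are assumptions, and the real backplane timing is not modelled.

```python
def read_out_pipeline(samples):
    """Shift one sampling cycle out of a chain of module registers.

    samples[0] belongs to the module next to the control card (assumed
    ordering). All modules latch their sample at the same time, then all
    registers shift one slot per clock towards the control card.
    """
    registers = [s & 0xFFFF for s in samples]   # 16-bit registers, latched in lockstep
    stream = []
    for _ in range(len(registers)):             # one clock per module in the chain
        stream.append(registers[0])             # control card reads the nearest register
        registers = registers[1:] + [0]         # all other values move one slot closer
    return stream


print(read_out_pipeline([11, 22, 33, 44]))
```

In this model the number of clocks needed per sampling cycle equals the number of modules in the chain, which is why the arrival order and timing of the samples are fully deterministic.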
The SIO card at the upper end of the chain receives the deterministically incoming stream of samples from up to four pipelines through its four-fold HOTLink II controller, merges all four streams, together with a time stamp for each sampling cycle, into data frames, and forwards these frames immediately to the host's memory. A small FIFO memory on the computer interface decouples this transfer from the non-deterministic operation of the computer system 6 . As described, the SIO and pipeline concept mediates in an unrivalled way between the computer bus and the multi-channel front-end. It is generic, transparent, and easy to configure, yet it is highly focussed on the task of synchronizing all channels with the central clock and delivering frame-structured data from the front-ends with maximum throughput into the host's memory. Taking into account the large palette of front-end modules already available and its simple extensibility, SIO provides the flexibility to meet the above-mentioned requirements in all but a few special cases, and it possesses the structural clarity to serve as a unique new design concept. We believe this architecture will prove well-suited as a new standard solution for diagnostics at ASDEX Upgrade.
Implementation of Channel Controls
The required channel controls (cf. above) have been implemented for the time being as a non-real-time operation mode of the bidirectional HOTLink II connection. This communication mode implements an SPI-bus 7 connection between the SIO card, the pipeline control card, and all inserted modules via dedicated lines on the backplane. A small CPLD logic on each module replies to these SPI-bus requests and allows registers on the module to be read and written in order to set or read out configuration settings. While this communication is slow compared to the DAQ mode, it seems sufficient for settings which remain static throughout a whole shot. For real-time manipulation of channel settings as well as for real-time output of data, ideas exist for how a fully bidirectional real-time operation can be realized. At present this capability is not urgently required, but it may be implemented as a future enhancement.
Actual SIO Diagnostics in Detail
Describing how several new diagnostics at ASDEX Upgrade have been implemented with the SIO and pipeline concept serves to illustrate the application and the impact of the new standard.
Fast Photo-Diode Plasma Radiation (XVR) [4]
The sensors of the XVR diagnostic consist of a number of 16-pixel line arrays (16 diodes in a row) which are sensitive to light in the range from infrared to soft x-ray. The diodes react very fast and, equipped with simple collimator optics, they cover a diagnostic gap between bolometry and soft x-ray. A sampling rate of up to 500 kHz is of interest for detecting, for example, radiative events during ELMs. Since these simple 16-channel cameras are cheap and small, they can be placed at many positions within the vessel. This diagnostic therefore requires a high number of channels sampled at 500 kHz. This is not necessarily a real-time requirement; however, a large number of channels requires a huge amount of memory, so it seems appropriate to select a solution which makes use of the inexpensive and large memory capacity of today's 64-bit computers. To adapt optimally to the special diode type and to the desired observation purpose, a custom ADC module was designed featuring an exchangeable 4th order Bessel input filter for anti-aliasing, an operational amplifier with programmable gain, and an automatic offset correction. The chosen analogue-to-digital converter is a 14-bit pipelined ADC 8 which needs four clock pulses before it outputs the digital value, thus incurring a time delay between time-stamps and samples. However, the time-stamp logic can be configured to compensate for this time shift, and the increase in latency caused by the delay is not significant for this diagnostic. Sixteen of these modules are required per diode line, just enough to fill one pipeline. Four pipelines make up the four links of one SIO card. This gives a total of one time-stamp plus 64 channels per frame.
6) A FIFO size of 32 kB is foreseen to buffer short dataflow interruptions which may occur between two DMA operations or because of PCI-bus arbitration. None of these events is expected to interrupt the DMA transport for longer than 100 or 200 µs in the worst case.
At a frequency of 500 kHz, this implies a data flow of 68 MB/s per SIO card. To connect even more channels to one computer, in this case two SIO cards are situated in an external CompactPCI crate which itself is connected via a National Instruments MXI-Express PCIe8362 bridge [5] to the PCIe bus of the computer. This way, two SIO cards share the bandwidth of one PCIe lane and deliver, in two parallel DMA streams, a total of 136 MB/s into memory. After the real-time phase of the shot, which lasts about 8 seconds, the acquired raw data, which is still resident in memory, is accessed by the non-real-time after-shot program phase and converted in memory into an ASDEX Upgrade shot file of about 1 GB size. This shot file is finally stored on local disk for temporary local access and simultaneously sent to the AFS shot file archive for long-term public access. The XVR diagnostic was recently extended again (cf. the orange diamond marked XVR near the green line in fig. 2). This upgrade doubles the number of channels in a trivial way by doubling the number of computers and bus systems.
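The quoted XVR data rates can be reproduced with a short calculation. The sketch below assumes that each 14-bit sample occupies a 16-bit word and that the time-stamp occupies 8 bytes; the latter is inferred from the stated 68 MB/s and is not given explicitly in the text. MB is taken as 10^6 bytes, and the function names are illustrative.

```python
def frame_bytes(n_channels, sample_bytes=2, timestamp_bytes=8):
    """Size of one frame: one time-stamp plus one sample per channel."""
    return timestamp_bytes + n_channels * sample_bytes


def data_rate(n_channels, sampling_hz):
    """Raw data rate in bytes per second for one SIO card."""
    return frame_bytes(n_channels) * sampling_hz


per_card = data_rate(64, 500_000)      # 64 channels at 500 kHz
two_cards = 2 * per_card               # two SIO cards behind one bridge
shot_bytes = two_cards * 8             # real-time phase of about 8 s

print(per_card / 1e6, two_cards / 1e6, shot_bytes / 1e9)
```

Under these assumptions one card produces 68 MB/s, two cards 136 MB/s, and an 8 s shot accumulates roughly 1.1 GB of raw data, in line with the shot file size of about 1 GB mentioned above.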
Upgrading the ECE diagnostic to SIO
The ECE diagnostic at ASDEX Upgrade [6] [7] features 60 channels with a desired final sampling rate of 2 MHz. The special ECE receiver electronics are part of the diagnostic physics hardware and are not subject to DAQ improvements. Until two years ago, the ECE was a 31.25 kHz CAMAC diagnostic designed for ex post data analysis. Plans to observe MHD plasma phenomena with the ECE and to run real-time algorithms, e.g. for MHD island recognition and for identifying the ECRH deposition flux surface, made it necessary to upgrade the ECE to real-time operation and higher sampling rates. A new DAQ system based on two computers, like the Mirnov and Soft X-Ray computers, was planned, and new four-channel ADCs were designed for this purpose. However, during the upgrade it became clear that a more powerful computer interface than HOTLink I (and more powerful computers) would be required to achieve a final sampling rate of 2 MHz. A compromise design for one half of the ECE diagnostic (32 channels) is now a pipeline crate with four three-slot pipelines. Each pipeline hosts a pipeline control card and two 4-channel ADC cards. The four pipelines are connected to one SIO card interfacing to a SunFire computer system with a conventional PCI bus. The limited PCI bandwidth currently limits the overall data rate to below 80 MB/s and causes us to restrict the ADC sampling rate to a maximum of 1 MHz. Our purpose in continuing along this compromise path is to demonstrate, as soon as possible, a proof of principle for the real-time operation of data acquisition and subsequent data analysis for the ambitious flux surface identification mentioned above. If successful, we plan to replace the existing ECE SunFire platforms by a fast x86-64 multi-core platform with PCI Express interfaces.
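The restriction to 1 MHz follows from the bandwidth budget, as the following sketch shows. It assumes, as an illustration, 16-bit samples and an 8-byte time-stamp per frame; the function name `max_sampling_rate` is hypothetical.

```python
def max_sampling_rate(n_channels, bandwidth_bytes_per_s,
                      sample_bytes=2, timestamp_bytes=8):
    """Highest frame rate (Hz) that fits into a given bus bandwidth."""
    frame = timestamp_bytes + n_channels * sample_bytes
    return bandwidth_bytes_per_s / frame


limit_hz = max_sampling_rate(32, 80e6)        # 32 channels, about 80 MB/s PCI limit
needed_at_2mhz = (8 + 32 * 2) * 2_000_000     # bytes per second at the target rate

print(round(limit_hz), needed_at_2mhz / 1e6)
```

With these assumptions a 32-channel half of the ECE fits into the PCI budget at 1 MHz (72 MB/s) but not at 2 MHz (144 MB/s), which motivates the planned move to a PCI Express platform.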
A General Purpose ADC for Standard Diagnostics
Having configured diagnostics for many years, and as confirmed by our requirements analysis, we found that in many instances ADCs with a common set of features were required. The need to provide an off-the-shelf supply of customizable serial production ADCs 9 led to a subdivided modular design of pipeline front-end modules for the SIO system. This design is based on a standard carrier board conforming to the SIO-pipeline form factor. Basic features of the carrier are a front panel with four Lemo plugs and the typical pipeline interface back-end. Two pluggable mezzanine boards are provided to hold custom electronic circuitry. With two ADC mezzanines, for example, the carrier becomes a two-channel ADC front-end for the SIO system. Mezzanine boards with ADC circuits (featuring a monolithic 14-bit ADC from Analog Devices [8] and a pluggable signal filter) as well as boards for more special applications (e.g. the new fringe counters for DCN) already exist. This customizable standard ADC (or carrier) module has a broad application range, from the refurbishment of legacy CAMAC diagnostics to setting up new diagnostics with high sampling rates of up to 2 MHz. The following diagnostics have recently been equipped with the first 25 prototypes:
• The DCN diagnostic signals will be moved from CAMAC to SIO modules based on this development to achieve real-time performance.
• Motional Stark effect: 20 channels, 200 kHz sampling rate, up to 32 channels are envisaged;
• Thermal element measurements in Divertor and Heat-Shield tiles: 12 channels, 10 kHz sampling rate;
• Fast Langmuir probes: a first set of 8 channels is planned to be upgraded to a sampling rate of 2 MHz; many more (up to 256 channels) could follow if the upgrade is judged desirable.
Other diagnostics have already ordered this standard ADC, and a second series will follow the prototype.
9) The use of commercial off-the-shelf (COTS) components was considered at many steps of the design process for the AUG diagnostic components. No lasting advantage in cost, maintenance, design, or programming over a purpose-built design could be found once fusion research sensor adaptation requirements and the desire for a uniform diagnostic design come into play.
Two Groups of Different Devices in the same Pipeline for DCN Measurements
As mentioned above, the DCN uses three standard ADC modules to acquire the signal from the old fringe counters in real-time. Alongside these ADCs, a new set of fringe counters employing new digital phase shift detection methods was designed to be implemented directly as mezzanine boards for the standard pipeline modules. These two different types of modules will be placed side by side in the same backplane and will run at the same sampling rate of 50 kHz. Incidentally, this arrangement is an illustrative example of a diagnostic running different conversion devices in unison. Ultimately, it should be possible to draw complementary information from both types of measurements to immediately identify fringe jumps in the case of fast density events.
The RT Behaviour of SIO DAQ
As already mentioned, the SIO concept does not feature large memory in the front-end modules but is completely dedicated to low-latency data transport by transferring acquired data directly into the computer's memory. This requires a real-time operating system that ensures no major interruption during the data acquisition phase. The only noteworthy buffer in the SIO card is a 32 kByte FIFO, implemented as part of the FPGA logic, which buffers the DMA data stream during short unavoidable congestion events. Some diagnostics, for example the XVR, already produce data at a rate of 68 MB/s per SIO card. Given this throughput in relation to the FIFO size, the maximum allowable interruption of the DMA stream evaluates to less than 470 µs. Considering this as the critical real-time condition for the DAQ task, we needed to know how close we are to a violation, which would mean a loss of data. Obviously, this constraint requires a preemptive task scheduler and a priority for the data transfer task above all normal system activity. Under Solaris this can be provided by putting the task into the RT scheduling class and giving it an appropriate priority. However, care must be taken to avoid system events caused by the task's own activity, such as page faults. An mlockall() system call covering all current and future memory allocations of a particular task prevents the task's address space from being paged out. In addition, each memory page must be touched, e.g. by initializing all arrays to zero. As a last measure against unexpected interruptions, it may be necessary to change the default page size for the task to a larger value than normal, to avoid reloading the address translation table of the memory management unit (MMU), which normally costs a few hundred µs.
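The tolerable interruption of the DMA stream is simply the time the incoming data needs to fill the FIFO. A minimal sketch of this estimate follows; the function name `max_stall_us` is illustrative, and whether "32 kByte" means 32 000 or 32 768 bytes is not stated, so both readings are evaluated.

```python
def max_stall_us(fifo_bytes, rate_bytes_per_s):
    """Time in microseconds until a FIFO of the given size overflows."""
    return fifo_bytes / rate_bytes_per_s * 1e6


stall_decimal = max_stall_us(32_000, 68e6)       # kByte read as 1000 bytes
stall_binary = max_stall_us(32 * 1024, 68e6)     # kByte read as 1024 bytes
print(round(stall_decimal), round(stall_binary))
```

Both readings give a margin of the order of 470 µs at 68 MB/s, which is comfortably above the worst-case DMA interruptions of 100 to 200 µs expected for PCI-bus arbitration.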
To fully understand the DMA timing constraints, we also measured the system overhead for reissuing DMA device reads. Delays in the range of 40-50 µs on a 1.6 GHz UltraSPARC processor mean a reasonably small system overhead of a few percent if the reads are reissued every millisecond or two. Under these circumstances we decided to make the DMA length flexible so that it can be adapted to the time constraints of the downstream RT analysis task. For a calculation time cycle of 2 ms and a raw data flow rate of 72 MB/s, for example, one would choose a DMA length of 144 kBytes, ending up with a DMA system overhead of 2.5%. This fine-tuning can ultimately achieve a loss-free data flow up to the technical limits (cf. fig. 2). To recognise a violation reliably, and to guarantee the integrity of whole data frames in case of FIFO overflows, the SIO FPGA implements a special frame drop logic which includes an internal counter and a time latching register for the first occurrence of a frame drop. The integrity of the time base is routinely checked at the end of a whole DAQ shot cycle. None of the diagnostics released into regular operation with Solaris and SIO drops frames.
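The choice of DMA length in the example above can be written out as a small calculation. This is a sketch only: the function name `dma_plan` and the optional rounding to whole frames are assumptions, and the 72-byte frame corresponds to an assumed layout of 32 16-bit channels plus an 8-byte time-stamp.

```python
def dma_plan(rate_bytes_per_s, cycle_us, reissue_us, frame_bytes=1):
    """Return (DMA length in bytes, relative overhead) for one analysis cycle."""
    length = rate_bytes_per_s * cycle_us // 1_000_000   # bytes arriving per cycle
    length -= length % frame_bytes                      # keep whole frames only
    overhead = reissue_us / cycle_us                    # one DMA reissue per cycle
    return length, overhead


# 72 MB/s raw data, 2 ms analysis cycle, about 50 µs to reissue a DMA read.
length, overhead = dma_plan(72_000_000, 2000, 50, frame_bytes=72)
print(length, "%.1f%%" % (100 * overhead))
```

Shorter cycles reduce the latency seen by the analysis task but increase the relative overhead, since the reissue cost per DMA stays roughly constant.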
RT Analysis and Control Integration
Fig. 3 provides a schematic view of the building blocks (processes, devices, communication paths, etc.) of an AUG RT-DIAG and their interoperation. While the above descriptions were mainly concerned with RT data taking (the lower part of the schematic), the following describes the remaining (middle and upper) parts.
After the sample frames have been transferred reliably into shared memory, a second and no less important task is the real-time analysis and communication with Control. To ensure reliable separation of the tasks and to enhance maintainability of the software, both tasks run as separate user processes. The only signalling between the two processes occurs via shared memory. While DAQ fills one big shared segment with SIO frames in a continuous stream, a cyclic data input process updates information about memory offset and state of transfer in a small control shared segment. The evaluation task waits periodically and/or polls this information 10 . If the tuning of the DMA length is done properly as described above, the corresponding update of the shared memory is performed immediately when the required data is ready for processing 11 . This approach seems to be the most appropriate for keeping the synchronization latency between both tasks small while simultaneously maintaining a low system overhead, thus enabling cutting-edge input data rates.
10) Possible conflicts between processes writing and reading at the same position of shared memory areas are avoided by the shared memory / multi-processor capabilities of Solaris OS.
11) In fact the procedure is not as simple as one might think. The DMA stream is essentially propelled by the external PCI device. Only when the DMA finishes are the internal caches of the CPU automatically synchronized with the actual memory contents. During a running transfer the data in memory must be regarded as undefined. While it is in principle possible to synchronize the CPUs periodically with a running DMA, this does not seem reasonable since the induced system overhead is comparable to the recurring DMA overhead mentioned above.
page 6 -eDoc layout -last modified: 31. August 2010 12:50:52
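The signalling via a small control segment described above can be sketched as follows. This is a single-process Python illustration, not the actual implementation: plain byte arrays stand in for the two shared segments, and the control layout (`CTRL_FMT`), the state codes, and the names `dma_done` and `Consumer` are all assumptions.

```python
import struct

CTRL_FMT = "<QI"                      # assumed layout: valid byte offset, transfer state
STATE_RUNNING, STATE_DONE = 1, 2      # hypothetical state codes

data_seg = bytearray(1024)            # stand-in for the large shared data segment
ctrl_seg = bytearray(struct.calcsize(CTRL_FMT))   # stand-in for the control segment
struct.pack_into(CTRL_FMT, ctrl_seg, 0, 0, STATE_RUNNING)


def dma_done(chunk, state=STATE_RUNNING):
    """Producer side: called once a DMA block has fully arrived."""
    offset, _ = struct.unpack_from(CTRL_FMT, ctrl_seg, 0)
    data_seg[offset:offset + len(chunk)] = chunk
    # Publish the new offset only after the data is in place.
    struct.pack_into(CTRL_FMT, ctrl_seg, 0, offset + len(chunk), state)


class Consumer:
    """Analysis side: polls the control segment for newly valid data."""

    def __init__(self):
        self.read_offset = 0

    def poll(self):
        valid, state = struct.unpack_from(CTRL_FMT, ctrl_seg, 0)
        new = bytes(data_seg[self.read_offset:valid])
        self.read_offset = valid
        return new, state


consumer = Consumer()
first = consumer.poll()               # nothing published yet
dma_done(b"AAAA")
dma_done(b"BB", STATE_DONE)
second = consumer.poll()              # both blocks become visible at once
print(first, second)
```

The consumer never reads beyond the published offset, which mirrors the point made in footnote 11 that data of a running DMA transfer must be regarded as undefined until the transfer has finished.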
The RT Behaviour of Shared Memory Communication
The timing between the DAQ and the RT analysis process, on the consumer side, does not depend crucially on memory synchronization issues because the system architecture of today's shared memory multi-core and multi-CPU computers provides fast cache and memory consistency algorithms. However, the coexistence of two RT processes under one OS environment requires care. First, the analysis process has to be prioritized and shielded against interruptions in the same way as described above for the DAQ process. This does not, however, rule out mutual interruption of two equally important processes. Although the two processes behave quite differently, they are expected to clash regularly: the DAQ process sleeps most of the time waiting for the DMA Done interrupts, and the analysis process wakes up at exactly the same time, namely when the DMA is finished and the memory-CPU synchronization reveals the arrived data. If the real-time criterion is in the range of milliseconds, this clash is not critical 12 since the DMA interrupt handling, as described, only takes about 50 µs. After this short delay, the analysis task may take over the CPU. However, if higher performance is required for the analysis process, it may be necessary to bind the analysis process explicitly to a particular CPU which is shielded from other processes and interrupts. Fig. 4 gives timing results measured in real-time on the ECE diagnostic with a two-processor UltraSPARC system, with and without processor binding. In the latter case, erratic delays summing up to 600 µs for some analysis loops can be recognized. These vanish if the task is bound to a reserved CPU. Secondly, one has to find an optimum method to signal data availability from one process to the other. At the moment this is done in a brute-force manner by polling the shared segment on the analysis side.
As soon as new data becomes visible, the data mapping from the shared segment into the calculation buffers is started. This technique may serve as a coarse approach for a proof of principle, but will require careful optimization. At this point the programming work is still in progress, and the best methods for synchronization, data mapping, and performing calculations on raw data are still under evaluation. It may well turn out that they differ according to the algorithm requirements of the various diagnostics. In collaboration with diagnosticians, we are presently working on a demonstration example which should ultimately evolve into a template or framework for standard real-time analysis diagnostics.
RT Communication with Control
The last remaining RT task is uploading analysis results to Control and possibly receiving physics results from others via the downlink from Control. For this functionality, Control offers the diagnostics a beachhead and framework which introduces diagnostics into the circle of control systems. An important feature of this framework is that it serves as an interface between differently built systems such as, for example, LabVIEW based systems as presented by L. Giannone [9], VxWorks based systems, and our Solaris and SIO based diagnostics. The details of this so-called rtDiagCtrl framework are described by W. Treutterer [10].
Summary and Outlook
The DAQ requirements for diagnostics at ASDEX Upgrade have been analysed. It has been found that the developed modular and extensible front-end concept together with the SIO computer interface featuring TDC based time synchronization and real-time DMA transfer into computer memory will satisfy most of the identified functional requirements and performance needs up to a continuous data rate of 160 MB/s 13.
A series of data acquisition diagnostics has been realized in various configurations following this concept. Some of these routinely deliver between 500 MB/shot and 2 GB/shot. For those diagnostics which place special demands on the subsequent real-time accessibility of raw data in memory, a software concept for data analysis, data exchange with other RT diagnostics, and communication with Control has been developed which is completely user space oriented and requires no operating system modifications beyond the existing Solaris real-time features. Performance measurements support the expectation that diagnostics will soon be capable, as a matter of course, of providing Control with, for example, evaluated spatial profiles of plasma quantities within the few milliseconds required for feedback and MHD stabilization. In a first system configuration, communication between diagnostics will be routed via Control. A more direct communication is possible if the distributed algorithm for a particular task is designed as a multi-node MPI [11] application covering all contributing diagnostic RT input and RT analysis nodes.
The RT DAQ standard for ASDEX Upgrade presented here will serve to replace many CAMAC channels in the future and to bring more diagnostics into closer contact with plasma control. In the course of this refurbishment of diagnostics, new modules and new features will be developed, expanding the range of available options. During this period of development it can be taken for granted that new computer bus formats and higher data rates will evolve. We expect that the flexibility of the concept will permit adapting to such changes without sacrificing its basic design.
12) This is true even on single processor machines as long as the DAQ task has priority over the analysis task.
13) Among the requirements not or not yet satisfied are: the need for a timed trigger output (a separate TDC card with outputs has to be used for this function), external triggering of sampling events, dynamic configuration of input parameters (e.g. gain), sampling rates above 40 MHz, and many more hypothetical or future wishes. However, the system provides enough possibilities for extensions if they are urgently required.
