Abstract-This paper presents the integration of an acquisition and computing unit, capable of acquiring and processing fast magnetic signals in real time, into the control system of the Tokamak à Configuration Variable (TCV). All aspects of system integration and testing are reported, up to tests of the system on plasma discharges. An example of a real-time analysis algorithm designed to detect and classify neoclassical tearing mode plasma instabilities is also described.
Index Terms-Automatic detection of tearing modes in fusion plasmas, real-time signal analysis systems, real-time tokamak control systems.
I. INTRODUCTION
The analysis of fast magnetic signals measured by magnetic probes installed on a tokamak vessel has been widely employed to detect and analyze a number of plasma instabilities, notably rotating tearing modes. The high-frequency components of the measured signals contain information directly related to rapid changes within the plasma, for instance those generated by tearing mode magnetic islands rotating in the torus. Off-line analysis of these signals employs powerful mathematical tools, such as spectrograms, principal component analysis, and amplitude-phase fitting algorithms [1]-[4]. These provide copious information on magnetic island topology, size, and evolution.
In this paper, we describe the integration of these techniques into the digital real-time control system of the Tokamak à Configuration Variable (TCV), in order to execute advanced magnetic analysis codes during the tokamak discharge. The final goal is to provide plasma health status information to decision-making algorithms that can initiate actuator reactions. Severe design constraints, posed by TCV's real-time environment, had to be addressed. TCV presently employs a distributed real-time control system, capable of controlling the entire plant during a discharge, that consists of commercial PCs interconnected by a self-controlled high-speed fiber-optic digital link [5]-[7]. Each node of the system can house analog-to-digital converter (ADC) and digital-to-analog converter (DAC) boards that interact with the plant. This paper describes the integration of a supplementary real-time magnetic coil analysis node into this system, taking into account the restrictions of the demanding real-time environment while preserving all preexisting features of the control system. The paper is organized as follows. Section II introduces the TCV distributed control system, Section III describes the integration of the new node, Section IV presents system tests during TCV operation, and Section V concludes.
II. TCV DISTRIBUTED DIGITAL CONTROL SYSTEM
The TCV tokamak has a fully functional digital real-time control system capable of controlling almost all aspects of a plasma discharge. The system is based on a real-time shared-data network (reflective memory) of modular computer nodes, each an embedded or desktop PC, which may include local ADC and/or DAC cards. Owing to the limited resources available to the TCV team, particularly in terms of manpower, design choices favored simplicity, flexibility, and maintainability. The main requirements and resulting design choices are listed in Table I. At the top level, the system is a distributed real-time acquisition and processing system built on standard commercial off-the-shelf (COTS) PCs augmented with ADC and DAC boards and dedicated high-speed communication links. Because this architecture is decentralized throughout the plant, the system is named Système de Contrôle Distribué (SCD), French for distributed control system. Fig. 1 shows the SCD control system layout with its connections to the diagnostics and actuators. Some nodes are connected to a compact-PCI (cPCI) crate hosting one or more D-TACQ ACQ196 acquisition cards with 96 ADCs, and output cards housing 16 or 32 DACs. Other nodes are connected only via the reflective memory network and act as computational nodes. At present, there are seven nodes; the seventh is the one described in this paper.
A. Real-Time Computer Nodes
All nodes exchange data with one another through the reflective memory (RFM) link, an industry-standard high-speed digital communication system that transparently synchronizes a shared memory across all nodes [8]. The synchronization is performed without user or kernel intervention, but it has the drawback that no data handshaking is performed, so the user is responsible for preventing read-write race conditions. In SCD, this is accomplished by designating one node as the RFM master, which synchronizes the RFM operations of all other nodes with respect to a common timebase. To prevent read/write collisions, reads and writes from/to the RFM alternate on successive clock cycles of the RFM master node (the master role is assigned by the SCD operator) and use direct memory access: writes occur on odd cycles and reads on even cycles. All nodes must be synchronized with these write and read cycles, and all nodes write (or read) at the same time, although a node may skip cycles depending on its control rate [7]. The nodes with ADCs and DACs are interfaced to the machine's diagnostic systems and actuators. Node 1 is interfaced to two soft X-ray diagnostics: the duplex multiwire proportional soft X-ray counter (a pinhole-type soft X-ray camera) and X-Te (a four-filter soft X-ray spectrometer that provides the central electron temperature using the differential filter method). It is also interfaced to the 14 vertical chords of the far infrared (FIR) interferometer, which provide the electron density profile. Node 2 acquires all magnetic measurements from the tokamak and is thus responsible for plasma shape and position control; it also acquires the central FIR channel for real-time density control. This node is routinely used as the main plasma position and density controller and is almost always the RFM master node. Node 3 is a computational node that computes the plasma magnetic equilibrium in real time. Node 4 is a replacement for node 2, while node 5 is an acquisition and processing node connected to the 200-channel soft X-ray tomographic system. Node 6 is a recently installed multicore computational node that has been used to run complex multicore control codes (a faster replica of the real-time equilibrium reconstruction and RAPTOR-based advanced plasma performance controllers). Finally, node 7 is devoted to the real-time analysis of fast magnetic perturbations in the plasma.
0018-9499 © 2017 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
B. Software Organization
Fig. 2 presents the modular, portable software organization of the SCD control system. The SCD code is divided into two main sections.
1) Hardware interface code, written in C/C++ by the system developers, which provides the input/output interface to the control algorithm code. Once compiled, the executable is uploaded to the real-time computer nodes, where it is considered fixed and unchanging between plasma shots. However, the hardware interface code can change its functional behavior depending on external configuration parameters, such as the operational mode (with or without suspending interrupts), the number of ADC/DAC cards, the number of cores to be used (for multicore nodes), the data shared via the RFM, and so on.
2) Control algorithm code, realized in the MathWorks Simulink [9] block programming language by the control algorithm author(s) using Simulink templates provided by the system developers. It performs signal processing and computations to produce output signals consisting of new values for the actuators, the reflective memory, and other signals (probe signals) used for postshot analyses. The algorithm, in user-friendly Simulink block format, is automatically converted by MathWorks Simulink Embedded Coder (SEC) [10] into target code in the form of a dynamically linked shared object library.
This modular architecture offers significant advantages. Abstraction from the hardware-specific code makes the control algorithm code portable, i.e., control algorithms tested on TCV could be readily reused in other fusion facilities with minor adaptations. Another salient advantage of this modularity is flexibility: the control algorithm can be developed on any computer equipped with MATLAB-Simulink, without requiring the SEC package to be installed, and then debugged in simulation using the real diagnostic signals of past shots. This feature has already been exploited by external collaborators from several institutions across Europe and beyond to test their control algorithms on TCV.
Only once the correctness and robustness of the control algorithm have been verified in simulation, using experimental data from previous shots, is it commissioned into the RT system. Another essential advantage is that algorithm authors can use the extensive Simulink block library for standard components, such as filters, integrators, matrix multiplications, and other advanced signal processing tools. For TCV, standard custom-built blocks are provided for interfacing algorithms with TCV diagnostics and actuators, implementing calibration factors, signal selection, actuator constraints and saturation, and so on.
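The standard TCV blocks are Simulink components and are not reproduced here. As an illustration only, the Python sketch below shows the kind of elementwise operations such blocks perform: applying calibration factors to raw ADC values, selecting the signals an algorithm needs, and clamping an actuator request to its limits. The function names, the linear gain/offset calibration model, and all numeric values are hypothetical.

```python
from typing import List, Sequence


def calibrate(raw: Sequence[float], gains: Sequence[float],
              offsets: Sequence[float]) -> List[float]:
    """Convert raw ADC values to physical units with a linear (gain, offset) model."""
    if not (len(raw) == len(gains) == len(offsets)):
        raise ValueError("raw, gains and offsets must have the same length")
    return [(r - o) * g for r, g, o in zip(raw, gains, offsets)]


def select(signals: Sequence[float], indices: Sequence[int]) -> List[float]:
    """Pick the signals an algorithm actually uses, in the requested order."""
    return [signals[i] for i in indices]


def saturate(request: float, lower: float, upper: float) -> float:
    """Clamp an actuator request to its allowed range before it reaches a DAC."""
    if lower > upper:
        raise ValueError("lower limit exceeds upper limit")
    return min(max(request, lower), upper)


# Example: three raw channels, keep channels 2 and 0, then drive a limited actuator.
physical = calibrate([1000.0, 2048.0, 3000.0], [0.01, 0.02, 0.005], [0.0, 48.0, 1000.0])
chosen = select(physical, [2, 0])
command = saturate(chosen[0] + chosen[1], -5.0, 5.0)
```

Keeping these operations in separate, reusable blocks is what lets the same algorithm run on different diagnostics by changing only the calibration tables and index lists.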
III. INTEGRATION OF NODE 7
Having introduced the general architecture of the TCV SCD control system, we now describe the integration of the new node 7.
A. Hardware and the Hardware Interface Code
From the point of view of hardware architecture, node 7 is similar to all other SCD nodes: it consists of a cPCI acquisition crate fitted with a D-TACQ ACQ196 96-channel synchronous digitizer card, a rear transition module (D-TACQ RTM-T), and an industrial PC acting as the host processing unit. The industrial PC is fitted with a reflective memory card of the same kind as all other nodes of the system to communicate with the real-time network.
As mentioned, node 7 is devoted to fast magnetic signal acquisition and processing in real time. The main technical requirements to be fulfilled are as follows.
1) Sampling Frequency: The plasma instabilities usually observed on TCV have frequencies ranging from a few hundred hertz to 100 kHz. Furthermore, phase-coherence-based algorithms are often used to identify coherent plasma structures, so simultaneous sampling of the signals is mandatory. Simultaneous sampling of at least 32 magnetic signals at rates above 200 ksps, i.e., at least twice the highest frequency of interest, is therefore required.
2) CPU Process Time: The previous requirement constrains the CPU processing time, since higher sampling rates reduce the processing time available on the CPU (which equals the sampling period). The aim of this node is to execute complex analysis algorithms on high-speed multichannel data streams within a reasonable processing time.
3) Multiprocessing Capability: Processing algorithms may be spread over multiple cores to increase the available computational power.
4) Legacy RFM Interface: Data sharing with other nodes of the control system has to comply with the existing data distribution scheme described in Section II-A.
5) Processing Algorithms Written in MathWorks/Simulink: The analysis algorithms should be developed in MATLAB/Simulink and automatically uploaded to the analysis node. Therefore, node 7 must be fully compatible with the software architecture described in Fig. 2.
To comply with the above requirements, we adopted a packet acquisition-processing scheme, in which the ADCs have a double-buffered data path that allows the CPUs to operate on each data packet, as described in [11]. The acquisition hardware employs an ADC subsystem with a D-TACQ ACQ196 board augmented with an RTM-T rear transition module [12]. This module provides a high-speed PCI Express x1 link to the host PC, which is equipped with a recent high-end motherboard hosting an Intel i7-5960X 8-core 3-GHz processor. The operating system is Scientific Linux 6.7, with a kernel specially configured for user-mode RT capabilities, together with a dedicated hardware interface code fully compliant with Fig. 2. The ad hoc configured kernel, together with the hardware interface code, exploits the multicore CPU by distributing tasks over the cores, as summarized in Table II. The kernel is restricted to the first core by means of Linux kernel command-line (init) options, so that all main kernel threads and user processes execute on this core, leaving the others free for user real-time processes and/or threads; the only kernel threads remaining on these cores are the basic Linux kernel threads required to handle each core (such as the interrupt handling kthread, the migration kthread, and the watchdog kthread).
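The thread-per-core layout can be imitated at user level. The sketch below is a Python illustration only (node 7 runs compiled C/C++ code): it starts one worker thread per selected core and pins each thread with `os.sched_setaffinity`, which is Linux-specific; on Linux a pid of 0 designates the calling thread, so each worker pins only itself. The function name is hypothetical. Keeping the kernel off the real-time cores happens at boot time (for instance with the `isolcpus=` kernel parameter, although the paper does not name the exact options used), and real-time scheduling priorities, which need elevated privileges, are deliberately left out.

```python
import os
import threading
from typing import Dict, Iterable, Set


def start_pinned_workers(cores: Iterable[int]) -> Dict[int, Set[int]]:
    """Start one thread per core, pin it to that core, and report its affinity.

    Linux-only: os.sched_setaffinity(0, ...) applies to the calling thread.
    """
    affinity: Dict[int, Set[int]] = {}
    lock = threading.Lock()

    def worker(core: int) -> None:
        os.sched_setaffinity(0, {core})      # pin this thread only
        # A real processing thread would now wait for data-ready events
        # and run its algorithm; here it just records where it may run.
        with lock:
            affinity[core] = os.sched_getaffinity(0)

    threads = [threading.Thread(target=worker, args=(c,)) for c in cores]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return affinity


# Use the CPUs this process may run on, leaving the first one to "the kernel".
allowed = sorted(os.sched_getaffinity(0))
workers = start_pinned_workers(allowed[1:])
```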
Once the system enters the real-time state before the plasma discharge, the hardware interface code is launched as a user process on core 2. Node 7's hardware interface code is a multithreaded process that spawns one thread per used core to distribute the computational tasks. Besides hosting the hardware interface code, core 2 receives data-ready interrupts from the acquisition subsystem (which thus also synchronizes the node), distributes fresh ADC data to the processing threads, and prepares the RFM read/write buffers (without actually moving them to the RFM board). Cores 3 to 6 can each host a processing Simulink algorithm, making this node fully multiprocessing. Finally, core 7 is used to synchronize data transmission and reception on the RFM card. A dedicated thread on a dedicated core runs this task because the node must resynchronize itself to the read and write clocks of the master RFM node, and the fastest way to achieve this is by polling the RFM card. Since a polling thread usually consumes all of a core's cycles and can only be preempted by a higher priority thread (on an RT kernel), we chose to employ a separate core and thread for this, leaving core 2 fully committed to real-time ADC data handling. Fig. 3 presents the typical timing chart of node 7. The ADCs on the ACQ196 board are clocked at 256 kHz, synchronous with the TCV main clock at 1 MHz; the synchronous sampling clock is generated by a high-accuracy direct digital synthesizer chip on a companion board in the acquisition cPCI crate. The ADC data stream is directed to a double buffer on the acquisition board: when one buffer is full, a data-ready interrupt is sent to the host PC and the buffers are switched, releasing the just-filled one to the host CPU. The arrival rate of this interrupt is

$$ f_{\mathrm{int}} = \frac{f_s}{N_{\mathrm{sch}}} $$

where $f_{\mathrm{int}}$ is the interrupt frequency, $f_s$ is the sampling frequency, and $N_{\mathrm{sch}}$ is the number of samples per channel in the buffer. With the typical working parameters $f_s$ = 256 kHz and $N_{\mathrm{sch}}$ = 256, the interrupt frequency is $f_{\mathrm{int}}$ = 1 kHz, i.e., the cycle time is 1 ms. With this hardware, $N_{\mathrm{sch}}$ must be a power of two, hence the choice of 256 kHz for the sampling frequency to obtain a 1-ms cycle time.
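The relation between sampling frequency, packet size, and interrupt rate, together with the ping-pong behavior of the acquisition buffer, can be modeled in a few lines of Python. This is a software model, not the ACQ196 firmware: the class name and the per-frame `push` interface are hypothetical, and the power-of-two check mirrors the hardware constraint stated above.

```python
from typing import List, Optional, Sequence

Frame = Sequence[float]          # one simultaneous sample of every channel


def interrupt_rate(fs_hz: float, n_sch: int) -> float:
    """Data-ready interrupt frequency f_int = f_s / N_sch."""
    if n_sch <= 0 or n_sch & (n_sch - 1):
        raise ValueError("N_sch must be a positive power of two")
    return fs_hz / n_sch


class DoubleBuffer:
    """Ping-pong buffer: the ADC fills one half while the host reads the other."""

    def __init__(self, n_sch: int, n_ch: int) -> None:
        self.n_sch, self.n_ch = n_sch, n_ch
        self.halves: List[List[Frame]] = [[], []]
        self.active = 0                      # index of the half being filled

    def push(self, frame: Frame) -> Optional[List[Frame]]:
        """Store one frame; return the full packet when the halves switch."""
        if len(frame) != self.n_ch:
            raise ValueError("frame has the wrong number of channels")
        half = self.halves[self.active]
        half.append(frame)
        if len(half) < self.n_sch:
            return None
        # Buffer full: switch halves and release the filled one to the host
        # (this is where the hardware raises the data-ready interrupt).
        self.active ^= 1
        self.halves[self.active] = []
        return half


f_int = interrupt_rate(256_000, 256)         # 1000.0 Hz, i.e. a 1-ms cycle
buf = DoubleBuffer(n_sch=256, n_ch=4)
packets = []
for k in range(512):
    packet = buf.push([float(k)] * 4)
    if packet is not None:
        packets.append(packet)
```

Pushing 512 frames yields two 256-frame packets, one per simulated data-ready interrupt.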
The hardware interface code exploits the Intel time stamp counter (TSC) [13] to time stamp its activities precisely. Every core has a 64-bit counter clocked at the CPU nominal frequency that can be read very quickly with dedicated assembly code. These counters are used to time stamp key passages of the hardware interface code, providing timing information for system monitoring and algorithm benchmarking. The first timing quantity is the measured system cycle time, i.e., time interval t1 in Fig. 3. Its time trace is the first indicator of proper or improper RT behavior of the system; with the working parameters introduced in the previous paragraph, this trace should be constant at 1 ms. Once the data-ready interrupt has been issued, the CPU must copy fresh ADC data from the acquisition board to main memory, distribute it to the processing cores, and update the RFM buffers. The time this takes is stored in time stamp t2 and depicted as the purple phase in Fig. 3. After this phase, the processing cores have new data, so the processing algorithms can be triggered; several processing cores act on the ADC data in parallel, as depicted by the "CPU 1 processing" and "CPU 2 proc" phases in Fig. 3, and their processing times are stored in markers t5 and t6. Up to four processing cores can be used; the algorithm presented in this paper uses two. The computational time available to all processing cores in every cycle is equal to t1 − t2. The hardware interface code can also trigger some processing cores at only a fraction 1/N of the data-ready interrupt rate. These cores then have more computational time, at the expense of analyzing only a subset of the signal windows, i.e., one every N. In this case, data exchange with the other cores also happens only every N computing cycles.
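The bookkeeping behind t1, t2, and the per-core processing times can be sketched as follows. Python has no portable access to the TSC, so `time.perf_counter_ns()` stands in for it here; the class and function names are hypothetical. For a core triggered every N-th cycle, the budget is taken as N·t1 − t2, a natural extension of the t1 − t2 rule stated above rather than a formula given in the paper.

```python
import time
from typing import Dict


class CycleStamps:
    """Record named timestamps within one cycle, in nanoseconds."""

    def __init__(self) -> None:
        self.marks: Dict[str, int] = {}

    def stamp(self, name: str) -> None:
        # Stand-in for reading the CPU time stamp counter.
        self.marks[name] = time.perf_counter_ns()

    def interval_us(self, start: str, end: str) -> float:
        return (self.marks[end] - self.marks[start]) / 1_000.0


def available_us(t1_us: float, t2_us: float, decimation: int = 1) -> float:
    """Computation time left to a core triggered every `decimation` cycles."""
    if decimation < 1:
        raise ValueError("decimation must be >= 1")
    return decimation * t1_us - t2_us


def meets_budget(t_proc_us: float, t1_us: float, t2_us: float,
                 decimation: int = 1) -> bool:
    """True if a core's processing time fits in its available time."""
    return t_proc_us <= available_us(t1_us, t2_us, decimation)


stamps = CycleStamps()
stamps.stamp("data_ready")
stamps.stamp("data_distributed")            # end of the t2 phase
t2_measured = stamps.interval_us("data_ready", "data_distributed")
```

With the values measured later on TCV (a 1-ms cycle and about 80 µs of data distribution), this rule leaves roughly 920 µs per cycle to each processing core.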
Following the t2 phase, the hardware interface code exchanges data with the reflective memory board. It must first synchronize itself with the read and write phases dictated by the RFM master node. The times elapsed from the data-ready interrupt to the read and write synchronizations are recorded in time stamps t3 and t4 of Fig. 3.
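The odd/even RFM phase rule of Section II-A and the polling resynchronization can be modeled as below. Numbering strobes from 1, the `divider` parameter for nodes that skip cycles, and the `read_counter` callable standing in for an RFM register read are all assumptions of this sketch; the paper does not describe how skipped cycles are chosen.

```python
from typing import Callable


def rfm_phase(strobe: int, divider: int = 1) -> str:
    """Phase of master strobe `strobe` (counted from 1) for a node running
    at 1/divider of the RFM rate: 'write' on odd strobes, 'read' on even
    ones, and 'skip' for write/read pairs the node does not take part in."""
    if strobe < 1 or divider < 1:
        raise ValueError("strobe and divider must be >= 1")
    pair = (strobe - 1) // 2            # index of the write/read pair
    if pair % divider:
        return "skip"
    return "write" if strobe % 2 else "read"


def wait_next_strobe(read_counter: Callable[[], int], last: int) -> int:
    """Busy-poll a strobe counter until it changes, then return the new value.
    The loop never sleeps, which is why the task gets a dedicated core."""
    while True:
        now = read_counter()
        if now != last:
            return now
```

For example, a node with `divider=2` writes and reads on strobes 1 and 2, skips 3 and 4, and resumes on 5 and 6.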
All timing information is stored in the SCD MDSplus tree and used to check node functionality.
B. Software, the Simulink Model, and Integration of the MHD Analysis Code
As on all other nodes of the control system, the control code (in this case, diagnostic processing code) is written as a Simulink block diagram. This greatly facilitates the management of algorithms during both experimental activity and later data analysis. Since node 7 is a multicore processing machine, a dedicated Simulink block model was developed, shown in Fig. 4. Every processing core is modeled as a Simulink block with standardized input and output ports; since node 7 has four parallel processing cores, four of these blocks are inserted. Each block represents an independent processing thread that can process input data in parallel with the others.
When a TCV pulse is initiated, the SCD build process opens the active blocks, generates a Linux shared object library for each, and distributes these libraries to node 7. Upon entering the real-time phase of the discharge, the hardware interface code of node 7 is launched; it loads these shared objects and assigns their processing code to one thread per core. During the discharge, these cores process the fast magnetic signals in parallel and return the results via the RFM real-time network, as described in Section III-A. Fig. 5 shows the Simulink block representing a single core. This is the Simulink object that is translated into C code and compiled into the Linux shared library during shot preparation. It has four input and five output ports, whose meanings and sizes are summarized in Table III. This block acts as a wrapper into which any processing algorithm (itself a Simulink model) is inserted, and it also provides a clear and straightforward means of standardizing the input and output interfaces. Fig. 4 shows the actual interconnections of the core models on the node CPU. Essentially, every core's proc_out port is looped back to the proc_in port of the other cores with a one-cycle delay. This provides a simple and general model of interthread data communication. The hardware interface code takes care of these data movements once the algorithms' models are uploaded to node 7. This is clearly not the most efficient way to handle intercore data communication, since a fixed amount of data (see Table III) is always exchanged, irrespective of what is actually present on the proc_out ports. Nevertheless, we chose a general, fixed-interface approach at the expense of communication overhead. The same holds for all the input and output ports.
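The fixed inter-core wiring, where each core's proc_out reaches the proc_in of the other cores one cycle later, can be simulated with a short Python loop. The step-function signature and the two toy algorithms are hypothetical; the example only loosely mimics a split in which one core produces model data that another core consumes.

```python
from typing import Any, Callable, List, Optional, Sequence

# step(adc_packet, proc_in) -> proc_out, where proc_in holds the previous-cycle
# proc_out values of all the *other* cores (None before the first cycle).
Step = Callable[[Any, List[Optional[Any]]], Any]


def run_cores(steps: Sequence[Step], packets: Sequence[Any]) -> List[List[Any]]:
    """Run all cores on each packet with a one-cycle-delayed proc_out loopback."""
    previous: List[Optional[Any]] = [None] * len(steps)
    history = []
    for packet in packets:
        outputs = []
        for i, step in enumerate(steps):
            proc_in = [previous[j] for j in range(len(steps)) if j != i]
            outputs.append(step(packet, proc_in))
        previous = outputs      # visible to the other cores only next cycle
        history.append(outputs)
    return history


def model_core(packet, proc_in):
    """Core 2 stand-in: produces 'theoretical' data from the current packet."""
    return 10 * packet


def analysis_core(packet, proc_in):
    """Core 1 stand-in: combines live data with core 2's previous output."""
    model = proc_in[0]
    return None if model is None else packet + model


cycles = run_cores([analysis_core, model_core], [1, 2, 3])
```

Here `cycles` is `[[None, 10], [12, 20], [23, 30]]`: the analysis core always sees the model output of the previous cycle, which is the price of the simple, general loopback.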
Input and output ports to and from the CPU are arranged with a twofold aim: mimicking the real system (particularly the hardware interface code) and allowing SCD resimulation of past shots in Simulink. For example, when the model is used for resimulation, the ADC input port is connected to an InputFromWorkspace block initialized with the ADC data of a previous shot, taken from the SCD MDSplus database; when the core model is uploaded to node 7, the hardware interface code connects the same port to the real ADCs. The other ports are handled in the same way, allowing efficient and easily manageable resimulation of past shots while providing a very flexible environment for developing processing algorithms.
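Binding the same ADC port either to recorded data or to live hardware can be sketched as below. Here `recorded_adc` plays the role of the InputFromWorkspace block and checks that each replayed packet matches the port size; a live source would simply be another iterable with the same packet shape. All names are hypothetical, and reading past-shot data from MDSplus is not shown.

```python
from typing import Callable, Iterable, Iterator, List

Packet = List[List[float]]              # [channel][sample]


def recorded_adc(packets: Iterable[Packet], n_ch: int,
                 n_sch: int) -> Iterator[Packet]:
    """Replay ADC packets of a past shot, enforcing the ADC port dimensions."""
    for packet in packets:
        if len(packet) != n_ch or any(len(ch) != n_sch for ch in packet):
            raise ValueError("packet does not match the ADC port size")
        yield packet


def run_model(adc: Iterable[Packet], step: Callable[[Packet], float]) -> List[float]:
    """Drive an algorithm step from any packet source; the step cannot tell
    whether the data come from a recording or from the real ADCs."""
    return [step(packet) for packet in adc]


def peak_abs(packet: Packet) -> float:
    """Toy algorithm: largest absolute sample over all channels."""
    return max(abs(v) for ch in packet for v in ch)


past_shot = [[[0.0, 1.0], [-3.0, 2.0]], [[0.5, 0.5], [0.25, -0.75]]]
results = run_model(recorded_adc(past_shot, n_ch=2, n_sch=2), peak_abs)
```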
Node 7 was programmed and tested with the real-time SVD-based MHD analysis code described in [14]. In essence, this code computes the singular value decomposition (SVD) of a matrix whose columns are bandpass-filtered copies of the fast magnetic signals, followed by a postprocessing phase that compares the experimental principal axes (i.e., the columns of one of the three matrices computed by the SVD) with those computed on a numerically generated set of signals from a theoretical model of rotating modes. This algorithm was ported to the Simulink representation introduced here. It exploits the multiprocessing capability of node 7 by operating on two cores: the first computes the SVD of the live signals from the ADCs, and the second computes the SVD of the theoretical model. This not only reduces the computational time but also allows the theoretical model to depend on other live plasma signals; in the presented implementation, it depends on the location of the plasma magnetic axis in the machine. Conventional oddN and evenN MHD markers have also been included in the analysis code. Fig. 6 shows the algorithm that runs on core 1; the algorithm that runs on core 2 is omitted for brevity. This Simulink block is contained in the wrapper of Fig. 5. The algorithm model follows the usual Simulink convention of signals flowing from left to right. Hence, the ADC input port is placed at the top left of the figure and supplies all 64 acquired channels, each with 256 samples. This data buffer is processed by the pretreatment stage, whose tasks are to extract 13 channels (12 for the SVD plus 1 for one of the conventional markers), to apply the inverse sensor-ADC transfer function to compensate for the analog chain, to downsample the signals by a factor of 2 (applying a low-pass filter and a 2x decimator), and to apply a high-pass filter.
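A simplified version of the pretreatment chain (channel extraction, 2x downsampling, and high-pass filtering) is sketched below. The filters are deliberately crude stand-ins, a two-sample average before decimation and a first-order IIR high-pass, because the paper does not give the actual filter designs. The inverse sensor-ADC transfer function is omitted, and filter state is not carried over between packets as a real-time implementation would require. Function names and the `alpha` value are hypothetical.

```python
from typing import List, Sequence


def downsample2(x: Sequence[float]) -> List[float]:
    """Average adjacent sample pairs (crude low-pass) and keep one per pair."""
    return [(x[2 * i] + x[2 * i + 1]) / 2.0 for i in range(len(x) // 2)]


def highpass(x: Sequence[float], alpha: float) -> List[float]:
    """First-order IIR high-pass: y[n] = alpha * (y[n-1] + x[n] - x[n-1])."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must be in (0, 1)")
    y: List[float] = []
    prev_x = x[0] if x else 0.0
    prev_y = 0.0
    for xn in x:
        prev_y = alpha * (prev_y + xn - prev_x)
        prev_x = xn
        y.append(prev_y)
    return y


def pretreat(packet: Sequence[Sequence[float]], channels: Sequence[int],
             alpha: float = 0.9) -> List[List[float]]:
    """Extract the selected channels, downsample by 2, then high-pass filter."""
    return [highpass(downsample2(packet[c]), alpha) for c in channels]


packet = [[float(c)] * 256 for c in range(64)]    # 64 channels x 256 samples
selected = pretreat(packet, channels=list(range(13)))
```

Constant (DC) channels come out as zeros, as expected from the high-pass stage, and each 256-sample channel is reduced to 128 samples.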
The pretreated channels are normalized by the rms normalization stage before being fed to the actual SVD factorization and postprocessing algorithm (currently implemented as a Simulink S-function C++ block). The outputs of this block are passed to two output ports of the core: the RFM port, which distributes the results to other nodes, and the memory port, which stores the results in the MDSplus database for later analysis. The SVD factorization block needs the theoretical principal axes computed by the companion algorithm that runs on core 2. These data enter through the proc_in port at the bottom left of the figure, are extracted by a signal extraction block, and are routed to the SVD block.
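The rms normalization and the singular values at the heart of the analysis can be illustrated without a numerical library. The sketch below is not the S-function used on TCV: it normalizes each channel to unit rms and obtains the singular values as square roots of the eigenvalues of the Gram matrix, computed with the cyclic Jacobi method. Going through the Gram matrix squares the condition number, which is acceptable for a dozen channels in a sketch but is one reason production codes call a dedicated SVD routine. All names are hypothetical.

```python
import math
from typing import List, Sequence


def rms_normalize(channels: Sequence[Sequence[float]]) -> List[List[float]]:
    """Scale each channel to unit rms; all-zero channels stay zero."""
    out = []
    for ch in channels:
        rms = math.sqrt(sum(v * v for v in ch) / len(ch))
        out.append([v / rms for v in ch] if rms > 0 else [0.0] * len(ch))
    return out


def symmetric_eigenvalues(a: Sequence[Sequence[float]],
                          max_sweeps: int = 100) -> List[float]:
    """Eigenvalues of a symmetric matrix (cyclic Jacobi), largest first."""
    n = len(a)
    m = [list(map(float, row)) for row in a]
    for _ in range(max_sweeps):
        off = sum(m[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        diag = sum(m[i][i] ** 2 for i in range(n))
        if off <= 1e-24 * max(diag, 1e-300):
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if m[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that the (p, q) entry becomes zero.
                theta = (m[q][q] - m[p][p]) / (2.0 * m[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.hypot(theta, 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):              # columns p and q: M <- M R
                    mkp, mkq = m[k][p], m[k][q]
                    m[k][p], m[k][q] = c * mkp - s * mkq, s * mkp + c * mkq
                for k in range(n):              # rows p and q: M <- R^T M
                    mpk, mqk = m[p][k], m[q][k]
                    m[p][k], m[q][k] = c * mpk - s * mqk, s * mpk + c * mqk
    return sorted((m[i][i] for i in range(n)), reverse=True)


def singular_values(channels: Sequence[Sequence[float]]) -> List[float]:
    """Singular values of the matrix whose columns are the given channels."""
    gram = [[sum(x * y for x, y in zip(ci, cj)) for cj in channels]
            for ci in channels]
    return [math.sqrt(max(ev, 0.0)) for ev in symmetric_eigenvalues(gram)]


# One rotating structure seen by 12 sensors: a pair of equal singular values.
sensors = [[math.cos(2 * math.pi * t / 16 + 2 * math.pi * k / 12) for t in range(64)]
           for k in range(12)]
sv = singular_values(rms_normalize(sensors))
```

A single rotating structure produces one dominant pair of equal singular values (its sine and cosine components), with the remaining ten close to zero.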
Six pretreated signals are employed to compute three pairs of conventional oddN and evenN markers. These signals come from three pairs of fast magnetic sensors on the equatorial plane on the low field side of the machine; the two sensors of each pair are 180° apart toroidally, and each pair sits at a different absolute toroidal angle. Because a mode with toroidal mode number N produces signals that differ by a factor (-1)^N at two sensors 180° apart, the sum of a pair cancels odd-N modes and the difference cancels even-N modes. Summing and subtracting the pair signals and cascading an rms averaging stage (and possibly low-pass filtering) therefore provides information on the presence and amplitude of modes with even and odd N separately. This algorithm was installed and is depicted at the bottom right of Fig. 6.
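The sum-and-difference rule above can be sketched as follows, computing one (oddN, evenN) pair per probe pair from the rms of the difference and of the sum. The optional low-pass stage is omitted, and the function names and synthetic test signals are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def rms(x: Sequence[float]) -> float:
    return math.sqrt(sum(v * v for v in x) / len(x))


def odd_even_markers(pairs: Sequence[Tuple[Sequence[float], Sequence[float]]]
                     ) -> List[Tuple[float, float]]:
    """For each pair of probes 180 deg apart toroidally, return (oddN, evenN):
    the rms of the difference (odd N survives) and of the sum (even N survives)."""
    markers = []
    for a, b in pairs:
        odd = rms([u - v for u, v in zip(a, b)])
        even = rms([u + v for u, v in zip(a, b)])
        markers.append((odd, even))
    return markers


def probe_signal(n_tor: int, phi: float, samples: int = 64) -> List[float]:
    """Synthetic signal of a rotating mode with toroidal number n_tor at angle phi."""
    return [math.cos(2 * math.pi * t / 16 - n_tor * phi) for t in range(samples)]


# An n = 1 mode and an n = 2 mode seen by one probe pair (0 and 180 degrees).
n1 = odd_even_markers([(probe_signal(1, 0.0), probe_signal(1, math.pi))])[0]
n2 = odd_even_markers([(probe_signal(2, 0.0), probe_signal(2, math.pi))])[0]
```

The n = 1 mode appears only in the odd marker and the n = 2 mode only in the even marker, each with an rms of √2 for unit-amplitude signals.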
IV. TESTS ON THE TCV PLASMA
Node 7 has been tested on real TCV plasmas during the most recent campaign with the MHD analysis code described in Section III-B. The node performed well both in terms of real-time performance and in providing online MHD activity markers. Fig. 7 shows real-time traces measured by the hardware interface code on node 7 during a discharge. The abscissa is the shot time, from t = −0.5 s to t = 2.0 s (t = 0.0 s corresponds to breakdown). The timings are arranged by the MDSplus archiving routines of node 7 so that the system operator can easily check that the algorithms comply with the computational time limits. In particular, timings t2-t4 are plotted "upside-down" from the cycle time (t1) trace, whereas the cores' computational times are plotted normally. In this way, it is easy to see when a core cannot keep up with the RT requirements, since the available computational time can be identified at a glance [see Fig. 7 (right)].
The node was configured with a 1-ms nominal cycle time, and the measured cycle time matches it with only a few µs of jitter (t1 in Fig. 7). With two active processing cores, data distribution takes approximately 80 µs (t2 interval in Fig. 7). The two processing cores execute the two algorithms in parallel and respect the limit on available computational time, as shown by the t5 and t6 intervals in the figure. Finally, the reflective memory read and write synchronization timestamps are recorded (t3 and t4 intervals in the figure), showing that the 10-kHz RFM strobe from the RFM master node (node 2 in this shot) was correctly captured and that data were correctly distributed to and from the real-time network. Fig. 8 shows some MHD analysis results, with the same abscissa as Fig. 7. The first plot shows the node timings, as in Fig. 7, while the second shows raw ADC channels of magnetic signals from TCV. The third plot shows the normalized SVD-based neoclassical tearing mode presence markers described in [14]. Briefly, the H marker is the entropy of the set of normalized singular values and is thus related to the degree of phase coherence across all the magnetic signals: the closer it is to 0, the more phase-coherent the signals in all the sensors, indicating one or more rotating modes. The P1 and P2 markers are the relative squared magnitudes of the first and second pairs of singular values with respect to all the others; they behave opposite to the H marker and indicate the presence of one (P1) or two (P2) simultaneous rotating modes. The fourth plot displays the frequency of the rotating mode, if any, and the fifth the likelihood markers of the (2, 1), (3, 1), and (3, 2) modes. The last plot shows the standard MHD markers oddN and evenN. The interested reader is referred to [14] for further information on this algorithm.
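As an illustration of how such markers can be derived from singular values, the sketch below computes an entropy marker and pair-energy ratios. The exact definitions are given in [14]; here we assume that the normalized singular values are the energy fractions σ_i²/Σσ_j², that H is their Shannon entropy divided by its maximum log(n) so that it lies in [0, 1], and that P1 and P2 are the energy fractions of the first and second pairs of singular values. These definitions and the function name are assumptions of the sketch.

```python
import math
from typing import Sequence, Tuple


def svd_markers(sv: Sequence[float]) -> Tuple[float, float, float]:
    """Return (H, P1, P2) from singular values sorted in decreasing order.

    Assumed definitions: p_i = s_i^2 / sum(s^2),
    H = -sum(p_i log p_i) / log(n), P1 = p_1 + p_2, P2 = p_3 + p_4.
    """
    if len(sv) < 4:
        raise ValueError("at least four singular values are needed")
    energy = [s * s for s in sv]
    total = sum(energy)
    if total == 0.0:
        raise ValueError("all singular values are zero")
    p = [e / total for e in energy]
    h = -sum(pi * math.log(pi) for pi in p if pi > 0.0) / math.log(len(p))
    return h, p[0] + p[1], p[2] + p[3]


# One clean rotating mode: a single dominant pair of equal singular values.
h_mode, p1_mode, p2_mode = svd_markers([1.0, 1.0] + [0.0] * 10)
# Incoherent signals: a flat spectrum of singular values.
h_noise, p1_noise, _ = svd_markers([1.0] * 12)
```

Under these assumptions, a single clean mode gives H = log 2 / log 12 ≈ 0.28 with P1 = 1, while incoherent signals give H = 1 and P1 = 1/6, reproducing the opposite behavior of H and P1 described above.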
These signals show that this shot features a brief (3, 1) rotating mode just after breakdown and a (2, 1) mode in the intervals t = [1.1, 1.4] s and t = [1.8, 2.2] s. Since these signals are distributed on the real-time network, they can be used to trigger countermeasures to control the MHD activity (for instance, by means of electron cyclotron resonance heating, ECRH) or, more simply, to soft-land the discharge and avoid a full high-current disruption. For example, in the presented shot, ECRH beams were switched on, triggered by the (2, 1) likelihood signal, and effectively compensated the rotating mode at t = 1.4 s and at t = 2.2 s.
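The paper does not specify how the (2, 1) likelihood signal is turned into an ECRH switch-on request. One common way to derive an on/off command from a noisy marker is a comparator with hysteresis, sketched below; the class name and threshold values are hypothetical and are not those used on TCV.

```python
from typing import Iterable, List


class HysteresisTrigger:
    """Switch on when the marker reaches `on_level`, and off only when it
    falls to `off_level` or below, so that noise near a single threshold
    cannot toggle the actuator request on every cycle."""

    def __init__(self, on_level: float, off_level: float) -> None:
        if off_level >= on_level:
            raise ValueError("off_level must be below on_level")
        self.on_level, self.off_level = on_level, off_level
        self.active = False

    def update(self, marker: float) -> bool:
        if not self.active and marker >= self.on_level:
            self.active = True
        elif self.active and marker <= self.off_level:
            self.active = False
        return self.active


def run_trigger(trigger: HysteresisTrigger, markers: Iterable[float]) -> List[bool]:
    """Evaluate the trigger once per cycle on a sequence of marker values."""
    return [trigger.update(m) for m in markers]


requests = run_trigger(HysteresisTrigger(0.7, 0.3), [0.1, 0.8, 0.6, 0.5, 0.3, 0.2])
```

The request turns on at 0.8, stays on while the marker drifts down through 0.6 and 0.5, and turns off only at 0.3.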
V. CONCLUSION AND OUTLOOK
We have described the integration of a new node into the TCV real-time control system. The new node is tailored to running advanced analysis algorithms on the fast magnetic signals of TCV in real time and to distributing the results on the real-time network of the control system. During the development of the system, we succeeded in preserving all useful features of the legacy system, notably a strict separation between the environment in which algorithms are developed (MathWorks/Simulink) and the one in which they are executed (custom C/C++ code), and compatibility with the legacy data interface on the RFM network. New features have also been introduced: a real-time packet acquisition and processing approach, multicore data processing, and data transfers in a system with two independent synchronization sources (the ADC side and the RFM side).
We believe that the approach followed in this paper can be reused whenever fast diagnostics must be processed in real time with complex algorithms and the results distributed to other actors in the control system. One example on our system is the soft X-ray XTOMO acquisition node (node 5), which could be refurbished with an approach like the one described here.
