Abstract: Linear accelerators driving Free Electron Lasers (FELs), such as the Free Electron Laser in Hamburg (FLASH) or the X-ray Free Electron Laser (XFEL), require a sophisticated Low Level Radio Frequency (LLRF) control system. The controller of the LLRF system should stabilize the field in the accelerating modules to within 0.02 % in amplitude and 0.01 degree in phase in order to produce an ultra-stable electron beam that meets the conditions required for Self-Amplified Spontaneous Emission (SASE). Since the LLRF system for XFEL must work for the next 20 years, it should be reliable, reproducible and upgradeable. With these requirements in mind, the Advanced Telecommunications Computing Architecture (ATCA) was chosen to build a prototype LLRF system for the FLASH accelerator that is able to supervise the 32 cavities of one accelerating section. The LLRF controller takes advantage of the features offered by the ATCA standard. The LLRF system consists of a few ATCA carrier blades, Rear Transition Modules (RTMs) and several Advanced Mezzanine Cards (AMCs) that provide all necessary digital and analogue hardware components. The distributed hardware of the LLRF system requires a number of communication links with different latencies, bandwidths and protocols. The paper presents a general view of the ATCA-based LLRF system, discusses the requirements and proposes applications of various interfaces and protocols in the distributed LLRF control system.
I. INTRODUCTION
A powerful digital Low Level Radio Frequency (LLRF) system is required to meet the heavy demands placed on the control system that ensures reliable operation of linear accelerators such as FLASH or XFEL [1]. A real-time soft controller with a digital fast feedback and adaptive feedforward is used to stabilize the cavity field in the accelerating modules [2]. A block diagram of the LLRF controller used in the FLASH accelerator is presented in Fig. 1. The LLRF controller submodule measures the electric field in the cavities (probe, forward and reflected power) and generates a complex control signal (In-phase and Quadrature components) that is used to modulate a reference signal from the Master Oscillator (MO). The control signal, amplified in a klystron, is distributed to the cavities through a waveguide system. The accelerating cavities are supplied with a 1.3 GHz signal produced by 5 MW klystrons. Digital data processing is performed in the control system, which produces the driving signal for the klystron.
D. Makowski, A. Piotrowski, G. Jablonski, W. Jalmuzna are with the Technical University of Łódź, Department of Microelectronics and Computer Science, 93-590 Łódź, Poland (e-mail: dmakow@dmcs.p.lodz.pl).
W. Koprek, T. Jezynski, S. Simrock are with the Deutsches Elektronen-Synchrotron (DESY).
The LLRF system currently installed in FLASH controls one cryogenic module comprising eight superconducting TESLA cavities [3]. The LLRF system of FLASH is based on the VME (Versa Module Eurocard) architecture [4], [5]. Since the XFEL accelerator needs almost 1000 cavities, the LLRF system will consist of more than 32 RF stations. Therefore, it is desirable to design an LLRF system able to supervise 32 cavities, i.e. one RF station comprising four cryo-modules spread out over 50 m [6]. The LLRF controller supervising 32 cavities is connected to other accelerator components by a significant number of analogue and digital signals, e.g. 96 analogue cavity signals, 32 analogue and 32 digital signals for fast and slow piezo tuners, 10 digital interlock signals, and so on [7].
In addition to the operational demands, the system should be characterized by high reliability, high availability and a modular design. The ATCA standard offers a significant number of features that help in designing a highly reliable LLRF system [8]-[10]. The ATCA and AMC standards offer a standardized modular design. The standards offer hot-swap functionality, so modules can be removed from the running system without an inconvenient system power-down. Moreover, the ATCA standard supports redundancy for the most critical subsystems, such as power supply, management, diagnostics and communication links. On the other hand, the number of required connections between ATCA and AMC subsystems must be increased to derive benefits from the standards. The standards make use of high-speed serial interfaces available on the ATCA backplane that allow data transfer with extremely high throughput, in the range of up to 80 Gbps.
II. ATCA-BASED LLRF CONTROL SYSTEM
The ATCA standard offers many important features that could help to enhance the availability of the LLRF controller:
• Modular design,
• Supervision and monitoring of power supply,
• Redundancy of important submodules and connections.
Since a significant number of analogue signals is connected to the LLRF controller, it is required to design a distributed controller composed of a few ATCA blades. ATCA carrier boards and AMC modules are used to maintain modularity and upgradeability [9]-[11]. The complex architecture of the controller requires communication over a significant number of digital and analogue links. The block diagram of the distributed controller, with emphasized connections between modules, is presented in Fig. 2. The LLRF system is composed
The PICMG 3.0 specification defines an ATCA backplane that contains a number of general-purpose serial links. The ATCA backplane is divided into three zones. Zone 1 is dedicated to a dual, redundant power supply bus and the redundant I2C interface for management and supervision. Zone 2 provides data transport interfaces: the Base, Fabric and Update Channel interfaces and the synchronization clock interface. The Zone 2 data transmission channels support Dual Star, Dual-Dual Star or Full Mesh topologies. The Base channel interface, comprising four signal pairs, is dedicated to the 10/100/1000BASE-T Ethernet interface. The subsidiary PICMG 3.x standards define communication protocols that can be implemented in the Fabric Interface, e.g. Gigabit Ethernet, InfiniBand, PCI Express (PCIe) and StarFabric [10], [13]. They employ four differential lanes for a single communication channel. These interfaces provide very high data throughput (in the range of Gbps per serial link). The space available in
The PCIe-to-Gigabit Ethernet bridge is a server application, working under the control of the Linux operating system, designed to provide communication between external programs connected to the network and PCIe devices in a way that is transparent for users. Several low-level drivers were designed to exchange data between the bridge application and the hardware components connected to the PCIe bus.
The drivers are responsible for data formatting, interrupt handling and reading from or writing to memory locations. Appropriate procedures were implemented for the various types of devices. The communication between external client applications and the server bridge is performed using a High Level Application PCIe library (libhlapcie). The library is responsible for data encapsulation, mapping of device or register addresses and data format conversion. At the beginning of data transmission, the client application sets the address of the requested PCIe register and, if possible, establishes communication. A correct address contains three elements: the name of the ATCA carrier board, the name of the requested device and the name of the register in the PCIe namespace. The communication library converts the name of the ATCA carrier board to an IP address and the name of the register to an offset relative to the PCIe base address. A transmission frame is then sent to the bridge application. An additional communication protocol was introduced to unify the data frame format and addressing convention. The PCIe-to-Gigabit Ethernet bridge, based on the contents of the received frame, exchanges information with the appropriate device and sends the answer back. The High Level Application library, together with the bridge server application, constitutes the intermediate level of the PCIe communication software subsystem, which can be used with applications such as DOOCS servers, Matlab scripts or standalone C/C++ programs, e.g. to visualize or control the behavior of the hardware devices connected to the PCIe bus.
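The three-element addressing scheme described above can be illustrated with a minimal Python sketch. It is not the libhlapcie implementation: the "/" separator, the board name, the IP address, the device and register names and the offsets are all hypothetical values chosen for illustration, since the text does not specify them.

```python
from dataclasses import dataclass

# Hypothetical lookup tables; the real name-to-IP and name-to-offset
# mappings used by libhlapcie are not given in the text.
BOARD_TO_IP = {"atca2": "192.168.1.12"}
REGISTER_OFFSETS = {
    ("adc_amc1", "status"): 0x0000,
    ("adc_amc1", "gain"): 0x0004,
}


@dataclass(frozen=True)
class ResolvedAddress:
    ip: str      # IP address of the bridge serving the carrier board
    device: str  # name of the requested device
    offset: int  # register offset relative to the PCIe base address


def resolve(address: str) -> ResolvedAddress:
    """Split 'board/device/register' and map the names to an IP and an offset."""
    parts = address.split("/")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected board/device/register, got {address!r}")
    board, device, register = parts
    try:
        ip = BOARD_TO_IP[board]
        offset = REGISTER_OFFSETS[(device, register)]
    except KeyError as exc:
        raise LookupError(f"unknown name in {address!r}") from exc
    return ResolvedAddress(ip=ip, device=device, offset=offset)
```

A client would call `resolve("atca2/adc_amc1/gain")` before building the transmission frame; malformed or unknown addresses are rejected before any network traffic is generated.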
A RocketIO bus with a full-mesh topology is used for data transmission that requires low latency, e.g. transmission of the partial vector sum from the Data Acquisition Boards (ATCA carriers #1, #3 and #4), see Fig. 2. The same standard is used for data transmission from AMC modules to the data processing FPGA. All analogue signals are delivered to AMC modules via the Zone 3 connector. Two additional cross-switches compatible with both the PCIe and RocketIO standards are used to connect these signals dynamically to the required backplane transmission channels. The dynamic configuration allows the ATCA carriers to be installed in any ATCA slot.
The optional Gigabit Ethernet link, shown with a dashed line in Fig. 2, is available on the Base Interface. All ATCA boards are connected to the ATS1936 Ethernet switch [12]. The interface has no assigned functionality; however, it may be used for transmission of diagnostic or control data. Its drawback is that data transmission requires a complex Gigabit Ethernet core in the FPGA, which consumes a significant amount of the resources of a Xilinx Virtex 5 family chip. The Xilinx Virtex 5 devices contain a hardware PCIe endpoint, therefore PCIe is more suitable for diagnostic purposes than Gigabit Ethernet.
The ATCA standard requires a redundant Shelf Manager (ShM), Intelligent Platform Management Controllers (IPMCs) installed on ATCA carriers and Module Management Controllers (MMCs) on AMC modules for management, supervision and monitoring of the electronic components installed on ATCA boards, AMC and RTM modules [15], [16]. The ShM, IPMCs and MMCs are connected using the Intelligent Platform Management Interface (IPMI) via a redundant I2C bus. The ATCA standard offers redundant connections for the most critical subsystems; the IPMI bus is a good example [15]. The operator can monitor and control all subsystems via redundant 10/100 Mb Ethernet connections supported by the so-called IPMI over LAN protocol.
Zone 3 is user-defined and is usually used to connect the ATCA board to a Rear Transition Module (RTM). The Zone 3 area can also hold a special backplane to interconnect boards with signals that are not defined in the ATCA specification.
PCI Express, available on the Fabric Interface, is used for data transmission between ATCA carrier boards and AMC modules. The PCIe interface is used for transmission of control data. The interface uses a star topology with a single Root Complex (RC), and all PCIe devices are available in the same memory space. Therefore, an eight-port PCI Express switch PEX8532 is present on every ATCA carrier board to connect the three AMC modules and the FPGA (Field Programmable Gate Array) with the Fabric connector (two PCIe x1 channels). An additional PCIe x4 connector is reserved for the Root Complex required by the PCIe standard (the next version of the carrier board will be equipped with an RC built with the Freescale PowerPC processor MPC8548). The RC is connected to ATCA carrier #2 using a dedicated PCIe cable [14]. However, the PCIe interface is not available outside the ATCA crate. Therefore, a software PCIe-to-Gigabit Ethernet bridge running on the Root Complex board was designed to give external applications access to the PCIe bus. The hierarchical structure of layers required for communication with applications running on an external computer is presented in Fig. 3.
Fig. 4. The relationship of the real-time algorithms with a single RF pulse.
The ATCA standard provides a redundant power supply bus to enhance the reliability of the whole system [10]. A suitable warning may be generated when the failure of one power supply unit is detected. Every ATCA carrier board is equipped with a diagnostic interface compatible with the EIA RS-232 standard.
The ATCA-based controller is connected to other devices installed outside the ATCA shelf by optical cables. The piezo compensation unit, which contains the piezo drivers and a patch panel for the piezo elements installed in the cavity modules, is connected to an AMC module using optical fiber. The piezo AMC module communicates with the main processing unit using the PCIe interface, see Fig. 2.
III. DATA TRANSMISSION CHANNELS
The FLASH accelerator works in a pulsed mode at a repetition rate between 1 and 10 Hz, where one RF pulse lasts approximately 2 ms [17]. Therefore, the designed LLRF control system requires three types of communication links:
• Real-time, intra-pulse links;
• Real-time, inter-pulse links;
• Non-real-time links.
This classification comes from the different types of control algorithms implemented in the LLRF system. The relationship of the real-time algorithms with a single RF pulse is depicted in Fig. 4.
The first category comprises real-time, intra-pulse links, which are required for data transmission in the fast feedback loop or the interlock system. The latency of the intra-pulse links should be below 150 ns. The second category comprises real-time, inter-pulse links, where data should be transmitted between two subsequent RF pulses. The third category comprises non-real-time links, which provide data transport between slow parts of the LLRF system and other accelerator modules within a time span of a few RF pulses. The analysis of the available documentation shows that the ATCA standard is able to fulfill all these requirements.
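The timing budget behind the three link categories can be made concrete with a short Python sketch. The 2 ms pulse length, the 1 to 10 Hz repetition rate and the 150 ns limit come from the text; reducing the classification to a single latency comparison is a simplifying assumption made only for illustration.

```python
PULSE_LENGTH_S = 2e-3            # approximate RF pulse length (from the text)
INTRA_PULSE_LATENCY_S = 150e-9   # latency limit for intra-pulse links (from the text)


def inter_pulse_gap(rep_rate_hz: float) -> float:
    """Time between the end of one RF pulse and the start of the next."""
    return 1.0 / rep_rate_hz - PULSE_LENGTH_S


def classify_link(latency_s: float, rep_rate_hz: float = 10.0) -> str:
    """Assign a link to one of the three categories by its latency alone.

    Simplified rule: a link fast enough for the feedback loop is intra-pulse,
    a link that delivers within the gap between pulses is inter-pulse,
    anything slower is treated as non-real-time.
    """
    if latency_s < INTRA_PULSE_LATENCY_S:
        return "intra-pulse"
    if latency_s < inter_pulse_gap(rep_rate_hz):
        return "inter-pulse"
    return "non-real-time"
```

At the highest repetition rate of 10 Hz the gap between pulses is about 98 ms, so a link with microsecond latency falls comfortably into the inter-pulse category, while only links in the 100 ns range qualify for the fast feedback loop.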
A large number of control algorithms running in the ATCA-based LLRF control system require significant computation power and several different communication links for one RF station. Due to the large number of analogue signals and the different types of control algorithms, it is not possible to implement the LLRF controller on a single ATCA carrier blade. In the presented system, all control algorithms are distributed among four carrier blades (ATCA #1-#4) and an optional CPU blade located in the same ATCA shelf. The distributed algorithms require a number of communication links with different latencies and throughputs according to the defined classification.
All ATCA boards collect data from analogue channels, calculate a partial vector sum and send the data to the main controller (ATCA #2). The controller, implemented in an FPGA device, receives data using the Low Latency Link (LLL) and generates output signals that are connected to the Vector Modulator. The analogue signals connected to AMC modules with Analogue-to-Digital Converters (ADCs) are digitized and sent to the FPGA using differential signalling [18]. The partial vector sum calculated in the FPGA is transferred to the controller carrier blade ATCA #2. Therefore, a custom Low Latency Link protocol is required for transmission of critical data, especially in the fast feedback loop, see Fig. 1. For the intra-pulse operation a Full Mesh topology is used. The intra-pulse communication channels do not require high throughput. Latency is more critical for these interfaces and should be lower than 150 ns. In order to meet this highly demanding latency requirement, a Xilinx Virtex 5 FPGA chip is used on the ATCA carrier blade. It contains RocketIO macro cells, which support serial communication with latency in the range of 100 ns.
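The reason a low-throughput link is sufficient here is that each board forwards one complex number per sample rather than all of its cavity signals. The following Python sketch illustrates the idea with complex I/Q values; the split of 32 cavities into four groups of eight and the sample values are assumptions for illustration, and no per-cavity calibration is modelled.

```python
from typing import Iterable, List


def partial_vector_sum(cavity_fields: Iterable[complex]) -> complex:
    """Sum the complex (I + jQ) field samples measured by one board."""
    return sum(cavity_fields, 0j)


def total_vector_sum(partials: Iterable[complex]) -> complex:
    """Combine the partial sums received by the main controller."""
    return sum(partials, 0j)


def split_into_boards(fields: List[complex], per_board: int) -> List[List[complex]]:
    """Group cavity samples by the board that digitizes them (hypothetical layout)."""
    return [fields[i:i + per_board] for i in range(0, len(fields), per_board)]


# Illustrative field samples for 32 cavities (arbitrary values).
fields = [complex(i, -i) for i in range(32)]
partials = [partial_vector_sum(group) for group in split_into_boards(fields, 8)]
total = total_vector_sum(partials)
```

Because addition is associative, the total obtained from the four partial sums equals the direct sum over all 32 cavities, so only four complex values need to cross the low-latency links per sample.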
During the RF pulse, a large amount of digital data from several ADCs should be stored in memories for further processing. In the presented system, the storage memory is available on the AMC modules equipped with ADCs. After the pulse, the stored data are transported from all AMC modules to the various carrier and CPU blades. These data are used by different control algorithms implemented in the FPGA chips and the CPU blade. Since the final allocation of control algorithms has not been decided yet, the computation load of particular carrier blades is also not known. It was assumed that each carrier blade may require data from any other blade in the ATCA shelf. Since the latency of the transferred data is not so critical for the inter-pulse links and may be higher than a few µs, the PCIe bus can be used. The throughput offered by a single-lane PCIe interface is sufficient for the inter-pulse transmission. PCI Express was chosen as the main protocol for transmission of digital data between ATCA carrier blades and AMC modules within one ATCA shelf. The PICMG 3.0 standard allows peer-to-peer connections. Therefore, a PCIe switch PEX 8532 is used on each carrier board to realize a daisy-chain topology on the Fabric Interface with latency in the range of a few µs. The PCIe protocol is used for real-time, inter-pulse data transmission such as control signals, data acquisition, diagnostic signals and piezo detuning calculation. The PCIe standard requires a Root Complex (RC) for bus management and configuration. The designed carrier board can communicate with other ATCA blades installed in the same ATCA shelf using the available PCIe switches.
The last type, non-real-time communication with latencies in the range of milliseconds, is supported by the Ethernet links in the ATCA shelf and the PCIe-to-Gigabit Ethernet bridge.
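The statement that a single PCIe lane is sufficient for inter-pulse transfers can be checked with a rough estimate. The sketch below assumes a first-generation PCIe lane (2.5 GT/s with 8b/10b coding, i.e. 250 MB/s before protocol overhead); the number of stored samples per channel and the 16-bit sample width are hypothetical, as the text does not state them.

```python
GEN1_LANE_RATE_BPS = 2.5e9    # raw line rate of one first-generation PCIe lane
ENCODING_EFFICIENCY = 8 / 10  # 8b/10b line coding


def lane_payload_rate(protocol_efficiency: float = 1.0) -> float:
    """Usable bytes per second on one lane, optionally derated for packet overhead."""
    return GEN1_LANE_RATE_BPS * ENCODING_EFFICIENCY / 8 * protocol_efficiency


def transfer_time(channels: int, samples_per_channel: int,
                  bytes_per_sample: int = 2,
                  protocol_efficiency: float = 1.0) -> float:
    """Seconds needed to move one pulse worth of stored ADC data over one lane."""
    total_bytes = channels * samples_per_channel * bytes_per_sample
    return total_bytes / lane_payload_rate(protocol_efficiency)


# 96 cavity signals (from the text), 2048 samples each (assumed), 16-bit samples (assumed).
t = transfer_time(channels=96, samples_per_channel=2048)
```

Under these assumptions the transfer takes on the order of 1.6 ms, a small fraction of the roughly 98 ms available between pulses at a 10 Hz repetition rate, even if the `protocol_efficiency` parameter is lowered considerably.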
As already mentioned, the ATCA-based LLRF system contains an ATCA Ethernet switch located in slot 1 of the ATCA shelf (a redundant switch may be installed in slot 2). The PICMG 3.0 specification defines obligatory Ethernet links on the Base Interface located in Zone 2. The Base Interface allows implementation of 1000BASE-T Ethernet links concentrated in the Ethernet switch. Physically, the Ethernet links form a star topology with a central Ethernet switch. However, it is possible to achieve parallel communication between several slots in the ATCA shelf if the Ethernet switch implements so-called non-blocking switching. The Ethernet communication is dedicated to slow algorithms, and currently the star topology on the Base Interface seems to be sufficient for the planned control algorithms.
However, the current version of the ATCA carrier blade is not equipped with the RC required for the PCIe interface; therefore an additional processor is needed in the ATCA system to provide the RC functionality. There are several options for RC implementation in the ATCA system. The RC may be installed on one of the commercially available AMC modules equipped with a processor with a PCIe interface. Such an AMC module should be installed in an AMC bay on one of the carrier blades, since each AMC bay is connected to the PCIe switch located on the blade. The disadvantage of this solution is that the AMC module occupies one of the limited number of AMC bays, and a failure of that particular carrier blade will disturb PCIe communication in the entire ATCA shelf. The second option is to use a commercial ATCA CPU blade which has PCIe links wired from the on-board CPU to the Fabric Interface. In this case the configuration of the PCIe communication in the ATCA shelf does not depend on the presence of the carrier blade. Unfortunately, there is a very limited number of commercial ATCA CPU blades which use PCIe links on the Fabric Interface; one such CPU blade is the ADLink CPU-6900 [19]. The third option is to use a PCIe cable for connection with an external processor working as an RC. This option seems inconvenient due to the involvement of an external computer and its reliability; however, it can be acceptable during the development of the system.
    ii_clk      : in  std_logic;
    ii_resetN   : in  std_logic;
    ii_addr     : in  std_logic_vector(31 downto 0);
    ii_writeN   : in  std_logic;
    ii_data_in  : in  std_logic_vector(31 downto 0);
    ii_data_out : out std_logic_vector(31 downto 0);
    ii_strobeN  : in  std_logic;
    ii_ackN     : out std_logic;
    ii_irqN     : out std_logic;
    ii_irq_ackN : in  std_logic;
A. FPGA Control Interfaces
Low-latency parts of the control algorithms are implemented in the FPGA fabric. An important feature of the design is simple access to the various registers located there, regardless of whether the external interface is VME, USB, PCI or PCIe. This problem is solved using the Integral Interface (II) [20]. The Integral Interface is a set of VHDL functions and code generator programs that provide easy access to a set of registers and memory blocks in FPGA code. The interface to the user code is a simple bus interface, presented in Fig. 5 using VHDL syntax. Internally, the registers and memory blocks are visible as elements of two structures, s for scalar elements and v for memory blocks (vectors), so accessing them is very straightforward.
As the user logic interface is not directly compatible with the PCIe bus, a special PCIe-to-Integral Interface bridge has been developed. The system uses Xilinx Virtex 5 FPGAs containing built-in hardware PCIe endpoint blocks, which handle the PCIe configuration space requests and allow communication with the PCIe bus via transaction layer packets. The developed bridge handles the conversion between the transaction layer and the user logic interface. The endpoint block transaction layer can work at one of a set of fixed frequencies: 62.5 MHz, 125 MHz or 250 MHz. If the user logic requires operation at a different frequency, e.g. one enforced by the ADC sampling rate, then an additional synchronizer block is required (see Fig. 7).
IV. EXPERIMENTAL RESULTS
Several experiments have been carried out to evaluate the performance of the proposed architecture. The application tested in practice using the ATCA architecture was the compensation of the Lorentz force detuning in 24 cavities simultaneously, using the piezo elements mounted in the ACC3, ACC5 and ACC6 modules of the FLASH accelerator. The pulse parameters were determined by the operator; in the next step, this process will be automated. The latency of direct data transmission via the PCIe bus (e.g. a write operation) turned out to be about 1 µs. For read operations the latency is twice as high. These results imply that direct memory access must be used to read large amounts of data from the algorithms implemented in the FPGA. It will be implemented in a future version of the Integral Interface. The latency of an interrupt raised by the Virtex 5 FPGA over the PCIe interface and handled by the PowerQUICC processor under the Linux operating system turned out to be of the order of 1 µs. This makes it possible to implement on the PowerQUICC processor those parts of the algorithms that have to operate between pulses, e.g. the computation of parameters for the Lorentz force detuning compensation.
According to the Virtex 5 datasheet, the overall latency of the transmitter-receiver path for RocketIO blocks varies from 12.5 to 23 clock cycles (at 106.25 MHz), depending on the chosen configuration options. The output update period of the XFEL field controller is 200 ns. Assuming pipelined operation of the controller's computation core, the presented link fulfills the timing requirements of the system.
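The cycle counts quoted above can be converted to time for comparison with the 150 ns intra-pulse limit and the 200 ns update figure. The following Python sketch performs only this unit conversion, using the clock frequency and cycle range given in the text.

```python
ROCKETIO_CLOCK_HZ = 106.25e6   # clock frequency quoted in the text
LATENCY_CYCLES = (12.5, 23.0)  # transmitter-receiver latency range in clock cycles


def cycles_to_ns(cycles: float, clock_hz: float = ROCKETIO_CLOCK_HZ) -> float:
    """Convert a latency expressed in clock cycles to nanoseconds."""
    return cycles / clock_hz * 1e9


min_latency_ns = cycles_to_ns(LATENCY_CYCLES[0])  # about 117.6 ns
max_latency_ns = cycles_to_ns(LATENCY_CYCLES[1])  # about 216.5 ns
```

The shortest configuration is below the 150 ns limit, while the longest exceeds both 150 ns and 200 ns, which suggests that both the choice of RocketIO configuration options and the pipelining assumption matter for meeting the timing requirements.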
V. CONCLUSION
Currently, VME is the most popular system architecture used in high energy physics installations. This architecture, which originated in the late 1970s, has several disadvantages which make it less and less suitable for current, highly demanding control systems. The ATCA architecture eliminates most of the VME weaknesses. A redundant power supply, high-bandwidth serial links and the IPMI management subsystem, enhanced with hot-swap functionality, make it an excellent candidate for the future infrastructure of experimental physics. The complexity of ATCA-based systems is compensated by high reliability and availability.
This paper describes the ATCA-based LLRF system from the point of view of communication links. The analysis of the PICMG 3.0 specification shows that the variety of obligatory and optional communication links fulfills the demanding requirements of the control algorithms of the LLRF system. The obligatory links in the ATCA shelf provide support for a redundant management and configuration subsystem.
The presented LLRF controller of FLASH does not fully utilize the capabilities of this powerful architecture. The throughput offered by the ATCA serial links is much higher than required for the distributed LLRF controller composed of a few ATCA carrier blades. An important disadvantage of the standard is the inherently high latency of the serial links compared to VME's parallel interfaces. As confirmed by our experimental results, the latency of the protocols specified by the ATCA standard is too high for implementation of distributed control algorithms. Both the PCIe and Gigabit Ethernet serial interfaces used by the designed ATCA blades have a latency higher than 150 ns and are therefore not suitable for transmitting data for the main LLRF controller. Moreover, the use of hierarchical PCIe or Ethernet switches significantly increases the latency. However, the abundance of point-to-point links available on the backplane makes it possible to implement custom latency-optimized protocols. The use of the RocketIO standard made it possible to design a distributed controller based on SimCon 3.1L. The RocketIO link latencies are low enough for transferring the partial vector sum calculated by the ATCA carrier blades. The main LLRF controller can be implemented in one of the ATCA carriers equipped with a powerful Xilinx FPGA.
The IPMI standard offers the diagnostic capability that is usually required in complex control systems. It is worth mentioning that most failures in VME-based control systems are caused by malfunctions in the power supply subsystem. The redundant power supply maintains the operation of the whole system while the Shelf Manager alerts the operator to the need to replace a broken power supply.
