Abstract: Throughout the past decade, the advanced telecommunications computing architecture (ATCA) has solidified its position as one of the main switch-based crate standards for advanced physics instrumentation. It offers not only high performance in data throughput, channel density, and power supply/dissipation capabilities but also special features for high availability (HA), required by the latest and upcoming large-scale endeavors, such as ITER. Hot swap is one of the main HA features in ATCA, allowing boards to be replaced in a crate (shelf) without powering off the whole system. Platforms using the peripheral component interconnect express (PCIe) protocol on the fabric interface must be complemented, at the software level, with the native PCIe hot-plug feature, which is currently not specified for the ATCA form factor. Building on a customized hot-plug support implementation for ATCA node boards, this article presents an implementation extension for hub boards that allows hot-plug of PCIe switching devices without causing bus enumeration problems. This article further addresses the main issues concerning an eventual standardization of PCIe hot-plug support in ATCA, such as the implementability of hot-plug elements and the generation and management of hot-plug events, aiming to stimulate discussion within the PCI Industrial Computer Manufacturers Group (PICMG) community toward a long-overdue standardized solution for hot-plug in ATCA.
Index Terms: Advanced telecommunications computing architecture (ATCA), high availability (HA), hot-plug, hot swap, peripheral component interconnect express (PCIe).
I. INTRODUCTION
ADVANCED large-scale physics experiments, such as experimental fusion devices, require hard real-time control and data acquisition systems, particularly for critical diagnostics. This is the case of the vertical stabilization (VS) controller for the Joint European Torus (JET) tokamak, developed by the Instituto de Plasmas e Fusão Nuclear (IPFN) [1]. For ITER [2], the largest fusion experiment to date, currently under construction in Cadarache, France, vertical control of the plasma is critical, as plasma disruptions caused by loss of plasma control may severely damage the tokamak infrastructure, posing both a safety and an investment risk. Therefore, high availability (HA) is a key requirement for VS control, as well as for other critical diagnostics [3]. IPFN has undertaken a research program on HA control and data acquisition systems [4], which simultaneously contributes to the standardization effort of the ITER plant control design handbook (PCDH) for fast plant system controllers (FPSCs) [5], whose aim is to choose adequate technologies and define specifications that fulfill the specific needs of fast plasma control and data acquisition at ITER. IPFN's main contribution consists of the development of an FPSC prototype, an instrumentation platform for fast plasma control applications. This control and data acquisition platform is based on the advanced telecommunications computing architecture (ATCA), released by the peripheral component interconnect (PCI) Industrial Computer Manufacturers Group (PICMG) [6]. ATCA is an industry switched-fabric modular standard, in which hardware modules [front boards or rear transmission modules (RTMs)] are inserted in a crate (known as a shelf) and communicate through a passive circuit board (backplane). ATCA enables systems to be designed with HA, owing to its robust hardware management capabilities and several types of redundancy resources, which may be used to implement fault-tolerance mechanisms.
Hot swap is one of its most important features, allowing boards and other intelligent field replaceable units (FRUs) to be replaced in a shelf without powering off the whole system, which improves system maintainability and, thus, availability.
Because ATCA was originally conceived for the telecom market, most of its commercially available products use Ethernet as the communications protocol. Subsequent interest in adopting ATCA for instrumentation purposes led to the development of a specification extension for the peripheral component interconnect express (PCIe) protocol [7], a consensual choice within the instrumentation community, in particular for physics experiments.
PCIe is a high-speed serial computer expansion bus standard [8] that also provides desirable features for HA systems, such as power management, quality of service, data integrity, error handling, and a hot-plug feature for inserting and removing adapter cards while the system keeps running. For form factors other than PCIe itself, hot-plug is implementation-dependent and should be defined by the form factor. However, ATCA has not yet specified such procedures. In order to obtain complete, seamless board insertion/extraction, this mechanism must be designed for ATCA, which constitutes a major challenge for the developed system. This article briefly introduces the IPFN FPSC architecture, starting from an initial customized hot-plug support implementation developed for ATCA node boards and extending it to the use case of hub boards, allowing the hot-plug of PCIe switching devices, previously known to cause bus enumeration problems [9]. The article further addresses the main issues concerning an eventual standardization of PCIe hot-plug support in ATCA, suggesting an architectural strategy toward its realization, with the intent of stimulating discussion within the PICMG community.
0018-9499 © 2019 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
II. FAST PLANT SYSTEM CONTROLLER ARCHITECTURE
The architecture of IPFN's FPSC consists of one ATCA shelf with a capacity of 14 boards (front boards and respective RTMs), populated with the following types of hardware modules.
1) ATCA-IO-Processor [10]: IPFN-developed digitizing (node) boards with data processing capabilities (48 analog IO channels, 18-bit ADC at 2 MSPS and/or 16-bit DAC at 1 MSPS).
2) ATCA-PTSW-AMC4 [11]: IPFN-developed PCIe switch (hub) boards, for fast switching of the data generated by the ATCA-IO-Processor boards, with an external PCIe cable interface in the RTM.
3) PCIe external host computer with external cabling interface (commercially available) [12].
The resulting PCIe topology is a star (one switch and several endpoints). A dual star can be implemented by using two hub boards with replicated node boards, thus enabling 2N redundancy [13]. Fig. 1 shows the resulting architecture for one hub (board) and N nodes (boards). In the bottom half of Fig. 1, the ATCA hardware platform management (HPM) system is managed by the shelf manager (ShM); its shelf manager controller (ShMC) is responsible for monitoring board health status by communicating with the respective intelligent platform management controller (IPMC) via the intelligent platform management interface (IPMI) protocol [14], through the intelligent platform management bus (IPMB-0). The HPM system also manages the hot-swap procedures, which allow boards to be selectively inserted/extracted without powering off the shelf. This is accomplished through specified sequences of FRU states, numbered M0 through M7, which are initiated by a human operator using the board handles to trigger board activation/deactivation requests from the board IPMC to the ShMC [15]. Once these requests are accepted, the operator is notified by a signaling LED that the board is activated, or that it has been deactivated and may be safely extracted.
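As a reading aid, the nominal hot-swap sequences can be written as a small state machine. The Python sketch below uses the PICMG 3.0 names for states M0 to M7 but encodes only the nominal insertion/activation and deactivation/extraction paths; the event names (`handle_closed`, `activation_granted`, and so on) are hypothetical labels rather than IPMI message names, and abnormal transitions (e.g., into M7, or cancelled requests) are omitted.

```python
from enum import IntEnum
from typing import Iterable, List, Tuple


class FruState(IntEnum):
    """ATCA FRU hot-swap states, using the PICMG 3.0 names."""
    M0 = 0  # Not Installed
    M1 = 1  # Inactive (board present, payload off)
    M2 = 2  # Activation Request
    M3 = 3  # Activation In Progress
    M4 = 4  # FRU Active
    M5 = 5  # Deactivation Request
    M6 = 6  # Deactivation In Progress
    M7 = 7  # Communication Lost (not used below)


# Nominal transitions only; event names are hypothetical labels.
NOMINAL = {
    (FruState.M0, "inserted"): FruState.M1,
    (FruState.M1, "handle_closed"): FruState.M2,
    (FruState.M2, "activation_granted"): FruState.M3,
    (FruState.M3, "payload_ready"): FruState.M4,
    (FruState.M4, "handle_opened"): FruState.M5,
    (FruState.M5, "deactivation_granted"): FruState.M6,
    (FruState.M6, "payload_off"): FruState.M1,
    (FruState.M1, "extracted"): FruState.M0,
}


def step(state: FruState, event: str) -> FruState:
    """Return the next state, or raise ValueError for a non-nominal event."""
    try:
        return NOMINAL[(state, event)]
    except KeyError:
        raise ValueError(f"no nominal transition from {state.name} on {event!r}") from None


def run(state: FruState, events: Iterable[str]) -> List[Tuple[FruState, FruState]]:
    """Apply events in order and return the (previous, current) state pairs,
    i.e., the information reported at each FRU state change."""
    pairs = []
    for event in events:
        nxt = step(state, event)
        pairs.append((state, nxt))
        state = nxt
    return pairs
```

For example, `run(FruState.M0, ["inserted", "handle_closed", "activation_granted", "payload_ready"])` returns four (previous, current) pairs ending in `(FruState.M3, FruState.M4)`, the same kind of information carried by each hot-swap event message.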
The top half of Fig. 1 […], which handles PCIe hot-plug signaling from existing hot-plug elements and generates hot-plug events, sent as software interrupts to the host, thus allowing the endpoint device driver to be gracefully inserted and removed, and software applications using the endpoint to be safely opened and closed.
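On a Linux host, slots registered by a PCIe hot-plug driver are exposed under /sys/bus/pci/slots, each with an `adapter` attribute that reports whether an adapter is present. The following sketch is illustrative and not part of the described implementation; it shows how a host application could check slot state before opening (or after closing) an endpoint. Slot names are platform-dependent.

```python
from pathlib import Path
from typing import Dict


def slot_adapter_status(slots_root: str = "/sys/bus/pci/slots") -> Dict[str, bool]:
    """Map each PCI slot name to True if an adapter is reported present.

    Only slots exposing an 'adapter' attribute (i.e., slots registered by a
    hot-plug driver) are included; other slot directories are skipped.
    """
    status: Dict[str, bool] = {}
    root = Path(slots_root)
    if not root.is_dir():
        return status
    for slot in sorted(root.iterdir()):
        adapter = slot / "adapter"
        if adapter.is_file():
            # The attribute reads "1" (present) or "0" (empty), plus a newline.
            status[slot.name] = adapter.read_text().strip() == "1"
    return status
```

An application could, for instance, refuse to open its endpoint device unless the corresponding slot reports `True`, and close it as soon as the value turns `False`.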
III. CUSTOMIZED HOT-PLUG SUPPORT
As stated previously, one of the major challenges was implementing hot-plug support for the FPSC platform, as it is not specified by ATCA. For hot-plug events to be generated, the PCIe Base Specification defines a set of hot-plug (hardware) elements (e.g., attention button, manually operated retention latch, power controller, and attention and power indicators) to be connected to each HPC port interface; these elements do not exist in the ATCA specification and, therefore, are not provided in the developed FPSC boards.
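Whether a downstream port advertises these elements is visible in its PCIe Slot Capabilities register. The Python sketch below decodes the element-presence bits defined by the PCIe Base Specification; the example values are illustrative and are not read from the PEX 8696 or the FPSC boards.

```python
from typing import List

# Presence/capability bits of the PCIe Slot Capabilities register
# (PCIe Base Specification); other fields, such as the slot power limit
# and physical slot number, are omitted.
SLOT_CAP_BITS = {
    0: "Attention Button Present",
    1: "Power Controller Present",
    2: "MRL Sensor Present",  # manually operated retention latch
    3: "Attention Indicator Present",
    4: "Power Indicator Present",
    5: "Hot-Plug Surprise",
    6: "Hot-Plug Capable",
    17: "Electromechanical Interlock Present",
}


def hot_plug_elements(slot_cap: int) -> List[str]:
    """Names of the Slot Capabilities bits set in `slot_cap`, in bit order."""
    return [name for bit, name in sorted(SLOT_CAP_BITS.items()) if (slot_cap >> bit) & 1]
```

For example, `hot_plug_elements(0x40)` returns `["Hot-Plug Capable"]`, describing a port that declares hot-plug capability but no physical elements. On Linux, `lspci -vv` prints the same information in decoded form on the port's `SltCap:` line.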
A. Hot-Plug of Node Boards
A customized solution has been implemented for hot-plug of node boards [17]. During the standard ATCA hot-swap procedure of a node board, the IPMC negotiates the FRU state sequence (for insertion or extraction) with the ShMC. At each FRU state change, an IPMI Platform Event Message containing the current (and previous) FRU state is sent to the ShMC. For the hot-plug implementation, a copy of this message is additionally sent, through the IPMB-0 bus, to the IPMC of the hub board, where the PEX 8696 (containing the HPCs for all node boards) is located. Although hot-plug elements are not physically implemented, the PEX 8696 provides a test mode, which allows HPC registers to be written by software. The hub IPMC is programmed to write the HPC registers of the PEX 8696 (via the local I2C bus), following the PEX 8696 hot-plug definitions, in coordination with the ATCA FRU state sequence information received in each Platform Event Message. For example, according to the PEX 8696 datasheet hot-plug definitions, an insertion begins by pressing the Attention Button. On receiving a message from the node IPMC with the transition from FRU state M1 (board present) to M2 (activation request), the hub IPMC is programmed to write the "attention button pressed" bit in the corresponding HPC register, generating the corresponding hot-plug event. Thus, although a physical Attention Button is not present, the HPC is able to issue the corresponding hot-plug interrupt, which is sent in-band, upstream through the external cable, to the PCIe host. Fig. 2 highlights (in yellow) the remote location of the node 1 HPC with respect to its actual endpoint device, showing the additional messaging from the node 1 IPMC to the hub IPMC, and the hub IPMC writing via the local I2C bus to generate the required hot-plug events at the PEX 8696 switch.
Fig. 2. Customized implementation of hot-plug for ATCA node boards using IPMC messaging (along with the standard hot-swap procedure) to trigger hot-plug events and send respective interrupts upstream.
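The message-driven emulation can be sketched as a handler on the hub IPMC. In the Python sketch below, only the M1 to M2 mapping comes from the text; treating M4 to M5 (deactivation request) the same way is an assumption, and `write_hpc_status` is a hypothetical callback standing in for the PEX 8696 test-mode register write over I2C, whose register addresses and slot-to-port mapping are not given here.

```python
from typing import Callable, Optional

# Attention Button Pressed is bit 0 of the PCIe Slot Status register.
ATTENTION_BUTTON_PRESSED = 1 << 0

# (previous, current) FRU transitions that emulate an Attention Button press.
# M1 -> M2 is taken from the text; M4 -> M5 (deactivation request) is an assumption.
ATTENTION_TRANSITIONS = {(1, 2), (4, 5)}


def on_node_fru_event(port: int, previous: int, current: int,
                      write_hpc_status: Callable[[int, int], None]) -> Optional[str]:
    """Hub IPMC handler for a forwarded node-board FRU state change (sketch).

    `port` identifies the PEX 8696 downstream port (HPC) serving the node slot;
    `write_hpc_status(port, bits)` is a hypothetical stand-in for the test-mode
    register write issued over the local I2C bus.
    Returns the emulated element name, or None if no hot-plug action is needed.
    """
    if (previous, current) in ATTENTION_TRANSITIONS:
        write_hpc_status(port, ATTENTION_BUTTON_PRESSED)
        return "attention_button"
    return None
```

Passing the register write as a callback keeps the FRU-state logic independent of the I2C driver, so it can be exercised without hardware, e.g., by recording the writes in a list.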
B. Hot-Plug of Hub Boards
Hot-plug of PCIe switch devices is known to cause PCIe bus enumeration issues, which prevent new devices from being added; this issue is not exclusive to ATCA implementations, as it is acknowledged by PCI-SIG [9]. In fact, the distributed HPC architecture shows that the HPC for the PEX 8696 switch is located upstream, outside the ATCA shelf, at the downstream port of the host external cable adapter. This means that hot-plug of the hub board must follow the hot-plug definitions of that PCIe hardware device, as per the Standard HPC specification [18]. Still, it is the ATCA hub board that must trigger the necessary hot-plug event, namely by toggling the cable present (CPRSNT#) sideband signal. By analogy with the procedure for node boards, this was achieved by additionally controlling the state of CPRSNT# through a general-purpose input-output (GPIO) port, again in coordination with the standard ATCA hot-swap sequences (activation or deactivation), as shown in Fig. 3.
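The same pattern applies to the hub board, with a GPIO in place of a register write. In the sketch below, the choice of transitions (asserting CPRSNT# on entering M4 and deasserting it on entering M6) is a hypothetical policy, since the text does not state which FRU transitions drive the signal; the "#" suffix marks CPRSNT# as active low, so asserting it means driving the GPIO low. `set_gpio` is a hypothetical callback.

```python
from typing import Callable

CPRSNT_ASSERTED = 0    # CPRSNT# is active low: driving it low signals "cable present"
CPRSNT_DEASSERTED = 1


def hub_cprsnt_update(previous: int, current: int,
                      set_gpio: Callable[[int], None]) -> None:
    """Drive CPRSNT# from hub-board FRU transitions (hypothetical policy).

    M3 -> M4: hub payload is active, so announce the cable (switch) to the host HPC.
    M5 -> M6: deactivation granted, so withdraw it before payload power is removed.
    Other transitions leave the signal unchanged.
    """
    if (previous, current) == (3, 4):
        set_gpio(CPRSNT_ASSERTED)
    elif (previous, current) == (5, 6):
        set_gpio(CPRSNT_DEASSERTED)
```

Asserting presence only after activation means the host HPC is not notified until the switch is powered, while deasserting it on entering M6 gives the host an opportunity to remove the devices below the switch before power is removed at the M6 to M1 transition.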
C. Software and Operating System
Initial developments of hot-plug support for the IPFN FPSC were performed using the Red Hat Enterprise Linux operating system (OS), as per the ITER PCDH [19].
Fig. 3. Customized implementation of hot-plug for ATCA hub boards, using the hub IPMC to control the external cable CPRSNT# sideband signal to generate the specified hot-plug events at the respective HPC, located at the PCIe external cable adapter board in the host computer.
With this setup, hot-plug of the PEX 8696 device (i.e., the hub board) led to occasional OS instabilities due to the aforementioned bus enumeration issues. It was therefore decided to use another Linux-based release, Fedora 27 [20], which supports newer kernel versions that include additional boot configuration parameters concerning hot-plug operations [21]. The hpbussize parameter, available in kernel 4.15, sets the minimum number of additional bus numbers reserved for buses below a hot-plug bridge/switch. Setting an adequate value of hpbussize requires projecting the number of bus numbers that the hot-plug capable devices in use will need. If the value is set too low, devices added after system boot may find no bus numbers available to be hot-added. If the value is set too high, it is bound to cause conflicts with devices later in the enumeration sequence (e.g., a PCIe network adapter) and cause OS instabilities.
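To make the dimensioning step concrete, the Python sketch below counts the bus numbers a switch subtree occupies once it is hot-added below a hot-plug port: one for the link to the switch upstream port, one for the switch internal bus, and one per downstream port link (or a whole subtree if another switch is attached there). It assumes, following our reading of the kernel parameter documentation (default value 1), that hpbussize counts every bus number in the range below the hot-plug port, including its secondary bus. It ignores any extra reservation the kernel may make for nested hot-plug ports (such as the PEX 8696 downstream ports), so the result should be read as a lower bound. The `Switch` model and the `margin` argument are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Switch:
    """A PCIe switch as seen from the hot-plug port above it (hypothetical model).

    Each entry of `ports` is one downstream port: None if it leads to a plain
    endpoint (or is empty), or a nested Switch.
    """
    ports: List[Optional["Switch"]] = field(default_factory=list)


def bus_numbers_needed(sw: Switch) -> int:
    """Lower bound on bus numbers used below a hot-plug port when `sw` is attached."""
    count = 1   # bus of the link below the hot-plug port (switch upstream port)
    count += 1  # internal bus behind the upstream port (holds the downstream ports)
    for child in sw.ports:
        # Each downstream port needs its secondary bus; a nested switch needs its subtree.
        count += 1 if child is None else bus_numbers_needed(child)
    return count


def hpbussize_arg(sw: Switch, margin: int = 0) -> str:
    """Kernel command-line fragment; `margin` deliberately reserves extra numbers."""
    return f"pci=hpbussize={bus_numbers_needed(sw) + margin}"
```

For instance, a switch with 12 downstream ports, each leading to a single endpoint (an illustrative figure), gives `bus_numbers_needed(Switch(ports=[None] * 12)) == 14`, i.e., `pci=hpbussize=14` before any margin. The upper limit depends on the bus numbers still needed by the devices that follow in the enumeration order, which the sketch does not model.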
By correctly dimensioning the hpbussize parameter, successful hot-addition and hot-removal of the PEX 8696 were achieved, as reflected by the corresponding devices appearing in (or disappearing from) the device list reported by the lspci command. Not only was the PEX 8696 switch successfully hot-plugged, but the endpoints connected to it (node board endpoints) were also hot-plugged automatically, leading to seamless operation of the developed test-bench application, which monitors data from the ATCA-IO-Processor boards [22].
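The lspci check can be scripted. The sketch below parses the numeric output of `lspci -n` and returns the addresses of the functions matching a vendor:device pair. Using `10b5` (the PLX Technology PCI vendor ID) with device ID `8696` for the PEX 8696 ports is an assumption to be checked against the actual listing; each switch port appears as a separate function.

```python
import re
from typing import List

# Leading fields of an `lspci -n` line: "<address> <class>: <vendor>:<device> ..."
LSPCI_N_LINE = re.compile(
    r"^(?P<addr>\S+)\s+(?P<cls>[0-9a-f]{4}):\s+(?P<vendor>[0-9a-f]{4}):(?P<device>[0-9a-f]{4})",
    re.IGNORECASE,
)


def find_devices(lspci_n_output: str, vendor: str, device: str) -> List[str]:
    """Return the bus addresses of all functions whose vendor:device IDs match."""
    vendor, device = vendor.lower(), device.lower()
    hits = []
    for line in lspci_n_output.splitlines():
        m = LSPCI_N_LINE.match(line)
        if m and m["vendor"].lower() == vendor and m["device"].lower() == device:
            hits.append(m["addr"])
    return hits
```

The output of `subprocess.run(["lspci", "-n"], capture_output=True, text=True).stdout` can be passed as the first argument; comparing the returned lists before and after a hot-addition (or hot-removal) shows whether the switch ports appeared (or disappeared).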
IV. TOWARD HOT-PLUG SUPPORT STANDARDIZATION
As explained in the preceding sections, ATCA hot swap and PCIe hot-plug coexist as separate procedures, each managed by its own management system: hot swap is managed by the HPM system, whereas hot-plug is managed by the distributed HPC architecture. This means that physical board insertion/extraction (hardware level) is managed by the IPMC, while PCIe hot-plug procedures perform device insertion/removal at the software level only. Therefore, the PCIe hot-plug insertion/removal event sequence must be coordinated with the ATCA hot-swap activation/deactivation sequences; however, neither management system is aware of the other, which can lead to system failure if coordination between hot swap and hot-plug is lost (e.g., an OS hang-up due to a software bug). Furthermore, the customized solution uses non-IPMI messaging on the IPMB-0 bus, which violates the ATCA specification. A future hot-plug standardization for ATCA should use the ShMC to additionally perform hot-plug messaging, as only the ShMC is aware of the FRU states of all boards in the shelf and would thus be able to perform and verify coordination between hot swap and hot-plug, using standard IPMI messaging on IPMB-0. In this scenario, a future ATCA specification extension for hot-plug support should additionally:
1) add new FRU definition fields for the hot-plug capabilities of boards, allowing the ShMC to identify hot-pluggable devices;
2) define the hot-plug (hardware) elements required in an ATCA board and their connection paths to the respective HPCs;
3) consider software triggering of events, which the PCIe switching hardware must support (the PEX 8696 supports this feature, but other devices may not);
4) consider the hot-plug capabilities of upstream hardware and software (e.g., external cabling sideband signals connecting to the host) for hot-plug of hub boards.
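To illustrate the kind of cross-check the ShMC could perform once it sees both sides, the sketch below compares each board's FRU state with its hot-plug slot state. The `BoardView` fields and the consistency rules are hypothetical and are not defined by PICMG or PCI-SIG: a hot-plug capable board in M4 is expected to have a powered slot with the link up, and one in M0 or M1 is expected to have neither.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class BoardView:
    """State a ShMC-based coordinator would track per board (hypothetical fields)."""
    fru_state: int          # ATCA FRU state, 0..7 for M0..M7
    hot_plug_capable: bool  # e.g., from a new FRU record field (list item 1)
    slot_powered: bool      # slot power as reported by the board's HPC
    link_up: bool           # PCIe data link layer active


def coordination_faults(boards: Dict[int, BoardView]) -> List[Tuple[int, str]]:
    """Return (slot, reason) pairs where hot swap and hot-plug states disagree.

    Transitional states (M2, M3, M5, M6) and M7 are not checked here.
    """
    faults = []
    for slot, b in sorted(boards.items()):
        if not b.hot_plug_capable:
            continue
        if b.fru_state == 4 and not (b.slot_powered and b.link_up):
            faults.append((slot, "FRU active (M4) but PCIe slot not up"))
        elif b.fru_state in (0, 1) and (b.slot_powered or b.link_up):
            faults.append((slot, "FRU inactive (M0/M1) but PCIe slot still up"))
    return faults
```

Keeping the check as a pure function of both views reflects the argument above: only an entity that sees FRU states and hot-plug states together can detect that the two procedures have drifted apart.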
V. CONCLUSION
The hot-plug feature has generally seen limited adoption within the PCI-SIG ecosystem and is, consequently, poorly supported by instrumentation standards. However, hot-plug support is vital for implementing fault-tolerance mechanisms in PCIe-based ATCA platforms targeting HA.
As ATCA currently does not specify hot-plug support, a customized solution for node boards has been developed, using additional messaging between IPMCs and software-generated hot-plug events, coordinated with hot swap.
Hot-plug of hub (switch) boards was also successfully achieved, in a specific OS environment, using additional IPMC control of the external cable sideband signals connecting to the host computer. These solutions have been tested, showing that seamless continuity of operation was achieved at both the hardware and software levels, thus benefiting system availability.
For the desired specification extension of hot-plug support in ATCA, a set of guidelines has been suggested. The proposed approach assumes that the ShMC is responsible for handling hot-plug messaging, which requires new definitions for FRU information fields and new features to be added to the ShMC and IPMC specifications. The development of such a solution therefore requires the involvement of both hardware and software manufacturers (especially ShM and IPMC developers), as well as closely connected OS developers.
Only a standardized hot-plug solution will drive the industry to deliver compliant products (e.g., switches, FPGA PCIe cores, and OSs). In this respect, the PICMG xTCA for Physics Committee (including IPFN) has taken a first step by releasing a hot-plug design guide for MicroTCA [23].
The work carried out aims to further encourage the involved communities in their standardization efforts toward a specification extension for the PICMG xTCA standards that caters to the HA requirements of upcoming large physics experiments.
