Quantitative methods for data-driven reliability optimization of engineered systems
Particle accelerators, such as the Large Hadron Collider at CERN, are among the largest and most complex engineered systems to date. Future generations of particle accelerators are expected to increase in size, complexity, and cost. Among the many obstacles this growth creates are unprecedented reliability challenges, which call for new reliability optimization approaches.
With the increasing digitalization of technical infrastructures, the rate and granularity of operational data collection are growing rapidly. These data contain valuable information for system reliability optimization, which can be extracted and processed with data-science methods and algorithms. However, many existing data-driven reliability optimization methods fail to exploit these data because they make overly simplistic assumptions about system behavior, do not consider the organizational context required for cost-effectiveness, or build on specific monitoring data that are too expensive to record.
To address these limitations in realistic scenarios, a tailored methodology based on CRISP-DM (CRoss-Industry Standard Process for Data Mining) is proposed to develop data-driven reliability optimization methods. For three realistic scenarios, the developed methods use the available operational data to learn interpretable or explainable failure models from which permanent and generally applicable reliability improvements can be derived: Firstly, novel explainable deep learning methods predict future alarms accurately from only a few logged alarm examples and support root-cause identification. Secondly, novel parametric reliability models incorporate expert knowledge for an improved quantification of the failure behavior of a fleet of systems with heterogeneous operating conditions, and yield optimal operational strategies for novel usage scenarios. Thirdly, Bayesian models trained on data from a range of comparable systems predict field reliability accurately and reveal the influence of non-technical factors on reliability.
An evaluation of the methods applied to the three scenarios confirms that the tailored CRISP-DM methodology advances the state of the art in data-driven reliability optimization and overcomes many of the existing limitations. However, the quality of the collected operational data remains crucial for the success of such approaches. Hence, adaptations of routine data collection procedures are suggested to enhance data quality and to increase the success rate of reliability optimization projects. With the developed methods and findings, future generations of particle accelerators can be constructed and operated cost-effectively, ensuring high levels of reliability despite growing system complexity.
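As a minimal sketch of the second of these ingredients, the snippet below fits a parametric (Weibull) failure model separately for two operating conditions of a hypothetical fleet. All numbers, condition names, and the choice of the B10 metric are illustrative assumptions, not results from the thesis.

```python
# Sketch: per-condition Weibull fits for a fleet with heterogeneous
# operating conditions. Data are synthetic, for illustration only.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Hypothetical failure times (hours) under two operating conditions.
fleet_data = {
    "nominal_load": rng.weibull(1.8, 200) * 9000.0,
    "high_load": rng.weibull(1.8, 200) * 5000.0,
}

for condition, times in fleet_data.items():
    # Fix the location parameter at 0 so only shape and scale are fitted.
    shape, loc, scale = stats.weibull_min.fit(times, floc=0.0)
    # B10 life: time by which 10% of units are expected to have failed.
    b10 = stats.weibull_min.ppf(0.10, shape, loc=loc, scale=scale)
    print(f"{condition}: shape={shape:.2f}, scale={scale:.0f} h, B10={b10:.0f} h")
```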
Field-Reliability Predictions based on Statistical System Life Cycle Models
Reliability measures the ability of a system to provide its intended level of service. It is influenced by many factors throughout a system's life cycle. A detailed understanding of their impact often remains elusive, since these factors cannot be studied independently. Formulating reliability studies as a Bayesian regression problem allows their impact to be assessed simultaneously and a predictive model of reliability metrics to be identified. The proposed method is applied to currently operational particle accelerator equipment at CERN. Relevant metrics were gathered by combining data from various organizational databases. To obtain predictive models, different supervised machine learning algorithms are applied and compared in terms of their prediction error and reliability. Results show that the identified models accurately predict the mean time between failures (MTBF) of devices, an important reliability metric for repairable systems, and reveal factors that lead to increased dependability. These results provide valuable inputs for the early development stages of highly dependable equipment for future particle accelerators.
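The following sketch illustrates this kind of Bayesian regression formulation, not the paper's actual model or data: it predicts log-MTBF from hypothetical per-device life-cycle factors using scikit-learn's BayesianRidge. All feature names and numbers are invented.

```python
# Sketch: Bayesian regression of log-MTBF on life-cycle factors.
import numpy as np
from sklearn.linear_model import BayesianRidge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 300

# Hypothetical factors: component count, age (years), burn-in test hours.
X = np.column_stack([
    rng.integers(10, 500, n),
    rng.uniform(0, 20, n),
    rng.uniform(0, 200, n),
])
# Synthetic log-MTBF; modelling the log keeps predicted MTBF positive.
log_mtbf = 10.0 - 0.004 * X[:, 0] - 0.05 * X[:, 1] + 0.002 * X[:, 2]
y = log_mtbf + rng.normal(0, 0.3, n)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = BayesianRidge().fit(X_tr, y_tr)

# Posterior predictive mean and standard deviation for held-out devices.
mean, std = model.predict(X_te, return_std=True)
print("predicted MTBF [h]:", np.exp(mean[:3]).round(0))
print("coefficients:", model.coef_.round(4))  # signs reveal factor influence
```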
Power Converter Maintenance Optimization Using a Model-Based Digital Reliability Twin Paradigm
Recent studies have estimated the global economic potential of optimizing operations and maintenance activities in factories at 1.2 to 3.7 trillion USD. Digital twins offer a framework to achieve such optimization by studying potential improvements in the virtual space before applying them to the real world. We studied the use of a digital twin based on a general model of system failure behaviour for maintenance optimization, combining existing methodologies into a general framework. Applying it to a real-world power converter use case, we found that either reactive or preventive maintenance can be more cost-effective, depending on the operating conditions. This made it possible to predict optimal maintenance strategies for existing and future systems.
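One classical way to make the reactive-versus-preventive comparison described above is the standard age-replacement renewal model, sketched below; this is a stand-in for the idea, not the paper's digital-twin implementation, and all failure and cost parameters are invented.

```python
# Sketch: age-replacement model. Replace preventively at age T (cost c_p)
# or reactively on failure (cost c_f); minimize long-run cost per hour.
import numpy as np
from scipy import integrate, optimize, stats

shape, scale = 2.5, 10_000.0   # Weibull wear-out behaviour (shape > 1)
c_f, c_p = 50_000.0, 5_000.0   # failure cost vs. planned-replacement cost
dist = stats.weibull_min(shape, scale=scale)

def cost_rate(T):
    """Renewal-reward cost rate for preventive replacement at age T."""
    expected_cycle_cost = c_p * dist.sf(T) + c_f * dist.cdf(T)
    expected_cycle_len, _ = integrate.quad(dist.sf, 0.0, T)
    return expected_cycle_cost / expected_cycle_len

res = optimize.minimize_scalar(cost_rate, bounds=(100.0, 50_000.0),
                               method="bounded")
reactive_rate = c_f / dist.mean()   # run-to-failure baseline

print(f"optimal replacement age: {res.x:.0f} h at {res.fun:.2f} cost/h")
print(f"reactive baseline: {reactive_rate:.2f} cost/h")
# Depending on shape and c_f/c_p, either strategy can win, as observed above.
```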
Availability Targets Scaled According to Assurance Complexity in the FCC-ee
The Future Circular Collider (FCC) is the leading proposal for the next generation of energy-frontier particle accelerators. Its first stage, the FCC-ee, schedules 185 days of physics each year, of which 80% must be spent at nominal parameters if integrated luminosity goals are to be reached. For comparison, the Large Hadron Collider (LHC) was available for 77% of physics production in Run 2 (2016-2018). The additional challenges in maintaining the FCC-ee, such as its size, complexity and ambitious technical objectives, make availability a significant risk to its physics deliverables. This paper presents a heuristic methodology to break down the global 80% availability requirement into targets for the FCC-ee’s main constituent systems. This quantifies availability targets that scale with the complexity (or “difficulty”) of assuring availability. The contributions are threefold: First, this provides a benchmark against which to assess the severity of the FCC-ee availability challenge and the risk to availability from each system. Second, the presented methodology provides a platform to translate changes in one system’s availability to that of the FCC-ee overall, which is applicable in numerous future studies. Third, the methodology is generally applicable to any future machine for which concrete and detailed designs are unavailable, and may be reused in numerous engineering applications.
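A simple way to realize such a breakdown, sketched below as an illustration of the general idea rather than the paper's exact heuristic, is to split the downtime budget of a series system in proportion to complexity weights, so that harder-to-assure systems receive looser targets while the product of all targets still meets the global requirement. System names and weights are invented.

```python
# Sketch: weight-based allocation of a global availability target
# across systems in series.
import math

global_target = 0.80            # overall availability requirement
systems = {                     # hypothetical assurance-difficulty weights
    "RF": 5.0,
    "magnets": 3.0,
    "cryogenics": 4.0,
    "controls": 2.0,
}

total_weight = sum(systems.values())
targets = {
    # Exponents sum to 1, so the product of targets equals global_target;
    # a larger weight yields a lower (easier) per-system target.
    name: global_target ** (w / total_weight)
    for name, w in systems.items()
}

for name, a in targets.items():
    print(f"{name}: availability target {a:.4f}")
print("check (series product):", math.prod(targets.values()))
```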
Machine learning for early fault detection in accelerator systems
With the development of systems based on a combination of mechanical, electronic and, increasingly, software components, growing system complexity is a de facto trend in the engineering world. Particle accelerators are no exception to this paradigm. The continuous push for higher energies driven by particle physics implies that next-generation machines will be at least one order of magnitude larger and more complex than present ones, posing unprecedented challenges in terms of beam performance and availability. The two most promising next-generation projects under discussion at CERN are the Future Circular Collider (FCC) and the Compact Linear Collider (CLIC), with sizes of 100 km and 48 km, respectively (see Fig. 1 and Fig. 2).
Availability Modeling of the Solid-State Power Amplifiers for the CERN SPS RF Upgrade
As part of the LHC Injector Upgrade program, a complete overhaul of the Super Proton Synchrotron Radio-Frequency (RF) system took place. New cavities have been installed, for which solid-state technology was chosen to deliver a combined RF power of 2 MW from 2560 RF amplifiers. This strategy promises high availability, since the system continues operating when some of the amplifiers fail. This study quantifies the operational availability that can be achieved with this new installation. The evaluation is based on a Monte Carlo simulation of the system using the novel AvailSim4 simulation software. A model based on lifetime estimates of the RF modules is compared against data from early operational experience. Sensitivity analyses that give insight into the chosen operational scenario have been performed. With the increasing use of solid-state RF power amplifiers, the findings of this study provide a useful reference for future applications of this technology in particle accelerators.
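The sketch below is a highly simplified stand-in for such a study, not AvailSim4 itself: a Monte Carlo estimate of the probability that enough amplifiers of a k-out-of-n system survive a full run without repair. The module count follows the abstract; the failure rate, redundancy margin, and no-repair assumption are invented.

```python
# Sketch: Monte Carlo k-out-of-n survival for a redundant amplifier system.
import numpy as np

rng = np.random.default_rng(1)

n_modules = 2560       # total RF amplifiers (from the abstract)
k_required = 2400      # modules assumed sufficient for nominal RF power
mttf_h = 100_000.0     # assumed mean time to failure per module
mission_h = 5_000.0    # assumed run length with no repairs in between
n_runs = 100_000

# With exponential lifetimes, each module independently survives the run
# with probability exp(-mission/MTTF); count survivors per simulated run.
p_survive = np.exp(-mission_h / mttf_h)
survivors = rng.binomial(n_modules, p_survive, size=n_runs)

# Fraction of runs in which the k-out-of-n system kept nominal power:
# a simplified proxy for the operational availability figure of interest.
availability = (survivors >= k_required).mean()
print(f"P(nominal RF power for the whole run) = {availability:.4f}")
```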
Injector Availability Report 2023
This document summarises the injector availability during 2023. This note has been produced and ratified by the Reliability and Availability Working Group (RAWG), which has compiled fault information for the period in question using the Accelerator Fault Tracker (AFT). A separate note describes the LHC availability during the 2023 proton physics run.
Machine Protection and Availability in the FCC-ee
The FCC-ee will combine high stored beam energy with small vertical emittance. The loss of even a small part of the beam into accelerator equipment, collimators or passive absorbers could be extremely destructive. Therefore, a highly reliable machine protection system is required for the entire accelerator chain. Further, the FCC-ee schedule allocates 185 days to physics each year, of which a minimum percentage must be spent at nominal parameters if integrated luminosity goals are to be reached. Machine protection and availability are vital considerations from the outset of the design stage. This paper presents the current status and outlook for both topics, covering analysis, research and development (R&D). First, relevant topics in machine protection are discussed. Fault mechanisms are treated, as well as key considerations such as minimum reaction time, beam loss detection, dust interaction, fast failures and quench protection. Next, availability assurance is considered. Three steps are proposed: (I) Coarsely define system availability targets scaled to the complexity of delivery. (II) Establish the projected availability based on existing designs and similar systems. (III) If the latter is insufficient, study how it can be improved.
Reliability studies for CERN’s new safe machine parameter system
The Safe Machine Parameter system (SMP) is a critical part of the machine protection system in CERN’s Large Hadron Collider (LHC) and the Super Proton Synchrotron (SPS). It broadcasts safety-critical parameters, such as beam energy, beam intensity, the beta functions and flags indicating beam safety levels, to other machine protection elements. The current SMP will be replaced by a consolidated system during CERN’s Long Shutdown 3, foreseen to start in 2026. In this contribution, the results of the reliability study of the new SMP system are presented. The study quantifies the criticality of end-users by identifying the hazard chains that lead to potential damage to the involved equipment. Data-driven risk matrices are used to derive acceptable failure frequencies and reliability requirements. The study encompasses Monte Carlo simulations of sub-system-level configurations to support the decision-making process in this project.
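As a purely hypothetical illustration of how a risk matrix maps failure severity and frequency to an acceptability decision (the study's data-driven matrices are not reproduced here), consider the sketch below; all category names and thresholds are invented.

```python
# Sketch: a risk matrix mapping (severity, frequency) to acceptability.
SEVERITIES = ["minor", "major", "critical"]           # damage classes
FREQUENCIES = ["<1/100y", "<1/10y", "<1/y", ">1/y"]   # frequency bins

# True = acceptable, False = requires mitigation (illustrative policy).
RISK_MATRIX = {
    "minor":    {"<1/100y": True, "<1/10y": True,  "<1/y": True,  ">1/y": False},
    "major":    {"<1/100y": True, "<1/10y": True,  "<1/y": False, ">1/y": False},
    "critical": {"<1/100y": True, "<1/10y": False, "<1/y": False, ">1/y": False},
}

def max_acceptable_frequency(severity: str) -> str:
    """Most frequent bin that is still acceptable for a given severity."""
    acceptable = [f for f in FREQUENCIES if RISK_MATRIX[severity][f]]
    return acceptable[-1]

for sev in SEVERITIES:
    print(f"{sev}: acceptable up to {max_acceptable_frequency(sev)}")
```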
Machine Learning Models for Breakdown Prediction in RF Cavities for Accelerators
Radio Frequency (RF) breakdowns are one of the most prevalent limitations of RF cavities for particle accelerators. During a breakdown, field enhancement associated with small deformations on the cavity surface results in electrical arcs. Such arcs degrade a passing beam and, if they occur frequently, can cause irreparable damage to the RF cavity surface. In this paper, we propose a machine learning approach to predict the occurrence of breakdowns in CERN’s Compact LInear Collider (CLIC) accelerating structures. We discuss state-of-the-art algorithms for data exploration with unsupervised machine learning, breakdown prediction with supervised machine learning, and result validation with Explainable Artificial Intelligence (Explainable AI). By interpreting the model parameters of the various approaches, we further explore opportunities to elucidate the physics of breakdowns and improve accelerator reliability and operation.
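The sketch below illustrates the supervised part of such a pipeline on synthetic data: the features are invented rather than CLIC measurements, and feature importances stand in crudely for the Explainable-AI step.

```python
# Sketch: supervised breakdown prediction from per-pulse features.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(7)
n = 2000

# Hypothetical per-pulse features preceding a potential breakdown.
peak_field = rng.normal(100.0, 5.0, n)       # surface field (MV/m)
vacuum = rng.lognormal(-18.0, 0.5, n)        # pressure (mbar)
reflected_power = rng.normal(0.05, 0.02, n)  # fraction of forward power
X = np.column_stack([peak_field, vacuum, reflected_power])

# Synthetic labels: breakdowns more likely at high field and reflection.
logits = 0.3 * (peak_field - 100.0) + 40.0 * (reflected_power - 0.05) - 2.0
y = rng.random(n) < 1.0 / (1.0 + np.exp(-logits))

clf = RandomForestClassifier(n_estimators=200, random_state=0)
scores = cross_val_score(clf, X, y, scoring="roc_auc", cv=5)
print("ROC-AUC:", scores.mean().round(3))

clf.fit(X, y)
for name, imp in zip(["peak_field", "vacuum", "reflected_power"],
                     clf.feature_importances_):
    print(f"{name}: importance {imp:.2f}")  # crude explainability proxy
```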