The ATLAS Workflow Management System Evolution in the LHC Run3 and towards the High-Luminosity LHC era
The ATLAS experiment has more than 18 years of experience using workload management systems to develop and deploy workflows that process and simulate data on its distributed computing infrastructure. Simulation, processing and analysis of LHC experiment data require the coordinated work of heterogeneous computing resources. In particular, the ATLAS experiment utilizes the resources of 250 computing centers worldwide, the power of supercomputing centres, and national, academic and commercial cloud computing resources. In this contribution, we present new techniques for cost-effectively improving efficiency that have been introduced in the workflow management system software. We describe the evolution from a mesh framework towards new types of computing facilities such as clouds and HPCs, as well as new types of production and analysis workflows.
High-Throughput Computing on High-Performance Platforms: A Case Study
The computing systems used by LHC experiments have historically consisted of the federation of hundreds to thousands of distributed resources, ranging from small to mid-size. In spite of the impressive scale of the existing distributed computing solutions, the federation of small to mid-size resources will be insufficient to meet projected future demands. This paper is a case study of how the ATLAS experiment has embraced Titan, a DOE leadership computing facility, in conjunction with traditional distributed high-throughput computing to reach sustained production scales of approximately 52M core-hours a year. The three main contributions of this paper are: (i) a critical evaluation of design and operational considerations to support the sustained, scalable and production usage of Titan; (ii) a preliminary characterization of a next-generation executor for PanDA to support new workloads and advanced execution modes; and (iii) early lessons for how current and future experimental and observational systems can be integrated with production supercomputers and other platforms in a general and extensible manner.
Unified System for Processing Real and Simulated Data in the ATLAS Experiment
The physics goals of the next Large Hadron Collider run include high precision tests of the Standard Model and searches for new physics. These goals require detailed comparison of data with computational models simulating the expected data behavior. To highlight the role which modeling and simulation plays in future scientific discovery, we report on use cases and experience with a unified system built to process both real and simulated data of growing volume and variety.
Comment: XVII International Conference Data Analytics and Management in Data Intensive Domains (DAMDID/RCDL), Obninsk, Russia, October 13-16, 201
BigPanDA monitoring system evolution in the ATLAS Experiment
Monitoring services play a crucial role in the day-to-day operation of distributed computing systems. The ATLAS Experiment at the LHC uses the Production and Distributed Analysis workload management system (PanDA WMS), which allows a million computational jobs to run daily at over 170 computing centers of the WLCG and on opportunistic resources, utilizing 600k cores simultaneously on average. The BigPanDA monitor is an essential part of the monitoring infrastructure for the ATLAS Experiment, providing a wide range of views, from top-level summaries down to a single computational job and its logs. Over the past few years of PanDA WMS development in the ATLAS Experiment, several new components were developed, such as Harvester, iDDS, Data Carousel, and Global Shares. Due to its modular architecture, the BigPanDA monitor naturally grew into a platform where the relevant data from all PanDA WMS components and accompanying services are accumulated and displayed in the form of interactive charts and tables. Moreover, the system has been adopted by other experiments beyond HEP. In this paper we describe the evolution of the BigPanDA monitoring system, the development of new modules, and its integration into other experiments.
Operational Analytics Studies for ATLAS Distributed Computing: Data Popularity Forecast and Utilization of the WLCG Centers
Operational analytics is a direction of research concerned with analyzing the current state of computing processes and predicting future states, in order to anticipate imbalances and take timely measures to stabilize a complex system. Two relevant areas in ATLAS Distributed Computing are currently the focus of studies: user physics analysis, including forecasting the popularity of data samples among users, and evaluating WLCG centers for their readiness to process user analysis payloads. Studying these areas is challenging due to the complexity involved, as it requires a comprehensive understanding of the numerous boundary conditions typically found in large-scale distributed computing infrastructures. Forecasts of data popularity are problematic without categorizing user tasks by type (data transformation or physics analysis); these types are not always apparent on the surface and may introduce noise that significantly distorts predictive analysis. Evaluating WLCG resources by their analysis workloads is also challenging, as it is necessary to find a balance between the workload of a resource, its performance, the waiting time for jobs on it, and the volume of jobs that it processes. This is especially difficult in a heterogeneous computing environment, where legacy resources are used alongside modern high-performance machines. We look at these areas of research in detail, discuss the tools and methods used in our work, and demonstrate the results already obtained.
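To make the categorization step concrete, here is a toy sketch in Python; the record fields, the "analysis" label, and the moving-average forecast are illustrative assumptions and not the production method described in the paper:

    from collections import defaultdict

    def forecast_popularity(task_records, periods=3):
        # task_records: iterable of dicts such as
        #   {"dataset": "data18_...", "task_type": "analysis", "period": 7}
        # Accesses from data-transformation tasks are dropped as noise.
        accesses = defaultdict(lambda: defaultdict(int))
        for rec in task_records:
            if rec["task_type"] != "analysis":
                continue
            accesses[rec["dataset"]][rec["period"]] += 1
        # Forecast the next period as a moving average of the last few periods.
        forecasts = {}
        for dataset, per_period in accesses.items():
            recent = sorted(per_period)[-periods:]
            forecasts[dataset] = sum(per_period[p] for p in recent) / len(recent)
        return forecasts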
ATLAS Data Analysis using a Parallel Workflow on Distributed Cloud-based Services with GPUs
A new type of parallel workflow has been developed for the ATLAS experiment at the Large Hadron Collider that makes use of distributed computing combined with a cloud-based infrastructure. It was developed for a specific type of analysis of ATLAS data, popularly referred to as Simulation-Based Inference (SBI). The JAX library is used in parts of the workflow to compute gradients and to accelerate program execution through just-in-time compilation, which becomes essential in a full SBI analysis and can also offer significant speed-ups in more traditional types of analysis.
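As an illustration of the JAX pattern mentioned above (a minimal sketch with a placeholder binned likelihood, not the actual SBI workflow code), jax.grad supplies the gradients and jax.jit the just-in-time compilation:

    import jax
    import jax.numpy as jnp

    # Placeholder differentiable statistic: a binned Poisson negative
    # log-likelihood standing in for the learned SBI test statistic.
    def neg_log_likelihood(mu, observed, expected_sig, expected_bkg):
        expected = mu * expected_sig + expected_bkg
        return jnp.sum(expected - observed * jnp.log(expected))

    # Gradient with respect to the signal strength mu, compiled just-in-time.
    grad_nll = jax.jit(jax.grad(neg_log_likelihood, argnums=0))

    observed = jnp.array([12.0, 9.0, 4.0])
    expected_sig = jnp.array([3.0, 2.0, 1.0])
    expected_bkg = jnp.array([10.0, 8.0, 5.0])
    print(grad_nll(1.0, observed, expected_sig, expected_bkg))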
Updates to the ATLAS Data Carousel Project
The High Luminosity upgrade to the LHC (HL-LHC) is expected to deliver scientific data at the multi-exabyte scale. In order to address this unprecedented data storage challenge, the ATLAS experiment launched the Data Carousel project in 2018. Data Carousel is a tape-driven workflow whereby bulk production campaigns with input data resident on tape are executed by staging and promptly processing a sliding window of data on a disk buffer, such that only a small fraction of the inputs is pinned on disk at any one time. Data Carousel is now in production for ATLAS in Run3. In this paper, we provide updates on recent Data Carousel R&D projects, including data-on-demand and tape smart writing. Data-on-demand removes from disk data that has not been accessed for a predefined period; when users request such data, it is either staged from tape or recreated by rerunning the original production steps. Tape smart writing employs intelligent algorithms for file placement on tape in order to retrieve data more efficiently, which is our long-term strategy for achieving optimal tape usage in Data Carousel.
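The sliding-window idea can be illustrated with a short sketch; the function names and the window size below are hypothetical stand-ins for the real staging, processing and deletion services, not the Data Carousel implementation:

    from collections import deque

    def carousel(tape_files, stage_from_tape, process, release, window=5):
        # Hypothetical sliding-window stager: keep at most `window` inputs
        # pinned on the disk buffer at any one time.
        queue = deque(tape_files)
        on_disk = deque()
        while queue or on_disk:
            # Top up the disk buffer until the window is full.
            while queue and len(on_disk) < window:
                on_disk.append(stage_from_tape(queue.popleft()))
            # Process the oldest staged file and free its disk space promptly,
            # which lets the next tape recall start.
            staged = on_disk.popleft()
            process(staged)
            release(staged)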
Methods of Data Popularity Evaluation in the ATLAS Experiment at the LHC
The ATLAS Experiment at the LHC generates petabytes of data that are distributed among 160 computing sites all over the world and processed continuously by various central production and user analysis tasks. The popularity of data is typically measured as the number of accesses and plays an important role in resolving data management issues: deleting, replicating, and moving data between tapes, disks and caches. These data management procedures are still carried out in a semi-manual mode, and we have now focused our efforts on automating them, making use of historical knowledge about existing data management strategies. In this study we describe the sources of information about data popularity and demonstrate their consistency. Based on the calculated popularity measurements, various distributions were obtained. Auxiliary information about replication and task processing allowed us to evaluate the correspondence between the number of tasks with popular data executed per site and the number of replicas per site. We also examine the popularity of user analysis data, which is much less predictable than that of central production and requires more indicators than just the number of accesses.
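A minimal sketch of the kind of per-site consistency check described above; the inputs and the use of Pearson correlation are illustrative assumptions, not the study's exact method:

    import statistics

    def site_correlation(tasks_per_site, replicas_per_site):
        # Compare, per site, the number of tasks touching popular data with
        # the number of replicas hosted there (Python 3.10+ for correlation).
        sites = sorted(set(tasks_per_site) & set(replicas_per_site))
        tasks = [tasks_per_site[s] for s in sites]
        replicas = [replicas_per_site[s] for s in sites]
        return statistics.correlation(tasks, replicas)  # Pearson's r

    # Toy inputs with invented site names.
    print(site_correlation(
        {"SITE_A": 120, "SITE_B": 45, "SITE_C": 300},
        {"SITE_A": 3, "SITE_B": 1, "SITE_C": 5},
    ))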
Utilizing Distributed Heterogeneous Computing with PanDA in ATLAS
In recent years, advanced and complex analysis workflows have gained increasing importance in the ATLAS experiment at CERN, one of the large scientific experiments at the LHC. Support for such workflows has allowed users to exploit remote computing resources and service providers distributed worldwide, overcoming limitations of local resources and services. The spectrum of computing options keeps increasing across the Worldwide LHC Computing Grid (WLCG), volunteer computing, high-performance computing, commercial clouds, and emerging service levels like Platform-as-a-Service (PaaS), Container-as-a-Service (CaaS) and Function-as-a-Service (FaaS), each providing new advantages and constraints. Users can benefit significantly from these providers, but at the same time it is cumbersome to deal with multiple providers, even in a single analysis workflow with fine-grained requirements arising from the nature and characteristics of their applications. In this paper, we first highlight issues in geographically distributed heterogeneous computing, such as insulating users from the complexities of dealing with remote providers, smart workload routing, complex resource provisioning, seamless execution of advanced workflows, workflow description, pseudo-interactive analysis, and integration of PaaS, CaaS, and FaaS providers. We then outline solutions developed in ATLAS with the Production and Distributed Analysis (PanDA) system and future challenges for LHC Run4.
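To give a flavour of the workload-routing problem, here is a toy matching rule; the provider table, capability keys, and selection logic are invented for illustration and are not part of PanDA:

    # Invented capability table for three kinds of providers.
    PROVIDERS = {
        "wlcg_site":  {"gpu": False, "container": True, "service": "grid"},
        "hpc_centre": {"gpu": True,  "container": True, "service": "batch"},
        "cloud_faas": {"gpu": False, "container": True, "service": "faas"},
    }

    def route(task_requirements):
        # Return the first provider satisfying every boolean requirement,
        # e.g. route({"gpu": True, "container": True}) -> "hpc_centre".
        for name, caps in PROVIDERS.items():
            if all(not need or caps.get(key, False)
                   for key, need in task_requirements.items()):
                return name
        raise RuntimeError("no provider satisfies the task requirements")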
Isotopic Composition of Light Nuclei in Cosmic Rays: Results from AMS-01
The variety of isotopes in cosmic rays allows us to study different aspects of the processes that cosmic rays undergo between the time they are produced and the time of their arrival in the heliosphere. In this paper, we present measurements of the isotopic ratios ²H/⁴He, ³He/⁴He, ⁶Li/⁷Li, ⁷Be/(⁹Be+¹⁰Be), and ¹⁰B/¹¹B in the range 0.2-1.4 GeV of kinetic energy per nucleon. The measurements are based on the data collected by the Alpha Magnetic Spectrometer, AMS-01, during the STS-91 flight in June 1998.