High-Throughput Computing on High-Performance Platforms: A Case Study

Angius, Alessio; De, Kaushik; Jha, Shantenu; Klimentov, Alexei; Oleynik, Danila; Oral, Sarp H.; Panitkin, Sergey; Turilli, Matteo; Wells, Jack C.


Publication date: 27 October 2017

Abstract

The computing systems used by LHC experiments have historically consisted of federations of hundreds to thousands of distributed resources, ranging from small to mid-size. In spite of the impressive scale of the existing distributed computing solutions, the federation of small to mid-size resources will be insufficient to meet projected future demands. This paper is a case study of how the ATLAS experiment has embraced Titan, a DOE leadership facility, in conjunction with traditional distributed high-throughput computing to reach sustained production scales of approximately 52M core-hours a year. The three main contributions of this paper are: (i) a critical evaluation of design and operational considerations to support the sustained, scalable and production usage of Titan; (ii) a preliminary characterization of a next-generation executor for PanDA to support new workloads and advanced execution modes; and (iii) early lessons for how current and future experimental and observational systems can be integrated with production supercomputers and other platforms in a general and extensible manner.

Available Versions

Crossref

info:doi/10.1109%2Fescience.20...
