Search CORE

53 research outputs found

High-Throughput Computing on High-Performance Platforms: A Case Study

Author: Angius Alessio
De Kaushik
Jha Shantenu
Klimentov Alexei
Oleynik Danila
Oral Sarp H.
Panitkin Sergey
Turilli Matteo
Wells Jack C.
Publication date: 27/10/2017

The computing systems used by LHC experiments have historically consisted of a federation of hundreds to thousands of distributed resources, ranging from small to mid-size resources. Despite the impressive scale of the existing distributed computing solutions, a federation of small to mid-size resources will be insufficient to meet projected future demands. This paper is a case study of how the ATLAS experiment has embraced Titan, a DOE leadership facility, in conjunction with traditional distributed high-throughput computing to reach sustained production scales of approximately 52M core-hours a year. The three main contributions of this paper are: (i) a critical evaluation of design and operational considerations to support the sustained, scalable and production usage of Titan; (ii) a preliminary characterization of a next-generation executor for PanDA to support new workloads and advanced execution modes; and (iii) early lessons for how current and future experimental and observational systems can be integrated with production supercomputers and other platforms in a general and extensible manner.
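For a sense of scale, the 52M core-hours a year quoted above can be converted into an equivalent number of continuously busy cores. This assumes the usage is spread evenly over a 365-day year (8,760 hours); the abstract does not say how the usage was actually distributed:

```latex
\frac{52 \times 10^{6}\ \text{core-hours/year}}{8760\ \text{hours/year}} \approx 5.9 \times 10^{3}\ \text{cores}
```

So the sustained ATLAS usage of Titan is roughly equivalent to about 5,900 cores running around the clock.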

arXiv.org e-Print Archive

Crossref

Using Pilot Systems to Execute Many Task Workloads on Supercomputers

Author: Andre Merzky
E Hwang
J Preto
M Wilde
R Pordes
RH Castain
T Maeno
TE Cheatham III
Y Sugita
Publication date: 30/07/2018

High performance computing systems have historically been designed to support applications composed mostly of monolithic, single-job workloads. Pilot systems decouple workload specification, resource selection, and task execution via job placeholders and late binding. They help satisfy the resource requirements of workloads composed of multiple tasks. RADICAL-Pilot (RP) is a modular and extensible Python-based pilot system. In this paper we describe RP's design, architecture and implementation, and characterize its performance. RP is capable of spawning more than 100 tasks/second and supports the steady-state execution of up to 16K concurrent tasks. RP can be used stand-alone, as well as integrated with other application-level tools as a runtime system.
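The pilot idea described above (acquire resources once with a placeholder job, then bind tasks to them only at run time) can be sketched as a small discrete-event simulation. This is a minimal illustration under stated assumptions, not RADICAL-Pilot's API or scheduler: the function `run_pilot`, the `(name, cores, duration)` task tuple, strict FIFO binding without backfilling, and simulated rather than real execution are all hypothetical choices made for the sketch.

```python
import heapq
from collections import deque
from typing import Dict, List, Tuple

# Hypothetical task layout: (name, cores, duration).
Task = Tuple[str, int, float]


def run_pilot(capacity: int, tasks: List[Task]) -> Tuple[Dict[str, float], float]:
    """Simulate one pilot (placeholder job) that already holds `capacity` cores.

    Tasks are late-bound: each is taken from a FIFO queue only when enough
    cores inside the pilot are free. Returns each task's start time and the
    overall makespan in simulated time units.
    """
    queue = deque(tasks)
    running: List[Tuple[float, int]] = []  # min-heap of (end_time, cores)
    free = capacity
    now = 0.0
    starts: Dict[str, float] = {}

    while queue or running:
        # Bind queued tasks to free cores in strict FIFO order (no backfilling).
        while queue and queue[0][1] <= free:
            name, cores, duration = queue.popleft()
            starts[name] = now
            free -= cores
            heapq.heappush(running, (now + duration, cores))
        if not running:
            # Nothing is running, yet the head task still does not fit.
            raise ValueError(f"task {queue[0][0]!r} needs more than {capacity} cores")
        # Advance to the next completion and release that task's cores.
        now, cores = heapq.heappop(running)
        free += cores

    return starts, now


if __name__ == "__main__":
    starts, makespan = run_pilot(4, [("a", 2, 3.0), ("b", 2, 1.0), ("c", 4, 2.0)])
    print(starts, makespan)  # {'a': 0.0, 'b': 0.0, 'c': 3.0} 5.0
```

The point of the design is that the batch system sees a single job of `capacity` cores, while the many small tasks are scheduled inside it without waiting in the batch queue again. A real pilot system would also backfill smaller tasks and launch actual processes.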

arXiv.org e-Print Archive

Crossref

XtreemOS: a Vision for a Grid Operating System

Author: Cortes T
Franke C
Jegou Y
Kielmann T
Laforenza D
Matthews Brian
Morin C
Prieto LP
Reinefeld A
Publication date: 01/01/2008

ePubs: the open archive for STFC research publications

A Generic Development and Deployment Framework for Cloud Computing and Distributed Applications

Author: Hluchy Ladislav
Nguyen Minh Binh
Tran Viet
Publication venue: Institute of Informatics, Slovak Academy of Sciences
Publication date: 10/07/2013

Cloud computing has paved the way for the advance of IT-based on-demand services. This technology helps decrease operational costs, solve scalability issues, and address many other user and provider constraints. However, developing and deploying distributed applications in cloud environments is becoming an increasingly complex task. Cloud users must spend a lot of time preparing, installing and configuring their applications on clouds. In addition, once developed and deployed, applications can hardly be moved from one cloud to another because of the lack of interoperability between clouds. To address these problems, we present in this paper a novel development and deployment framework for cloud distributed applications/services. Our approach is based on abstraction and object-oriented programming techniques, allowing users to develop and deploy their services in the cloud environment easily and rapidly. The approach also enables service migration and interoperability among clouds.
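The abstract attributes portability to abstraction and object-oriented programming. One common way to realize that idea is a provider-neutral interface that deployment code depends on, with one subclass per cloud. The sketch below is only an illustration of that general pattern and is not the framework described in the paper. `CloudProvider`, `InMemoryCloud` and `Service` are hypothetical names, and the in-memory provider stands in for a real cloud API.

```python
from abc import ABC, abstractmethod
from typing import Dict, Optional


class CloudProvider(ABC):
    """Hypothetical provider-neutral interface; one subclass per cloud."""

    @abstractmethod
    def start(self, service: str, image: str) -> str:
        """Start `image` for `service` and return a provider-specific instance id."""

    @abstractmethod
    def stop(self, instance_id: str) -> None:
        """Terminate a previously started instance."""


class InMemoryCloud(CloudProvider):
    """Stand-in for a real cloud so the sketch runs without credentials."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.instances: Dict[str, str] = {}  # instance id -> image
        self._next_id = 0

    def start(self, service: str, image: str) -> str:
        instance_id = f"{self.name}-{service}-{self._next_id}"
        self._next_id += 1
        self.instances[instance_id] = image
        return instance_id

    def stop(self, instance_id: str) -> None:
        del self.instances[instance_id]


class Service:
    """Deployment logic written only against CloudProvider, never a concrete cloud."""

    def __init__(self, name: str, image: str) -> None:
        self.name, self.image = name, image
        self.provider: Optional[CloudProvider] = None
        self.instance_id: Optional[str] = None

    def deploy(self, provider: CloudProvider) -> None:
        self.provider = provider
        self.instance_id = provider.start(self.name, self.image)

    def migrate(self, target: CloudProvider) -> None:
        # Start on the target cloud first, then stop the old instance.
        old_provider, old_id = self.provider, self.instance_id
        self.deploy(target)
        if old_provider is not None and old_id is not None:
            old_provider.stop(old_id)


if __name__ == "__main__":
    cloud_a, cloud_b = InMemoryCloud("a"), InMemoryCloud("b")
    svc = Service("web", "web:1.0")
    svc.deploy(cloud_a)
    svc.migrate(cloud_b)
    print(svc.instance_id)   # b-web-0
    print(cloud_a.instances) # {}
```

Because `Service` only calls `start` and `stop` on the abstract interface, moving it to another cloud means supplying another subclass rather than rewriting the deployment code.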

Computing and Informatics (E-Journal - Institute of Informatics, SAS, Bratislava)

ArrayBridge: Interweaving declarative array processing with high-performance computing

Author: Blanas Spyros
Brown Paul
Byna Suren
Floratos Sofoklis
Prabhat
Wu Kesheng
Xing Haoyuan
Publication date: 01/01/2017

Scientists are increasingly turning to datacenter-scale computers to produce and analyze massive arrays. Despite decades of database research that extols the virtues of declarative query processing, scientists still write, debug and parallelize imperative HPC kernels even for the most mundane queries. This impedance mismatch has been partly attributed to the cumbersome data loading process; in response, the database community has proposed in situ mechanisms to access data in scientific file formats. Scientists, however, desire more than a passive access method that reads arrays from files. This paper describes ArrayBridge, a bi-directional array view mechanism for scientific file formats that aims to make declarative array manipulations interoperable with imperative file-centric analyses. Our prototype implementation of ArrayBridge uses HDF5 as the underlying array storage library and seamlessly integrates into the SciDB open-source array database system. In addition to fast querying over external array objects, ArrayBridge produces arrays in the HDF5 file format just as easily as it can read from it. ArrayBridge also supports time travel queries from imperative kernels through the unmodified HDF5 API, and automatically deduplicates between array versions for space efficiency. Our extensive performance evaluation at NERSC, a large-scale scientific computing facility, shows that ArrayBridge exhibits performance and I/O scalability statistically indistinguishable from those of the native SciDB storage engine.
Comment: 12 pages, 13 figures
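The abstract mentions two related features: time travel queries over earlier array versions, and deduplication between versions for space efficiency. A common way to get both is content-addressed chunk storage, where each version is a list of chunk hashes and identical chunks are stored once. The sketch below shows that general technique under stated assumptions. It is not ArrayBridge's mechanism, which operates inside HDF5 and SciDB. The class `VersionedChunkStore`, fixed-size byte chunks and SHA-256 hashing are hypothetical choices.

```python
import hashlib
from typing import Dict, List


class VersionedChunkStore:
    """Hypothetical store that keeps every array version but deduplicates chunks.

    Each version is recorded as an ordered list of content hashes; a chunk
    whose bytes are identical across versions is stored only once.
    """

    def __init__(self, chunk_size: int) -> None:
        self.chunk_size = chunk_size
        self.chunks: Dict[str, bytes] = {}   # content hash -> chunk bytes
        self.versions: List[List[str]] = []  # version id -> ordered chunk hashes

    def commit(self, data: bytes) -> int:
        """Store a new version of a serialized array and return its version id."""
        hashes = []
        for offset in range(0, len(data), self.chunk_size):
            chunk = data[offset:offset + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(digest, chunk)  # keep only the first copy
            hashes.append(digest)
        self.versions.append(hashes)
        return len(self.versions) - 1

    def read(self, version: int) -> bytes:
        """Time travel: reassemble any earlier version from its chunk list."""
        return b"".join(self.chunks[h] for h in self.versions[version])


if __name__ == "__main__":
    store = VersionedChunkStore(chunk_size=4)
    v0 = store.commit(b"AAAABBBBCCCC")
    v1 = store.commit(b"AAAAXXXXCCCC")  # only the middle chunk changed
    print(len(store.chunks))            # 4
    print(store.read(v0))               # b'AAAABBBBCCCC'
```

Two versions of three chunks each occupy four stored chunks rather than six, and any old version remains readable. A real array store would chunk along array dimensions rather than over a flat byte string.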

arXiv.org e-Print Archive

eScholarship - University of California

The Technologies Required for Fusing HPC and Real-Time Data to Support Urgent Computing

Author: Brown Nicholas
Gibb Gordon
Nash Rupert
Prodan Bianca
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 30/12/2019

Using High Performance Computing (HPC) to complement urgent decision making in the event of disasters is an important potential future use of supercomputers. However, the usage modes involved are rather different from how HPC has traditionally been used. As such, many obstacles must be overcome to make HPC practical for disaster response, not least the unbounded wait times in batch system queues. In this paper, we present how the VESTEC project plans to overcome these issues and develop a working prototype of an urgent computing control system. We describe the requirements for such a system and analyse the different technologies available that can be leveraged to build it. Finally, we explore the design of the VESTEC system and discuss ongoing challenges that need to be addressed to realise a production-level system.
Comment: Preprint of paper in 2019 IEEE/ACM HPC for Urgent Decision Making (UrgentHPC)

arXiv.org e-Print Archive

Crossref

Edinburgh Research Explorer