
    Using Pilot Systems to Execute Many Task Workloads on Supercomputers

    High performance computing systems have historically been designed to support applications composed mostly of monolithic, single-job workloads. Pilot systems decouple workload specification, resource selection, and task execution via job placeholders and late binding, and thereby help to satisfy the resource requirements of workloads comprising multiple tasks. RADICAL-Pilot (RP) is a modular and extensible Python-based pilot system. In this paper we describe RP's design, architecture and implementation, and characterize its performance. RP is capable of spawning more than 100 tasks/second and supports the steady-state execution of up to 16K concurrent tasks. RP can be used stand-alone, as well as integrated with other application-level tools as a runtime system.
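    The pilot pattern described above (a placeholder job that acquires resources, with tasks late-bound to the acquired slots) can be sketched in a few lines of Python. The sketch below is purely illustrative: the Pilot class, its methods, and the use of a local thread pool are assumptions for exposition and are not RADICAL-Pilot's actual API.

```python
# Illustrative sketch only: the names below are hypothetical and merely mimic
# the pattern the abstract describes (resource placeholder + late-bound tasks).
from concurrent.futures import ThreadPoolExecutor
import subprocess


class Pilot:
    """A placeholder 'job' holding a fixed pool of workers; tasks are late-bound to it."""

    def __init__(self, cores):
        # In a real pilot system this would be a batch job reserving `cores`
        # on the machine; here it is simply a local worker pool.
        self.pool = ThreadPoolExecutor(max_workers=cores)

    def submit(self, command):
        # Late binding: the task runs on whichever already-acquired slot
        # becomes free next, independent of when the resources were requested.
        return self.pool.submit(subprocess.run, command, capture_output=True)

    def close(self):
        self.pool.shutdown(wait=True)


if __name__ == "__main__":
    pilot = Pilot(cores=4)                # one placeholder, many tasks
    futures = [pilot.submit(["echo", f"task {i}"]) for i in range(16)]
    for f in futures:
        print(f.result().stdout.decode().strip())
    pilot.close()
```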

    High-throughput Binding Affinity Calculations at Extreme Scales

    Resistance to chemotherapy and molecularly targeted therapies is a major factor limiting the effectiveness of cancer treatment. In many cases, resistance can be linked to genetic changes in target proteins, either pre-existing or evolutionarily selected during treatment. Key to overcoming this challenge is an understanding of the molecular determinants of drug binding. Using multi-stage pipelines of molecular simulations we can gain insights into the binding free energy and the residence time of a ligand, which can inform both stratified and personalized treatment regimens and drug development. To support the scalable, adaptive and automated calculation of the binding free energy on high-performance computing resources, we introduce the High-throughput Binding Affinity Calculator (HTBAC). HTBAC uses a building-block approach to attain both workflow flexibility and performance. We demonstrate close to perfect weak scaling to hundreds of concurrent multi-stage binding affinity calculation pipelines. This permits a rapid time-to-solution that is essentially invariant of the calculation protocol, the size of candidate ligands and the number of ensemble simulations. As such, HTBAC advances the state of the art of binding affinity calculations and protocols.
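    The building-block idea behind this design (sequential stages inside a pipeline, many independent pipelines executed concurrently) can be illustrated with a short Python sketch. The Stage and Pipeline classes and the run_pipelines helper are hypothetical names chosen for exposition; they are not HTBAC's actual interfaces, and the concurrency model is simplified to a local thread pool.

```python
# Hypothetical sketch of composing many concurrent multi-stage pipelines from
# simple building blocks; names and structure are illustrative, not HTBAC's API.
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Stage:
    name: str
    work: Callable[[str], None]          # e.g. minimisation, equilibration, production MD


@dataclass
class Pipeline:
    ligand: str                          # one candidate ligand per pipeline
    stages: List[Stage] = field(default_factory=list)

    def run(self):
        # Stages execute sequentially within a pipeline ...
        for stage in self.stages:
            stage.work(self.ligand)
        return self.ligand


def run_pipelines(pipelines, max_concurrent=128):
    # ... while independent pipelines execute concurrently, mirroring the
    # concurrency model the abstract describes.
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        futures = [pool.submit(p.run) for p in pipelines]
        return [f.result() for f in futures]


if __name__ == "__main__":
    protocol = [Stage("minimise", lambda ligand: None),
                Stage("equilibrate", lambda ligand: None),
                Stage("production", lambda ligand: None)]
    pipes = [Pipeline(f"ligand-{i}", list(protocol)) for i in range(8)]
    print(run_pipelines(pipes))
```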

    HeteroCore GPU to exploit TLP-resource diversity


    High-Throughput Computing on High-Performance Platforms: A Case Study

    The computing systems used by LHC experiments have historically consisted of a federation of hundreds to thousands of distributed resources, ranging from small to mid-size. In spite of the impressive scale of the existing distributed computing solutions, the federation of small to mid-size resources will be insufficient to meet projected future demands. This paper is a case study of how the ATLAS experiment has embraced Titan---a DOE leadership facility---in conjunction with traditional distributed high-throughput computing to reach sustained production scales of approximately 52M core-hours a year. The three main contributions of this paper are: (i) a critical evaluation of design and operational considerations to support the sustained, scalable and production usage of Titan; (ii) a preliminary characterization of a next-generation executor for PanDA to support new workloads and advanced execution modes; and (iii) early lessons for how current and future experimental and observational systems can be integrated with production supercomputers and other platforms in a general and extensible manner.

    AdaMD: Adaptive Mapping and DVFS for Energy-efficient Heterogeneous Multi-cores

    Modern heterogeneous multi-core systems, containing various types of cores, increasingly deal with the concurrent execution of dynamic application workloads. Moreover, the performance constraints of each application vary, and applications may enter or exit the system at any time. Existing approaches are not efficient in such dynamic scenarios, especially if applications are unknown, as they require extensive offline application analysis and do not consider runtime execution scenarios (application arrival/completion, and workload and performance variations) in their runtime management. To address this, we present AdaMD, an adaptive mapping and dynamic voltage and frequency scaling (DVFS) approach for improving energy consumption and performance. The key feature of the proposed approach is the elimination of any dependency on offline profiling results when making runtime decisions. This is achieved through a performance prediction model with a maximum error 7.9% lower than that of the previously reported model, and a mapping approach that allocates processing cores to applications while respecting performance constraints. Furthermore, AdaMD adapts to runtime execution scenarios efficiently by monitoring application status and performance/workload variations in order to adjust DVFS settings and thread-to-core mappings. The proposed approach is experimentally validated on the Odroid-XU3, with various combinations of diverse multi-threaded applications from the PARSEC and SPLASH benchmark suites. Results show energy savings of up to 28% compared to a recently proposed approach while meeting performance constraints.
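    The adaptive runtime-management loop described above (monitor each application, predict its performance, then pick the core allocation and DVFS level that meets the constraint at the lowest energy) can be sketched as follows. Everything in the sketch, including the predict_ips stand-in predictor, the frequency table and the selection heuristic, is an assumption for illustration and does not reproduce the paper's actual model or algorithm.

```python
# Simplified, hypothetical sketch of an adaptive mapping + DVFS control step;
# the predictor and heuristic below are placeholders, not AdaMD itself.
from dataclasses import dataclass

FREQS_MHZ = [600, 1000, 1400, 1800, 2000]      # assumed available DVFS levels


@dataclass
class App:
    name: str
    perf_target: float                 # required throughput (instructions/s, illustrative)
    cores: int = 1
    freq_mhz: int = FREQS_MHZ[-1]


def predict_ips(cores, freq_mhz):
    # Stand-in predictor: pretends throughput scales with cores * frequency.
    # A real model would be built from counters monitored at run time.
    return 1e6 * cores * freq_mhz


def adapt(apps, total_cores):
    free = total_cores - sum(app.cores for app in apps)
    for app in apps:
        # Choose the lowest frequency that still meets the application's
        # performance constraint; if none suffices, try adding a free core.
        for freq in FREQS_MHZ:
            if predict_ips(app.cores, freq) >= app.perf_target:
                app.freq_mhz = freq
                break
        else:
            if free > 0:
                app.cores, free = app.cores + 1, free - 1
                app.freq_mhz = FREQS_MHZ[-1]


if __name__ == "__main__":
    workload = [App("blackscholes", 3.5e9), App("fluidanimate", 1.2e9)]
    adapt(workload, total_cores=8)
    for app in workload:
        print(app)
```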