Search CORE

131 research outputs found

HPC-GPT: Integrating Large Language Model for High-Performance Computing

Author: Cerpa Alberto E.
Chen Le
Ding Xianzhong
Du Wan
Emani Murali
Liao Chunhua
Lin Pei-Hung
Vanderbruggen Tristan
Xie Zhen
Publication venue
Publication date: 02/10/2023
Field of study

Large Language Models (LLMs), including the LLaMA model, have exhibited their efficacy across various general-domain natural language processing (NLP) tasks. However, their performance in high-performance computing (HPC) domain tasks has been less than optimal due to the specialized expertise required to interpret the model responses. In response to this challenge, we propose HPC-GPT, a novel LLaMA-based model that has been supervised fine-tuning using generated QA (Question-Answer) instances for the HPC domain. To evaluate its effectiveness, we concentrate on two HPC tasks: managing AI models and datasets for HPC, and data race detection. By employing HPC-GPT, we demonstrate comparable performance with existing methods on both tasks, exemplifying its excellence in HPC-related scenarios. Our experiments on open-source benchmarks yield extensive results, underscoring HPC-GPT's potential to bridge the performance gap between LLMs and HPC-specific tasks. With HPC-GPT, we aim to pave the way for LLMs to excel in HPC domains, simplifying the utilization of language models in complex computing applications.Comment: 9 page

arXiv.org e-Print Archive

Data Race Detection Using Large Language Models

Author: Chen Le
Ding Xianzhong
Emani Murali
Liao Chuanhua
Lin Pei-hung
Vanderbruggen Tristan
Publication venue
Publication date: 14/08/2023
Field of study

Large language models (LLMs) are demonstrating significant promise as an alternate strategy to facilitate analyses and optimizations of high-performance computing programs, circumventing the need for resource-intensive manual tool creation. In this paper, we explore a novel LLM-based data race detection approach combining prompting engineering and fine-tuning techniques. We create a dedicated dataset named DRB-ML, which is derived from DataRaceBench, with fine-grain labels showing the presence of data race pairs and their associated variables, line numbers, and read/write information. DRB-ML is then used to evaluate representative LLMs and fine-tune open-source ones. Our experiment shows that LLMs can be a viable approach to data race detection. However, they still cannot compete with traditional data race detection tools when we need detailed information about variable pairs causing data races

arXiv.org e-Print Archive

The Parallelism Motifs of Genomic Data Analysis

Author: Awan Muaaz
Azad Ariful
Brock Benjamin
Buluc Aydin
Egan Rob
Ekanayake Saliya
Ellis Marquita
Georganas Evangelos
Guidi Giulia
Hofmeyr Steven
Oliker Leonid
Selvitopi Oguz
Teodoropol Cristina
Yelick Katherine
Publication venue: 'The Royal Society'
Publication date: 20/01/2020
Field of study

Genomic data sets are growing dramatically as the cost of sequencing continues to decline and small sequencing devices become available. Enormous community databases store and share this data with the research community, but some of these genomic data analysis problems require large scale computational platforms to meet both the memory and computational requirements. These applications differ from scientific simulations that dominate the workload on high end parallel systems today and place different requirements on programming support, software libraries, and parallel architectural design. For example, they involve irregular communication patterns such as asynchronous updates to shared data structures. We consider several problems in high performance genomics analysis, including alignment, profiling, clustering, and assembly for both single genomes and metagenomes. We identify some of the common computational patterns or motifs that help inform parallelization strategies and compare our motifs to some of the established lists, arguing that at least two key patterns, sorting and hashing, are missing

arXiv.org e-Print Archive

eScholarship - University of California

Resiliency in numerical algorithm design for extreme scale simulations

Author: Agullo Emmanuel
Altenbernd Mirco
Anzt Hartwig
Bautista Gomez Leonardo
Benacchio Tommaso
Publication venue: 'SAGE Publications'
Publication date: 01/12/2021
Field of study

This work is based on the seminar titled ‘Resiliency in Numerical Algorithm Design for Extreme Scale Simulations’ held March 1–6, 2020, at Schloss Dagstuhl, that was attended by all the authors. Advanced supercomputing is characterized by very high computation speeds at the cost of involving an enormous amount of resources and costs. A typical large-scale computation running for 48 h on a system consuming 20 MW, as predicted for exascale systems, would consume a million kWh, corresponding to about 100k Euro in energy cost for executing 1023 floating-point operations. It is clearly unacceptable to lose the whole computation if any of the several million parallel processes fails during the execution. Moreover, if a single operation suffers from a bit-flip error, should the whole computation be declared invalid? What about the notion of reproducibility itself: should this core paradigm of science be revised and refined for results that are obtained by large-scale simulation? Naive versions of conventional resilience techniques will not scale to the exascale regime: with a main memory footprint of tens of Petabytes, synchronously writing checkpoint data all the way to background storage at frequent intervals will create intolerable overheads in runtime and energy consumption. Forecasts show that the mean time between failures could be lower than the time to recover from such a checkpoint, so that large calculations at scale might not make any progress if robust alternatives are not investigated. More advanced resilience techniques must be devised. The key may lie in exploiting both advanced system features as well as specific application knowledge. Research will face two essential questions: (1) what are the reliability requirements for a particular computation and (2) how do we best design the algorithms and software to meet these requirements? While the analysis of use cases can help understand the particular reliability requirements, the construction of remedies is currently wide open. One avenue would be to refine and improve on system- or application-level checkpointing and rollback strategies in the case an error is detected. Developers might use fault notification interfaces and flexible runtime systems to respond to node failures in an application-dependent fashion. Novel numerical algorithms or more stochastic computational approaches may be required to meet accuracy requirements in the face of undetectable soft errors. These ideas constituted an essential topic of the seminar. The goal of this Dagstuhl Seminar was to bring together a diverse group of scientists with expertise in exascale computing to discuss novel ways to make applications resilient against detected and undetected faults. In particular, participants explored the role that algorithms and applications play in the holistic approach needed to tackle this challenge. This article gathers a broad range of perspectives on the role of algorithms, applications and systems in achieving resilience for extreme scale simulations. The ultimate goal is to spark novel ideas and encourage the development of concrete solutions for achieving such resilience holistically.Peer Reviewed"Article signat per 36 autors/es: Emmanuel Agullo, Mirco Altenbernd, Hartwig Anzt, Leonardo Bautista-Gomez, Tommaso Benacchio, Luca Bonaventura, Hans-Joachim Bungartz, Sanjay Chatterjee, Florina M. Ciorba, Nathan DeBardeleben, Daniel Drzisga, Sebastian Eibl, Christian Engelmann, Wilfried N. Gansterer, Luc Giraud, Dominik G ̈oddeke, Marco Heisig, Fabienne Jezequel, Nils Kohl, Xiaoye Sherry Li, Romain Lion, Miriam Mehl, Paul Mycek, Michael Obersteiner, Enrique S. Quintana-Ortiz, Francesco Rizzi, Ulrich Rude, Martin Schulz, Fred Fung, Robert Speck, Linda Stals, Keita Teranishi, Samuel Thibault, Dominik Thonnes, Andreas Wagner and Barbara Wohlmuth"Postprint (author's final draft

UPCommons. Portal del coneixement obert de la UPC

Distributed workflows with Jupyter

Author: Aldinucci M.
Cantalupo B.
Cavazzoni C.
Colonnelli I.
Di Carlo R.
Magini N.
Morelli R.
Padovani L.
Rabellino S.
Spampinato C.
Publication venue
Publication date: 01/01/2022
Field of study

The designers of a new coordination interface enacting complex workflows have to tackle a dichotomy: choosing a language-independent or language-dependent approach. Language-independent approaches decouple workflow models from the host code's business logic and advocate portability. Language-dependent approaches foster flexibility and performance by adopting the same host language for business and coordination code. Jupyter Notebooks, with their capability to describe both imperative and declarative code in a unique format, allow taking the best of the two approaches, maintaining a clear separation between application and coordination layers but still providing a unified interface to both aspects. We advocate the Jupyter Notebooks’ potential to express complex distributed workflows, identifying the general requirements for a Jupyter-based Workflow Management System (WMS) and introducing a proof-of-concept portable implementation working on hybrid Cloud-HPC infrastructures. As a byproduct, we extended the vanilla IPython kernel with workflow-based parallel and distributed execution capabilities. The proposed Jupyter-workflow (Jw) system is evaluated on common scenarios for High Performance Computing (HPC) and Cloud, showing its potential in lowering the barriers between prototypical Notebooks and production-ready implementations

Archivio istituzionale della ricerca - Università di Camerino

Variable-Size Batched Condition Number Calculation on GPUs

Author: Anzt Hartwig
Dongarra J.
Flegar G.
Grützmacher Thomas
Publication venue: Institute of Electrical and Electronics Engineers
Publication date: 01/01/2019
Field of study

Crossref

KITopen

Exploration efficace de l'espace d'état pour les programmes distribués asynchrones: Adaptation de l’UDPOR aux programes MPI.

Author: Pham The Anh
Publication venue: HAL CCSD
Publication date: 06/12/2019
Field of study

Distributed message passing applications are in the mainstream of information technology since they exploit the power of parallel computer systems to produce higher performance. Designing distributed programs remains challenging because developers have to reason about concurrency, non-determinism, data distribution… that are main characteristics of distributed programs. Besides, it is virtually impossible to ensure the correctness of such programs via classical testing approaches since one may never successfully reach the execution that leads to unwanted behaviors in the programs. There is thus a need for more powerful verification techniques. Model-checking is one of the formal methods that allows to verify automatically and effectively some properties on models of computer systems by exploring all possible behaviors (states and transitions) of the system model. However, state spaces increase exponentially with the number of concurrent processes, leading to “state space explosion”.Unfolding-based Dynamic Partial Order Reduction (UDPOR) is a recent technique mixing Dynamic Partial Order Reduction (DPOR) with concepts of concurrency theory such as unfoldings to efficiently mitigate state space explosion in model-checking of concurrent programs. It is optimal in the sense that each Mazurkiewicz trace, i.e. a class of interleavings equivalent by commuting adjacent independent actions, is explored exactly once. And it is applicable to running programs, not only models of programs.The thesis aims at adapting UDPOR to verify asynchronous distributed programs (e.g. MPI programs) in the setting of the SIMGRID simulator of distributed applications. To do so, an abstract programming model of asynchronous distributed programs is defined and formalized in the TLA+ language, allowing to precisely define an independence relation, a main ingredient of the concurrency semantics. Then, the adaptation of UDPOR, involving the construction of an unfolding, is made efficient by a precise analysis of dependencies in the programming model, allowing efficient computations of usually costly operation. A prototype implementation of UDPOR adapted to distributed asynchronous programs has been developed, giving promising experimental results on a significant set of benchmarks.Distributed message passing applications are in the mainstream of information technology since they exploit the power of parallel computer systems to produce higher performance. Designing distributed programs remains challenging because developers have to reason about concurrency, non-determinism, data distribution… that are main characteristics of distributed programs. Besides, it is virtually impossible to ensure the correctness of such programs via classical testing approaches since one may never successfully reach the execution that leads to unwanted behaviors in the programs. There is thus a need for more powerful verification techniques. Model-checking is one of the formal methods that allows to verify automatically and effectively some properties on models of computer systems by exploring all possible behaviors (states and transitions) of the system model. However, state spaces increase exponentially with the number of concurrent processes, leading to “state space explosion”.Unfolding-based Dynamic Partial Order Reduction (UDPOR) is a recent technique mixing Dynamic Partial Order Reduction (DPOR) with concepts of concurrency theory such as unfoldings to efficiently mitigate state space explosion in model-checking of concurrent programs. It is optimal in the sense that each Mazurkiewicz trace, i.e. a class of interleavings equivalent by commuting adjacent independent actions, is explored exactly once. And it is applicable to running programs, not only models of programs.The thesis aims at adapting UDPOR to verify asynchronous distributed programs (e.g. MPI programs) in the setting of the SIMGRID simulator of distributed applications. To do so, an abstract programming model of asynchronous distributed programs is defined and formalized in the TLA+ language, allowing to precisely define an independence relation, a main ingredient of the concurrency semantics. Then, the adaptation of UDPOR, involving the construction of an unfolding, is made efficient by a precise analysis of dependencies in the programming model, allowing efficient computations of usually costly operation. A prototype implementation of UDPOR adapted to distributed asynchronous programs has been developed, giving promising experimental results on a significant set of benchmarks

INRIA a CCSD electronic archive server

Modern vector architectures for high-performance computing

Author: Poenaru Andrei
Publication venue
Publication date: 12/05/2022
Field of study

Explore Bristol Research

The Landscape of Exascale Research: A Data-Driven Literature Analysis

Author: Belloum A.S.Z.
Heldens S.
Hijma P.
Maassen J.
Van Nieuwpoort R.V.
Van Werkhoven B.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/06/2020
Field of study

International Migration, Integration and Social Cohesion online publications