131 research outputs found

    HPC-GPT: Integrating Large Language Model for High-Performance Computing

    Large Language Models (LLMs), including the LLaMA model, have demonstrated their effectiveness across various general-domain natural language processing (NLP) tasks. However, their performance on high-performance computing (HPC) domain tasks has been less than optimal, owing to the specialized expertise required to interpret model responses. In response to this challenge, we propose HPC-GPT, a novel LLaMA-based model fine-tuned in a supervised fashion on generated question-answer (QA) instances for the HPC domain. To evaluate its effectiveness, we concentrate on two HPC tasks: managing AI models and datasets for HPC, and data race detection. HPC-GPT achieves performance comparable with existing methods on both tasks, demonstrating its suitability for HPC-related scenarios. Our extensive experiments on open-source benchmarks underscore HPC-GPT's potential to bridge the performance gap between LLMs and HPC-specific tasks. With HPC-GPT, we aim to pave the way for LLMs to excel in HPC domains, simplifying the use of language models in complex computing applications.
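
    To make the fine-tuning recipe concrete, here is a minimal sketch of supervised fine-tuning on generated QA instances, assuming a Hugging Face-style setup; the model name, dataset file, and hyperparameters are illustrative placeholders, not the paper's actual artifacts.

```python
# A minimal sketch of supervised fine-tuning on generated QA instances,
# assuming a Hugging Face-style setup. The model name, dataset file, and
# hyperparameters are illustrative placeholders.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer, Trainer,
                          TrainingArguments)

base = "meta-llama/Llama-2-7b-hf"  # stand-in for the LLaMA base model
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token  # LLaMA has no pad token by default
model = AutoModelForCausalLM.from_pretrained(base)

# Each record is one generated (question, answer) instance for the HPC domain.
dataset = load_dataset("json", data_files="hpc_qa_instances.json")["train"]

def to_features(example):
    # Fold question and answer into a single causal-LM training sequence.
    text = f"Question: {example['question']}\nAnswer: {example['answer']}"
    enc = tokenizer(text, truncation=True, max_length=512, padding="max_length")
    enc["labels"] = enc["input_ids"].copy()
    return enc

train_set = dataset.map(to_features, remove_columns=dataset.column_names)

Trainer(
    model=model,
    args=TrainingArguments(output_dir="hpc-gpt-sft", num_train_epochs=3,
                           per_device_train_batch_size=4),
    train_dataset=train_set,
).train()
```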

    Data Race Detection Using Large Language Models

    Large language models (LLMs) are demonstrating significant promise as an alternative strategy for facilitating analyses and optimizations of high-performance computing programs, circumventing the need for resource-intensive manual tool creation. In this paper, we explore a novel LLM-based data race detection approach combining prompt engineering and fine-tuning techniques. We create a dedicated dataset named DRB-ML, derived from DataRaceBench, with fine-grained labels showing the presence of data race pairs and their associated variables, line numbers, and read/write information. DRB-ML is then used to evaluate representative LLMs and fine-tune open-source ones. Our experiments show that LLMs can be a viable approach to data race detection. However, they still cannot compete with traditional data race detection tools when detailed information about the variable pairs causing data races is needed.
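
    As an illustration of the prompting side of this approach, the sketch below builds a race-detection prompt whose requested output mirrors the kind of fine-grained labels described for DRB-ML (variable pairs, line numbers, read/write access); the template wording and JSON schema are our assumptions, not the paper's.

```python
# A minimal sketch of prompt engineering for LLM-based data race detection.
# The prompt template and output schema are illustrative assumptions.
PROMPT_TEMPLATE = """You are a data race detection tool.
Analyze the following OpenMP code and report whether it contains a data race.
If it does, list each racing variable pair with line numbers and
read/write access types.

Code:
{code}

Answer with JSON: {{"race": true/false, "pairs": [...]}}"""

def build_prompt(source_code: str) -> str:
    return PROMPT_TEMPLATE.format(code=source_code)

kernel = """#pragma omp parallel for
for (int i = 0; i < n - 1; i++)
    a[i] = a[i + 1] + 1;   // a[i] written, a[i+1] read: loop-carried race
"""
print(build_prompt(kernel))  # this text would be sent to the LLM under test
```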

    The Parallelism Motifs of Genomic Data Analysis

    Genomic data sets are growing dramatically as the cost of sequencing continues to decline and small sequencing devices become available. Enormous community databases store and share this data with the research community, but some genomic data analysis problems require large-scale computational platforms to meet both their memory and computational requirements. These applications differ from the scientific simulations that dominate the workload on high-end parallel systems today, and they place different requirements on programming support, software libraries, and parallel architectural design. For example, they involve irregular communication patterns such as asynchronous updates to shared data structures. We consider several problems in high-performance genomics analysis, including alignment, profiling, clustering, and assembly, for both single genomes and metagenomes. We identify some of the common computational patterns, or motifs, that help inform parallelization strategies, and we compare our motifs to some of the established lists, arguing that at least two key patterns, sorting and hashing, are missing.
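
    To make the hashing motif concrete, the toy sketch below counts k-mers with a hash table, the core pattern behind profiling and assembly; in a distributed run, each k-mer would be hashed to an owner process and updates sent asynchronously, which is exactly the irregular communication pattern the abstract describes.

```python
# A toy illustration of the hashing motif: k-mer counting. This serial
# sketch shows only the core pattern; the parallel version distributes
# the hash table and streams updates asynchronously between ranks.
from collections import defaultdict

def count_kmers(reads, k=4):
    counts = defaultdict(int)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # In a distributed run, hash(kmer) % nprocs picks the owning
            # rank, and the increment becomes an asynchronous remote update.
            counts[kmer] += 1
    return counts

reads = ["ACGTACGTGG", "TACGTACGAA"]
print(sorted(count_kmers(reads).items())[:5])
```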

    Resiliency in numerical algorithm design for extreme scale simulations

    This work is based on the seminar titled ‘Resiliency in Numerical Algorithm Design for Extreme Scale Simulations’, held March 1–6, 2020, at Schloss Dagstuhl, which was attended by all the authors. Advanced supercomputing is characterized by very high computation speeds at the cost of an enormous amount of resources. A typical large-scale computation running for 48 h on a system consuming 20 MW, as predicted for exascale systems, would consume a million kWh, corresponding to about 100k Euro in energy cost, for executing 10²³ floating-point operations. It is clearly unacceptable to lose the whole computation if any of the several million parallel processes fails during the execution. Moreover, if a single operation suffers from a bit-flip error, should the whole computation be declared invalid? What about the notion of reproducibility itself: should this core paradigm of science be revised and refined for results that are obtained by large-scale simulation? Naive versions of conventional resilience techniques will not scale to the exascale regime: with a main memory footprint of tens of petabytes, synchronously writing checkpoint data all the way to background storage at frequent intervals will create intolerable overheads in runtime and energy consumption. Forecasts show that the mean time between failures could be lower than the time to recover from such a checkpoint, so that large calculations at scale might not make any progress if robust alternatives are not investigated. More advanced resilience techniques must be devised. The key may lie in exploiting both advanced system features and specific application knowledge. Research will face two essential questions: (1) what are the reliability requirements for a particular computation, and (2) how do we best design the algorithms and software to meet these requirements? While the analysis of use cases can help understand the particular reliability requirements, the construction of remedies is currently wide open. One avenue would be to refine and improve on system- or application-level checkpointing and rollback strategies for the case in which an error is detected. Developers might use fault notification interfaces and flexible runtime systems to respond to node failures in an application-dependent fashion. Novel numerical algorithms or more stochastic computational approaches may be required to meet accuracy requirements in the face of undetectable soft errors. These ideas constituted an essential topic of the seminar. The goal of this Dagstuhl Seminar was to bring together a diverse group of scientists with expertise in exascale computing to discuss novel ways to make applications resilient against detected and undetected faults. In particular, participants explored the role that algorithms and applications play in the holistic approach needed to tackle this challenge. This article gathers a broad range of perspectives on the role of algorithms, applications, and systems in achieving resilience for extreme-scale simulations. The ultimate goal is to spark novel ideas and encourage the development of concrete solutions for achieving such resilience holistically.
    Peer reviewed. Article signed by 36 authors: Emmanuel Agullo, Mirco Altenbernd, Hartwig Anzt, Leonardo Bautista-Gomez, Tommaso Benacchio, Luca Bonaventura, Hans-Joachim Bungartz, Sanjay Chatterjee, Florina M. Ciorba, Nathan DeBardeleben, Daniel Drzisga, Sebastian Eibl, Christian Engelmann, Wilfried N. Gansterer, Luc Giraud, Dominik Göddeke, Marco Heisig, Fabienne Jézéquel, Nils Kohl, Xiaoye Sherry Li, Romain Lion, Miriam Mehl, Paul Mycek, Michael Obersteiner, Enrique S. Quintana-Ortí, Francesco Rizzi, Ulrich Rüde, Martin Schulz, Fred Fung, Robert Speck, Linda Stals, Keita Teranishi, Samuel Thibault, Dominik Thönnes, Andreas Wagner, and Barbara Wohlmuth. Postprint (author's final draft).
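
    The headline numbers are easy to verify with back-of-the-envelope arithmetic; note that the 0.10 Euro/kWh electricity price below is our assumption, not stated in the abstract.

```python
# Back-of-the-envelope check of the cost figures quoted above; the
# 0.10 Euro/kWh electricity price is an assumption, not from the text.
power_mw = 20          # predicted exascale system power draw
hours = 48             # runtime of a typical large-scale computation
flops_rate = 1e18      # exascale: 10^18 floating-point operations per second

energy_kwh = power_mw * 1_000 * hours      # 960,000 kWh ~ a million kWh
cost_eur = energy_kwh * 0.10               # ~96,000 Euro ~ 100k Euro
total_flops = flops_rate * hours * 3600    # ~1.7e23 ~ 10^23 operations

print(f"{energy_kwh:,.0f} kWh, {cost_eur:,.0f} Euro, {total_flops:.1e} flop")
```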

    Distributed workflows with Jupyter

    The designers of a new coordination interface enacting complex workflows have to tackle a dichotomy: choosing a language-independent or a language-dependent approach. Language-independent approaches decouple workflow models from the host code's business logic and advocate portability. Language-dependent approaches foster flexibility and performance by adopting the same host language for business and coordination code. Jupyter Notebooks, with their capability to describe both imperative and declarative code in a single format, allow taking the best of both approaches, maintaining a clear separation between the application and coordination layers while still providing a unified interface to both aspects. We advocate the Jupyter Notebooks’ potential to express complex distributed workflows, identifying the general requirements for a Jupyter-based Workflow Management System (WMS) and introducing a proof-of-concept portable implementation working on hybrid Cloud-HPC infrastructures. As a byproduct, we extended the vanilla IPython kernel with workflow-based parallel and distributed execution capabilities. The proposed Jupyter-workflow (Jw) system is evaluated on common scenarios for High Performance Computing (HPC) and Cloud, showing its potential in lowering the barriers between prototypical Notebooks and production-ready implementations.
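
    As a rough illustration of the coordination/business split described above, the sketch below separates what each step computes from where and when it runs; plain Python stands in for the coordination layer, and Jw's actual cell metadata and remote execution targets are not reproduced here.

```python
# A minimal sketch of the coordination/business split that a notebook-based
# WMS formalizes at the cell level. Plain Python stands in for Jw's actual
# coordination machinery; function names and inputs are illustrative.
from concurrent.futures import ThreadPoolExecutor

# --- business logic: what each step computes ---
def preprocess(path):
    return f"clean({path})"

def simulate(data):
    return f"result({data})"

# --- coordination logic: where and when steps run ---
inputs = ["sample_a.dat", "sample_b.dat"]
with ThreadPoolExecutor() as pool:
    cleaned = list(pool.map(preprocess, inputs))   # independent steps fan out
    results = [simulate(d) for d in cleaned]       # dependent step runs after
print(results)
```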

    Variable-Size Batched Condition Number Calculation on GPUs


    Efficient State-Space Exploration for Asynchronous Distributed Programs: Adapting UDPOR to MPI Programs

    Distributed message-passing applications are in the mainstream of information technology, since they exploit the power of parallel computer systems to produce higher performance. Designing distributed programs remains challenging because developers have to reason about concurrency, non-determinism, and data distribution, the main characteristics of distributed programs. Besides, it is virtually impossible to ensure the correctness of such programs via classical testing approaches, since one may never reach the particular execution that exhibits unwanted behavior. There is thus a need for more powerful verification techniques. Model checking is a formal method that automatically and effectively verifies properties of computer systems by exploring all possible behaviors (states and transitions) of the system model. However, state spaces increase exponentially with the number of concurrent processes, leading to the “state space explosion”. Unfolding-based Dynamic Partial Order Reduction (UDPOR) is a recent technique combining Dynamic Partial Order Reduction (DPOR) with concepts of concurrency theory, such as unfoldings, to efficiently mitigate state space explosion in the model checking of concurrent programs. It is optimal in the sense that each Mazurkiewicz trace, i.e. a class of interleavings equivalent up to commuting adjacent independent actions, is explored exactly once. And it is applicable to running programs, not only to models of programs. This thesis adapts UDPOR to verify asynchronous distributed programs (e.g. MPI programs) in the setting of the SIMGRID simulator of distributed applications. To do so, an abstract programming model of asynchronous distributed programs is defined and formalized in the TLA+ language, allowing us to precisely define an independence relation, a main ingredient of the concurrency semantics. Then, the adaptation of UDPOR, involving the construction of an unfolding, is made efficient by a precise analysis of dependencies in the programming model, allowing efficient computation of usually costly operations. A prototype implementation of UDPOR adapted to distributed asynchronous programs has been developed, giving promising experimental results on a significant set of benchmarks.
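
    To illustrate the independence relation at the heart of UDPOR, the toy sketch below marks two MPI-like actions as commuting when they come from different processes and touch disjoint mailboxes; the action encoding is ours for illustration, not the thesis's formal TLA+ model.

```python
# A toy version of the independence relation UDPOR relies on: two actions
# are independent when executing them in either order from the same state
# leads to the same state. This conservative syntactic check is ours, not
# the thesis's formal definition.
from typing import NamedTuple

class Action(NamedTuple):
    proc: int      # issuing process
    kind: str      # "send" or "recv"
    mailbox: str   # communication channel

def independent(a: Action, b: Action) -> bool:
    if a.proc == b.proc:
        return False   # same process: program order matters
    if a.mailbox != b.mailbox:
        return True    # disjoint channels always commute
    return False       # same mailbox: conservatively treat as dependent

s1 = Action(0, "send", "mb1")
r2 = Action(1, "recv", "mb2")
# True: the two interleavings are Mazurkiewicz-equivalent, so UDPOR
# explores only one of them.
print(independent(s1, r2))
```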