218 research outputs found

    Investigating applications portability with the Uintah DAG-based runtime system on PetaScale supercomputers

    Current trends in high performance computing present formidable challenges for applications code using multicore nodes, possibly with accelerators and/or co-processors and reduced memory, while still attaining scalability. Software frameworks that execute machine-independent applications code using a runtime system that shields users from architectural complexities offer a possible solution. The Uintah framework, for example, solves a broad class of large-scale problems on structured adaptive grids using fluid-flow solvers coupled with particle-based solids methods. Uintah executes directed acyclic graphs of computational tasks with a scalable, asynchronous, and dynamic runtime system for CPU cores and/or accelerators/coprocessors on a node. Uintah's clear separation between application and runtime code has led to scalability increases of 1000x without significant changes to application code. This methodology is tested on three leading Top500 machines, OLCF Titan, TACC Stampede, and ALCF Mira, using three diverse and challenging applications problems. This investigation of scalability with regard to the different processors and communications performance leads to the overall conclusion that the adaptive DAG-based approach provides a very powerful abstraction for solving challenging multi-scale multi-physics engineering problems on some of the largest and most powerful computers available today.
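
    As a rough illustration of the DAG execution model this abstract describes, the C++ sketch below runs tasks as their dependencies are satisfied. It is a toy under stated assumptions, not Uintah's API: the names Task, unmet_deps, and run_ready_tasks are hypothetical, and a real scheduler runs many such loops concurrently across cores and devices.

    // Minimal sketch of DAG-based task execution. Illustrative only;
    // not Uintah's actual runtime API.
    #include <atomic>
    #include <functional>
    #include <queue>
    #include <vector>

    struct Task {
        std::function<void()> work;        // the computational kernel
        std::atomic<int> unmet_deps{0};    // dependencies not yet satisfied
        std::vector<Task*> dependents;     // tasks waiting on this one
    };

    // Execute tasks as their dependencies complete; the runtime, not the
    // application, decides the order, which is what lets it adapt to
    // variable work without changes to application code.
    void run_ready_tasks(std::queue<Task*>& ready) {
        while (!ready.empty()) {
            Task* t = ready.front(); ready.pop();
            t->work();                      // run the kernel
            for (Task* d : t->dependents)   // release dependents whose
                if (--d->unmet_deps == 0)   // last dependency just finished
                    ready.push(d);
        }
    }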

    Accelerating Cardiac Bidomain Simulations Using Graphics Processing Units

    Anatomically realistic and biophysically detailed multiscale computer models of the heart are playing an increasingly important role in advancing our understanding of integrated cardiac function in health and disease. Such detailed simulations, however, are vastly demanding computationally, which is a limiting factor for wider adoption of in-silico modeling. While current trends in high-performance computing (HPC) hardware promise to alleviate this problem, exploiting the potential of such architectures remains challenging, since strongly scalable algorithms are required to reduce execution times. Alternatively, acceleration technologies such as graphics processing units (GPUs) are being considered. While the potential of GPUs has been demonstrated in various applications, benefits in the context of bidomain simulations, where large sparse linear systems have to be solved in parallel with advanced numerical techniques, are less clear. In this study, the feasibility of multi-GPU bidomain simulations is demonstrated by running strong scalability benchmarks using a state-of-the-art model of rabbit ventricles. The model is spatially discretized using the finite element method (FEM) on fully unstructured grids. The GPU code is directly derived from a large pre-existing code, the Cardiac Arrhythmia Research Package (CARP), with very minor perturbation of the code base. Overall, bidomain simulations were sped up by a factor of 11.8 to 16.3 in benchmarks running on 6-20 GPUs compared to the same number of CPU cores. To match the fastest GPU simulation, which engaged 20 GPUs, 476 CPU cores were required on a national supercomputing facility.
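
    The dominant kernel inside such iterative sparse solves is the repeated sparse matrix-vector product. A minimal C++ sketch of that kernel on a CSR matrix follows; it is illustrative only and does not reflect CARP's actual data structures, and the OpenMP pragma stands in for the one-row-per-thread mapping a GPU version would typically use.

    // Sketch of a CSR sparse matrix-vector product y = A*x, the kernel
    // at the heart of iterative bidomain solves. Illustrative only.
    #include <vector>

    struct CSRMatrix {
        std::vector<int>    row_ptr;  // size n+1: start of each row
        std::vector<int>    col;      // column index of each nonzero
        std::vector<double> val;      // value of each nonzero
    };

    // Each row is independent, which is why this kernel maps naturally
    // onto many CPU cores or GPU threads.
    void spmv(const CSRMatrix& A, const std::vector<double>& x,
              std::vector<double>& y) {
        const int n = static_cast<int>(A.row_ptr.size()) - 1;
        #pragma omp parallel for
        for (int i = 0; i < n; ++i) {
            double sum = 0.0;
            for (int k = A.row_ptr[i]; k < A.row_ptr[i + 1]; ++k)
                sum += A.val[k] * x[A.col[k]];
            y[i] = sum;
        }
    }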

    XXVI IUPAP Conference on Computational Physics (CCP2014)

    The 26th IUPAP Conference on Computational Physics, CCP2014, was held in Boston, Massachusetts, during August 11-14, 2014. Almost 400 participants from 38 countries convened at the George Sherman Union at Boston University for four days of plenary and parallel sessions spanning a broad range of topics in computational physics and related areas. The first meeting in the series that developed into the annual Conference on Computational Physics (CCP) was held in 1989, also on the campus of Boston University and chaired by our colleague Claudio Rebbi. The express purpose of that meeting was to discuss the progress, opportunities, and challenges of common interest to physicists engaged in computational research. The conference having returned to the site of its inception, it is interesting to reflect on the development of the field during the intervening years.

    Though 25 years is a short time for mankind, computational physics has taken giant leaps during these years, not only because of the enormous increases in computer power but especially because of the development of new methods and algorithms, and the growing awareness of the opportunities the new technologies and methods can offer. Computational physics now represents a "third leg" of research alongside analytical theory and experiments in almost all subfields of physics, and because of this there is also increasing specialization within the community of computational physicists. It is therefore a challenge to organize a meeting such as CCP, which must have sufficient depth in different areas to hold the interest of experts while at the same time being broad and accessible. Still, at a time when computational research continues to gain in importance, the CCP series is critical in the way it fosters cross-fertilization among fields, with many participants specifically attending in order to get exposure to new methods in fields outside their own.

    As organizers and editors of these Proceedings, we are very pleased with the high quality of the papers provided by the participants. These articles represent a good cross-section of what was presented at the meeting, and it is our hope that they will not only be useful individually for their specific scientific content but will also collectively represent a historical snapshot of the state of computational physics. The remainder of this Preface contains lists detailing the organizational structure of CCP2014, endorsers and sponsors of the meeting, plenary and invited talks, and a presentation of the 2014 IUPAP C20 Young Scientist Prize. We would like to take the opportunity to again thank all those who contributed to the success of CCP2014, as organizers, sponsors, presenters, exhibitors, and participants. Anders Sandvik, David Campbell, David Coker, Ying Tang

    Blocked All-Pairs Shortest Paths Algorithm on Intel Xeon Phi KNL Processor: A Case Study

    Manycore processors are consolidating in the HPC community as a way of improving performance while maintaining power efficiency. Knights Landing is the recently released second generation of the Intel Xeon Phi architecture. While optimizing applications on CPUs, GPUs, and first-generation Xeon Phis has been studied extensively in recent years, the new features of Knights Landing processors require revising programming and optimization techniques for these devices. In this work, we selected the Floyd-Warshall algorithm as a representative case study of graph and memory-bound applications. Starting from the default serial version, we show how data-, thread-, and compiler-level optimizations help the parallel implementation reach 338 GFLOPS.
    Comment: CACIC 2017. Springer Communications in Computer and Information Science, vol 79
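
    The tiled structure at the core of this case study can be sketched briefly. The following is an illustrative C++/OpenMP blocked Floyd-Warshall, not the paper's code: the tile size BS, the helper fw_tile, and the assumption that n is a multiple of BS are choices made here for brevity, and the KNL-specific tuning the paper studies (vectorization, MCDRAM placement) is omitted.

    // Sketch of blocked (tiled) Floyd-Warshall with OpenMP.
    // Assumes the matrix edge n is a multiple of the tile edge BS.
    #include <algorithm>
    #include <vector>

    constexpr int BS = 64;  // tile edge, sized so tiles fit in cache

    // Relax one BS x BS tile C using pivot tiles A (row) and B (column).
    static void fw_tile(double* C, const double* A, const double* B, int n) {
        for (int k = 0; k < BS; ++k)
            for (int i = 0; i < BS; ++i)
                for (int j = 0; j < BS; ++j)
                    C[i * n + j] = std::min(C[i * n + j],
                                            A[i * n + k] + B[k * n + j]);
    }

    void blocked_floyd_warshall(std::vector<double>& d, int n) {
        const int nb = n / BS;  // tiles per dimension
        auto tile = [&](int bi, int bj) { return &d[(bi * n + bj) * BS]; };
        for (int kb = 0; kb < nb; ++kb) {
            // Phase 1: the pivot tile updates itself.
            fw_tile(tile(kb, kb), tile(kb, kb), tile(kb, kb), n);
            // Phase 2: tiles sharing the pivot's row or column.
            #pragma omp parallel for
            for (int b = 0; b < nb; ++b) {
                if (b == kb) continue;
                fw_tile(tile(kb, b), tile(kb, kb), tile(kb, b), n);
                fw_tile(tile(b, kb), tile(b, kb), tile(kb, kb), n);
            }
            // Phase 3: all remaining tiles, fully independent.
            #pragma omp parallel for collapse(2)
            for (int bi = 0; bi < nb; ++bi)
                for (int bj = 0; bj < nb; ++bj)
                    if (bi != kb && bj != kb)
                        fw_tile(tile(bi, bj), tile(bi, kb), tile(kb, bj), n);
        }
    }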

    Doctor of Philosophy

    Recent trends in high performance computing present larger and more diverse computers using multicore nodes, possibly with accelerators and/or coprocessors and reduced memory. These changes pose formidable challenges for applications code to attain scalability. Software frameworks that execute machine-independent applications code using a runtime system that shields users from architectural complexities offer a portable solution for easy programming. The Uintah framework, for example, solves a broad class of large-scale problems on structured adaptive grids using fluid-flow solvers coupled with particle-based solids methods. However, the original Uintah code had limited scalability, as tasks were run in a predefined order based solely on static analysis of the task graph and used only the message passing interface (MPI) for parallelism. By using a new hybrid multithread and MPI runtime system, this research has made it possible for Uintah to scale to 700K central processing unit (CPU) cores when solving challenging fluid-structure interaction problems. Those problems often involve moving objects with adaptive mesh refinement and thus highly variable and unpredictable work patterns. This research has also demonstrated an ability to run capability jobs on heterogeneous systems with Nvidia graphics processing unit (GPU) accelerators or Intel Xeon Phi coprocessors. The new runtime system for Uintah executes directed acyclic graphs of computational tasks with a scalable, asynchronous, and dynamic runtime system for multicore CPUs and/or accelerators/coprocessors on a node. Uintah's clear separation between application and runtime code has led to scalability increases without significant changes to application code. This research concludes that the adaptive directed acyclic graph (DAG)-based approach provides a very powerful abstraction for solving challenging multiscale multiphysics engineering problems. Excellent scalability with regard to the different processors and communications performance is achieved on some of the largest and most powerful computers available today.
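
    A minimal sketch of the hybrid MPI-plus-threads model this dissertation describes might look as follows, assuming MPI_THREAD_MULTIPLE support so that worker threads can communicate directly. The worker body is left as a comment: Uintah's actual task queue, out-of-order execution, and GPU offload logic are far more involved than this stub.

    // Sketch of a hybrid MPI + multithreaded execution model: one MPI
    // rank per node, several worker threads per rank. Illustrative only.
    #include <mpi.h>
    #include <thread>
    #include <vector>

    int main(int argc, char** argv) {
        // Ask MPI for full thread support so workers may call MPI directly.
        int provided = 0;
        MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);

        int rank = 0;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        const unsigned nworkers = std::thread::hardware_concurrency();
        std::vector<std::thread> workers;
        for (unsigned t = 0; t < nworkers; ++t)
            workers.emplace_back([rank, t] {
                (void)rank; (void)t;  // silence unused warnings in this stub
                // Each worker would repeatedly pop a ready task, execute
                // it, and post/receive MPI messages as dependencies demand.
            });
        for (auto& w : workers) w.join();

        MPI_Finalize();
        return 0;
    }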