413 research outputs found

    HALO 1.0: A Hardware-agnostic Accelerator Orchestration Framework for Enabling Hardware-agnostic Programming with True Performance Portability for Heterogeneous HPC

    Full text link
    This paper presents HALO 1.0, an open-ended extensible multi-agent software framework that implements a set of proposed hardware-agnostic accelerator orchestration (HALO) principles. HALO implements a novel compute-centric message passing interface (C^2MPI) specification for enabling the performance-portable execution of a hardware-agnostic host application across heterogeneous accelerators. The experiment results of evaluating eight widely used HPC subroutines based on Intel Xeon E5-2620 CPUs, Intel Arria 10 GX FPGAs, and NVIDIA GeForce RTX 2080 Ti GPUs show that HALO 1.0 allows for a unified control flow for host programs to run across all the computing devices with a consistently top performance portability score, which is up to five orders of magnitude higher than the OpenCL-based solution.Comment: 21 page

    Morpheus unleashed: Fast cross-platform SpMV on emerging architectures

    Full text link
    Sparse matrices and linear algebra are at the heart of scientific simulations. Over the years, more than 70 sparse matrix storage formats have been developed, targeting a wide range of hardware architectures and matrix types, each of which exploit the particular strengths of an architecture, or the specific sparsity patterns of the matrices. In this work, we explore the suitability of storage formats such as COO, CSR and DIA for emerging architectures such as AArch64 CPUs and FPGAs. In addition, we detail hardware-specific optimisations to these targets and evaluate the potential of each contribution to be integrated into Morpheus, a modern library that provides an abstraction of sparse matrices (currently) across x86 CPUs and NVIDIA/AMD GPUs. Finally, we validate our work by comparing the performance of the Morpheus-enabled HPCG benchmark against vendor-optimised implementations

    A Survey of Processing Systems for Phylogenetics and Population Genetics

    Get PDF
    The COVID-19 pandemic brought Bioinformatics into the spotlight, revealing that several existing methods, algorithms, and tools were not well prepared to handle large amounts of genomic data efficiently. This led to prohibitively long execution times and the need to reduce the extent of analyses to obtain results in a reasonable amount of time. In this survey, we review available high-performance computing and hardware-accelerated systems based on FPGA and GPU technology. Optimized and hardware-accelerated systems can conduct more thorough analyses considerably faster than pure software implementations, allowing to reach important conclusions in a timely manner to drive scientific discoveries. We discuss the reasons that are currently hindering high-performance solutions from being widely deployed in real-world biological analyses and describe a research direction that can pave the way to enable this
    • …
    corecore