
    GekkoFS: A temporary distributed file system for HPC applications

    We present GekkoFS, a temporary, highly scalable burst buffer file system that has been specifically optimized for new access patterns of data-intensive High-Performance Computing (HPC) applications. The file system provides relaxed POSIX semantics, offering only the features that most (though not all) applications actually require. It provides scalable I/O performance and reaches millions of metadata operations even on a small number of nodes, significantly outperforming general-purpose parallel file systems. The work has been funded by the German Research Foundation (DFG) through the ADA-FS project as part of the Priority Programme 1648. It is also supported by the Spanish Ministry of Science and Innovation (TIN2015-65316), the Generalitat de Catalunya (2014-SGR-1051), the European Union's Horizon 2020 Research and Innovation Programme (NEXTGenIO, 671951), and the European Commission's BigStorage project (H2020-MSCA-ITN-2014-642963). This research was conducted using the supercomputer MOGON II and services offered by the Johannes Gutenberg University Mainz.
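    A minimal sketch of the general idea behind scalable metadata handling in such a file system: responsibility for file metadata is spread across nodes by hashing full file paths, so no single server becomes a bottleneck under metadata-heavy workloads. The helper name and server count below are illustrative assumptions, not GekkoFS's actual implementation.

        # Illustrative sketch (not GekkoFS code): distribute file metadata by
        # hashing full paths across a set of metadata servers.
        import hashlib

        NUM_SERVERS = 4  # hypothetical number of nodes running a metadata service

        def metadata_server_for(path: str) -> int:
            """Map a file path to the server responsible for its metadata."""
            digest = hashlib.md5(path.encode("utf-8")).digest()
            return int.from_bytes(digest[:8], "little") % NUM_SERVERS

        if __name__ == "__main__":
            for p in ["/job/checkpoint_0001.h5", "/job/rank42/log.txt", "/job/out.dat"]:
                print(p, "-> server", metadata_server_for(p))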

    From Physics Model to Results: An Optimizing Framework for Cross-Architecture Code Generation

    Starting from a high-level problem description in terms of partial differential equations using abstract tensor notation, the Chemora framework discretizes, optimizes, and generates complete high-performance codes for a wide range of compute architectures. Chemora extends the capabilities of Cactus, facilitating the efficient use of large-scale CPU/GPU systems for complex applications without low-level code tuning. Chemora achieves parallelism through MPI and multi-threading, combining OpenMP and CUDA. Optimizations include high-level code transformations, efficient loop traversal strategies, dynamically selected data and instruction cache usage strategies, and JIT compilation of GPU code tailored to the problem characteristics. The discretization is based on higher-order finite differences on multi-block domains. Chemora's capabilities are demonstrated by simulations of black hole collisions. This problem provides an acid test of the framework, as the Einstein equations contain hundreds of variables and thousands of terms. (18 pages, 4 figures; accepted for publication in Scientific Programming.)
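    As a rough illustration of the kind of kernel such a framework discretizes and generates, the sketch below applies a fourth-order central finite-difference stencil to a 1D field with NumPy. It is a hand-written example of the technique, not Chemora output.

        # Illustrative sketch: fourth-order central finite difference for du/dx
        # on a uniform grid, the kind of stencil kernel a code-generation
        # framework would emit and optimize.
        import numpy as np

        def dudx_4th_order(u: np.ndarray, dx: float) -> np.ndarray:
            """Fourth-order central difference on the interior points of u."""
            d = np.zeros_like(u)
            d[2:-2] = (-u[4:] + 8.0 * u[3:-1] - 8.0 * u[1:-3] + u[:-4]) / (12.0 * dx)
            return d

        x = np.linspace(0.0, 2.0 * np.pi, 101)
        u = np.sin(x)
        approx = dudx_4th_order(u, x[1] - x[0])
        print(np.max(np.abs(approx[2:-2] - np.cos(x)[2:-2])))  # small truncation error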

    DAPHNE: An Open and Extensible System Infrastructure for Integrated Data Analysis Pipelines

    Integrated data analysis (IDA) pipelines that combine data management (DM) and query processing, high-performance computing (HPC), and machine learning (ML) training and scoring are becoming increasingly common in practice. Interestingly, systems from these areas share many compilation and runtime techniques, and the increasingly heterogeneous hardware infrastructure they use is converging as well. Yet, the programming paradigms, cluster resource management, data formats and representations, and execution strategies differ substantially. DAPHNE is an open and extensible system infrastructure for such IDA pipelines, including language abstractions, compilation and runtime techniques, multi-level scheduling, hardware (HW) accelerators, and computational storage for increasing productivity and eliminating unnecessary overheads. In this paper, we make a case for IDA pipelines, describe the overall DAPHNE system architecture and its key components, and present the design of a vectorized execution engine for computational storage, HW accelerators, and local and distributed operations. Preliminary experiments comparing DAPHNE with MonetDB, Pandas, DuckDB, and TensorFlow show promising results.
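    The sketch below illustrates the kind of IDA pipeline the paper targets, mixing an in-process SQL engine, data frames, and ML training in one script. It is a plain-Python stand-in, not DAPHNE's own language or runtime, and the data set and column names are made up.

        # Illustrative sketch of an integrated data analysis (IDA) pipeline:
        # relational preprocessing feeding directly into ML training.
        import duckdb
        import pandas as pd
        from sklearn.linear_model import LinearRegression

        # Data management step: query a local data frame with an in-process engine.
        measurements = pd.DataFrame({"sensor": [1, 1, 2, 2],
                                     "temp": [20.1, 20.9, 19.5, 19.9],
                                     "load": [0.3, 0.5, 0.2, 0.4]})
        features = duckdb.query(
            "SELECT sensor, AVG(temp) AS avg_temp, AVG(load) AS avg_load "
            "FROM measurements GROUP BY sensor"
        ).df()

        # ML step: train a simple model on the queried features.
        model = LinearRegression().fit(features[["avg_load"]], features["avg_temp"])
        print(model.predict(features[["avg_load"]]))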

    GekkoFS: A temporary burst buffer file system for HPC applications

    Many scientific fields increasingly use high-performance computing (HPC) to process and analyze massive amounts of experimental data, while storage systems in today's HPC environments have to cope with new access patterns. These patterns include many metadata operations, small I/O requests, or randomized file I/O, whereas general-purpose parallel file systems have been optimized for sequential shared access to large files. Burst buffer file systems create a separate file system that applications can use to store temporary data. They aggregate node-local storage available within the compute nodes or use dedicated SSD clusters, and they offer a peak bandwidth higher than that of the backend parallel file system without interfering with it. However, burst buffer file systems typically offer many features that a scientific application, running in isolation for a limited amount of time, does not require. We present GekkoFS, a temporary, highly scalable file system that has been specifically optimized for the aforementioned use cases. GekkoFS provides relaxed POSIX semantics, offering only the features that most (though not all) applications actually require. GekkoFS is therefore able to provide scalable I/O performance and reaches millions of metadata operations even on a small number of nodes, significantly outperforming common parallel file systems.
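    A minimal sketch of the usage pattern described above: an application keeps its temporary, write-intensive files on a burst-buffer mount point and stages the result back to the parallel file system once at the end of the job. The environment variable names BB_MOUNT and SCRATCH are hypothetical placeholders, not part of GekkoFS.

        # Illustrative sketch: write temporary data to a burst-buffer mount point,
        # then stage the result out to the parallel file system in a single copy.
        import os
        import shutil

        bb_dir = os.environ.get("BB_MOUNT", "/tmp")        # e.g. a burst-buffer mount
        pfs_dir = os.environ.get("SCRATCH", "./results")   # backend parallel file system

        os.makedirs(pfs_dir, exist_ok=True)
        tmp_path = os.path.join(bb_dir, "checkpoint.dat")

        with open(tmp_path, "wb") as f:                    # many small, fast writes stay local
            for step in range(100):
                f.write(step.to_bytes(8, "little"))

        shutil.copy(tmp_path, os.path.join(pfs_dir, "checkpoint.dat"))  # single stage-out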

    Game of Templates. Deploying and (re-)using Virtualized Research Environments in High-Performance and High-Throughput Computing

    The Virtual Open Science Collaboration Environment project worked on different use cases to evaluate the steps necessary for virtualization or containerization, especially when considering the external dependencies of digital workflows. Virtualized Research Environments (VREs) can both help to broaden the user base of an HPC cluster like NEMO and offer new ways of packaging scientific workflows as well as managing software stacks. The eResearch initiative on VREs sponsored by the state of Baden-Württemberg provided the necessary framework for both researchers of various disciplines and providers of (large-scale) compute infrastructures to define future operational models of HPC clusters and scientific clouds. In daily operations, VREs running on virtualization or containerization technologies such as OpenStack or Singularity help to disentangle the responsibilities for the software stacks needed to fulfill a certain task. Nevertheless, the reproduction of VREs as well as the provisioning of research data to be computed and stored afterward creates several challenges that need to be solved beyond the traditional scientific computing models.

    A Low Level Component Model enabling Resource Specialization of HPC Applications

    Scientific applications keep getting more complex, e.g., to improve their accuracy by taking more phenomena into account. Moreover, computing infrastructures continue their fast evolution. Software engineering has therefore become a major issue in achieving ease of development, portability, and simple maintenance while still delivering high performance. Software component models are a promising approach, as they enable the software architecture of an application to be manipulated. However, existing models do not capture enough resource specificities. This paper proposes a low-level component model (L2C) that directly supports native connectors such as MPI, shared memory, and method invocation. L2C is intended to be used as a back end by a "compiler" (such as HLCM) to generate an application assembly specific to a given machine. This paper shows, on a typical domain decomposition use case, that L2C can achieve the same performance as native implementations while gaining benefits such as resource specialization capabilities.
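    The toy sketch below illustrates the connector idea in plain Python: a consumer component is wired to a producer through a connector chosen at assembly time, mimicking how a low-level component model lets a compiler pick method invocation, shared memory, or MPI for a given machine. The class and port names are invented for illustration and are not part of L2C.

        # Illustrative sketch (not L2C itself): components with ports, wired
        # together by a connector chosen when the assembly is built.
        class Producer:
            def get(self):
                return [1.0, 2.0, 3.0]

        class Consumer:
            def __init__(self, use_port):
                self.use_port = use_port      # provided by whatever connector is wired in
            def run(self):
                return sum(self.use_port())

        # Method-invocation connector: bind the consumer's use-port directly to the
        # producer's provide-port; an MPI connector would instead wrap send/recv calls.
        producer = Producer()
        consumer = Consumer(use_port=producer.get)
        print(consumer.run())   # 6.0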

    Breast Histopathology with High-Performance Computing and Deep Learning

    The increasingly intensive collection of digitalized images of tumor tissue over the last decade has made histopathology a demanding application in terms of computational and storage resources. With images containing billions of pixels, the need for optimizing and adapting histopathology to large-scale data analysis is compelling. This paper presents a modular pipeline with three independent layers for the detection of tumorous regions in digital specimens of breast lymph nodes with deep learning models. Our pipeline can be deployed either on local machines or on high-performance computing resources with a containerized approach. The need for expertise in high-performance computing is removed by the self-sufficient structure of Docker containers, whereas ample room for customization is left in terms of deep learning models and hyperparameter optimization. We show that by deploying the software layers on different infrastructures we optimize both the data preprocessing and the network training times, further increasing the scalability of the application to datasets of approximately 43 million images. The code is open source and available on GitHub.
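    As a rough illustration of the deep learning layer of such a pipeline, the sketch below defines a minimal patch-level classifier in PyTorch that scores small tiles of a whole-slide image as tumor vs. non-tumor. It is an assumed stand-in for the technique, not the model used in the paper.

        # Illustrative sketch: a minimal patch-level classifier for tiles cut from
        # a gigapixel slide, each patch scored independently.
        import torch
        import torch.nn as nn

        class PatchClassifier(nn.Module):
            def __init__(self):
                super().__init__()
                self.features = nn.Sequential(
                    nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
                    nn.MaxPool2d(2),
                    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
                    nn.AdaptiveAvgPool2d(1),
                )
                self.head = nn.Linear(32, 2)   # tumor vs. non-tumor

            def forward(self, x):
                return self.head(self.features(x).flatten(1))

        patches = torch.rand(8, 3, 96, 96)     # a batch of RGB tissue patches
        scores = PatchClassifier()(patches)
        print(scores.shape)                     # torch.Size([8, 2])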