88 research outputs found

    ECHOFS: a scheduler-guided temporary filesystem to leverage node-local NVMS

    © 2018 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. The growth in data-intensive scientific applications poses strong demands on the HPC storage subsystem, as data needs to be copied from compute nodes to I/O nodes and vice versa for jobs to run. The emerging trend of adding denser, NVM-based burst buffers to compute nodes, however, offers the possibility of using these resources to build temporary file systems with specific I/O optimizations for a batch job. In this work, we present echofs, a temporary filesystem that coordinates with the job scheduler to preload a job's input files into node-local burst buffers. We present the results measured with NVM emulation, and different FS backends with DAX/FUSE on a local node, to show the benefits of our proposal and such coordination. This work was partially supported by the Spanish Ministry of Science and Innovation under the TIN2015–65316 grant, the Generalitat de Catalunya under contract 2014–SGR–1051, as well as the European Union’s Horizon 2020 Research and Innovation Programme, under Grant Agreement no. 671951 (NEXTGenIO). Source code available at https://github.com/bsc-ssrg/echofs. Peer Reviewed. Postprint (author's final draft).

    Sizing and Partitioning Strategies for Burst-Buffers to Reduce IO Contention

    Burst-Buffers are small, high-throughput storage devices used as intermediate storage between the PFS (Parallel File System) and the computational nodes of modern HPC systems. They can reduce contention on the PFS, a shared resource whose read and write performance increases more slowly than processing power in HPC systems. A second usage is to accelerate data transfers and to hide the latency of the PFS. In this paper, we concentrate on the first usage. We propose a model for Burst-Buffers and application transfers. We consider the problem of dimensioning and sharing the Burst-Buffers between several applications. This dimensioning can be done either dynamically or statically. The dynamic allocation considers that any application can use any available portion of the Burst-Buffers. The static allocation considers that when a new application enters the system, it is assigned some portion of the Burst-Buffers, which cannot be used by the other applications until that application leaves the system and its data is purged from it. We show that the general sharing problem of guaranteeing fair performance for all applications is NP-complete. We propose polynomial-time algorithms for the special case of finding the optimal buffer size such that no application is slowed down by PFS contention, in both the static and dynamic cases. Finally, we provide evaluations of our algorithms in realistic settings. We use those to discuss how to minimize the overhead of the static allocation of buffers compared to the dynamic allocation. (International audience.)
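
    The difference between the static and dynamic allocations described in this abstract can be illustrated with a small sketch. It is not the paper's algorithm: it assumes, for simplicity, that all applications are present for the whole time window, and the occupancy traces and function names are hypothetical. Under that assumption a static allocation must reserve each application's own peak, while a dynamic allocation only needs the peak of the combined occupancy.

```python
from typing import Dict, List


def static_buffer_size(occupancy: Dict[str, List[int]]) -> int:
    """Buffer needed when each application gets a private, fixed partition.

    Every application reserves its own peak occupancy for the whole window,
    so the total is the sum of the per-application peaks.
    """
    return sum(max(trace) for trace in occupancy.values())


def dynamic_buffer_size(occupancy: Dict[str, List[int]]) -> int:
    """Buffer needed when applications share one pool.

    Only the largest combined occupancy over time has to fit, so the total
    is the peak of the per-time-step sums.
    """
    return max(sum(step) for step in zip(*occupancy.values()))


# Hypothetical occupancy (in GB) of two applications over three time steps.
traces = {
    "app_a": [4, 0, 2],
    "app_b": [0, 3, 1],
}

static_need = static_buffer_size(traces)    # 4 + 3
dynamic_need = dynamic_buffer_size(traces)  # max(4, 3, 3)
overhead = static_need - dynamic_need
```

    In this toy case the static partitioning needs 7 GB against 4 GB for the shared pool; the gap is the kind of overhead the abstract says the authors aim to minimize.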

    Exascale storage systems: an analytical study of expenses

    The computational power and storage capability of supercomputers are growing at different paces, with storage lagging behind; the widening gap necessitates new approaches to keep the investment and running costs of storage systems at bay. In this paper, we aim to unify previous models and compare different approaches for solving these problems. By extrapolating the characteristics of the German Climate Computing Center's previous supercomputers to the future, cost factors are identified and quantified in order to foster adequate research and development. Using models to estimate the execution costs of two prototypical use cases, we discuss the potential of three concepts: re-computation, data deduplication, and data compression.

    Programming model abstractions for optimizing I/O intensive applications

    This thesis contributes, from the perspective of task-based programming models, to the efforts to optimize I/O-intensive applications. Throughout the thesis, we propose programming model abstractions and mechanisms with a twofold objective: on the one hand, to improve the I/O and total performance of applications on today's complex storage infrastructures; on the other hand, to achieve this improvement without increasing the complexity of application programming. The following paragraphs briefly summarize each of our contributions.
    First, to exploit the compute-I/O patterns of I/O-intensive applications and transparently improve I/O and total performance, we propose a number of abstractions that we refer to as I/O Awareness abstractions. An I/O-aware task-based programming model separates the handling of I/O and computation by supporting I/O Tasks, whose execution can overlap with that of compute tasks. Moreover, we provide programming model support to improve I/O performance by addressing I/O congestion. This is achieved by using Storage Bandwidth Constraints to control the level of task parallelism. We support two types of such constraints: (i) static storage bandwidth constraints, which are set manually by application developers; (ii) auto-tunable constraints, which are set and tuned automatically throughout the execution of the application.
    Second, to exploit the heterogeneity of modern storage systems and improve performance transparently, we propose a set of capabilities that we refer to as Storage Heterogeneity Awareness. A storage-heterogeneity-aware task-based programming model builds on the concepts and abstractions introduced in the first contribution to improve the I/O performance of applications on heterogeneous storage systems. More specifically, such programming models support the following features: (i) abstracting the heterogeneity of the storage devices and exposing them as one hierarchical storage resource; (ii) supporting dedicated I/O scheduling; (iii) a mechanism that automatically and periodically flushes obsolete data from higher storage layers to lower storage layers.
    Third, targeting the increasing parallelism levels of applications, we propose a Hybrid Programming Model that combines task-based programming models and MPI. In this programming model, tasks are used to achieve coarse-grained parallelism on large-scale distributed infrastructures, whereas MPI is used to gain fine-grained parallelism by parallelizing task execution. Such a hybrid programming model makes it possible to use parallel I/O and high-level I/O libraries in tasks. We enable it by supporting Native MPI Tasks. These tasks are native to the programming model for two reasons: they execute task code, as opposed to calling external MPI binaries or scripts, and data transfers and input/output handling are done in a manner completely transparent to application developers. This increases parallelism levels while easing the design and programming of applications.
    Finally, to exploit the inherent parallelism opportunities in applications and overlap computation with I/O, we propose an Eager mechanism for releasing data dependencies. Unlike the traditional approach, eagerly releasing data dependencies allows successor tasks to be released for execution as soon as their data dependencies are ready, without having to wait for the predecessor task(s) to finish execution completely. To support the eager release of data dependencies, we describe the following core modifications to the design of task-based programming models: (i) defining and managing data dependency relationships as parameter-aware dependencies; (ii) a mechanism for notifying the programming model that an output datum has been generated before the execution of the producer task ends. Postprint (published version).
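
    As an illustration of how a static storage bandwidth constraint can control the level of task parallelism, the sketch below caps the number of concurrently running I/O tasks at the number of per-task bandwidth shares that fit in the device bandwidth. This is one plausible reading of the idea, not the thesis's implementation; the function names, units, and the admission rule are assumptions.

```python
import math
from typing import List, Tuple


def max_concurrent_io_tasks(device_bandwidth: float, per_task_constraint: float) -> int:
    """Upper bound on simultaneous I/O tasks under a static bandwidth constraint.

    Both arguments use the same unit (for example MB/s). At least one task is
    always allowed so that progress is possible even when a single task
    requests more than the device can deliver.
    """
    if device_bandwidth <= 0 or per_task_constraint <= 0:
        raise ValueError("bandwidth values must be positive")
    return max(1, math.floor(device_bandwidth / per_task_constraint))


def admit(pending: List[str], device_bandwidth: float,
          per_task_constraint: float) -> Tuple[List[str], List[str]]:
    """Split pending I/O tasks into those started now and those kept waiting."""
    limit = max_concurrent_io_tasks(device_bandwidth, per_task_constraint)
    return pending[:limit], pending[limit:]


# Hypothetical example: a 450 MB/s device and tasks constrained to 100 MB/s each.
running, waiting = admit(["t1", "t2", "t3", "t4", "t5", "t6"], 450.0, 100.0)
```

    An auto-tunable constraint, as described in the abstract, would adjust the per-task value during execution instead of fixing it up front; that tuning logic is not sketched here.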

    Exploring Scheduling for On-demand File Systems and Data Management within HPC Environments


    Dimensionnement de Burst-Buffers pour réduire la contention Entrées-Sorties (Sizing Burst-Buffers to Reduce I/O Contention)

    Burst-Buffers are small, high-throughput storage devices used as intermediate storage between the Parallel File System (PFS) and the computational nodes of modern HPC systems. They can reduce contention on the PFS, a shared resource whose read and write performance increases more slowly than processing power in HPC systems. A second usage is to accelerate data transfers and to hide the latency of the PFS. In this paper, we concentrate on the first usage. We propose a model for Burst-Buffers and application transfers. We consider the problem of dimensioning and sharing the Burst-Buffers between several applications. This dimensioning can be done either dynamically or statically. The dynamic allocation considers that any application can use any available portion of the Burst-Buffers. The static allocation considers that when a new application enters the system, it is assigned some portion of the Burst-Buffers which cannot be used by the other applications until that application leaves the system and its data is purged from it. We show that the general sharing problem of guaranteeing fair performance for all applications is NP-complete. We give polynomial-time algorithms for the special case of finding the optimal buffer size such that no application is slowed down by PFS contention, in both the static and dynamic cases. Finally, we provide evaluations of our algorithms in realistic settings. We use those to discuss how to minimize the overhead of the static allocation of buffers compared to the dynamic allocation.
    We study the use of Burst-Buffers as intermediate storage between the compute nodes and the Parallel File System (PFS). The sizing can be static (decided when an application enters the system) or dynamic (driven by I/O requests). We show that the general problem of sharing the buffers fairly between applications is NP-complete. We show that the special case of minimizing the total buffer size so that no application is slowed down is solvable in polynomial time. To solve this problem we propose a linear program. Finally, we provide evaluations at fixed buffer size to show the performance of some common naive algorithms.

    ASCR/HEP Exascale Requirements Review Report

    This draft report summarizes and details the findings, results, and recommendations derived from the ASCR/HEP Exascale Requirements Review meeting held in June 2015. The main conclusions are as follows. 1) Larger, more capable computing and data facilities are needed to support HEP science goals in all three frontiers: Energy, Intensity, and Cosmic. The expected scale of the demand at the 2025 timescale is at least two orders of magnitude -- and in some cases greater -- than that available currently. 2) The growth rate of data produced by simulations is overwhelming the current ability of both facilities and researchers to store and analyze it. Additional resources and new techniques for data analysis are urgently needed. 3) Data rates and volumes from HEP experimental facilities are also straining the ability to store and analyze large and complex data volumes. Appropriately configured leadership-class facilities can play a transformational role in enabling scientific discovery from these datasets. 4) A close integration of HPC simulation and data analysis will aid greatly in interpreting results from HEP experiments. Such an integration will minimize data movement and facilitate interdependent workflows. 5) Long-range planning between HEP and ASCR will be required to meet HEP's research needs. To best use ASCR HPC resources, the experimental HEP program needs a) an established long-term plan for access to ASCR computational and data resources, b) an ability to map workflows onto HPC resources, c) the ability for ASCR facilities to accommodate workflows run by collaborations that can have thousands of individual members, d) to transition codes to the next-generation HPC platforms that will be available at ASCR facilities, e) to build up and train a workforce capable of developing and using simulations and analysis to support HEP scientific research on next-generation systems. Comment: 77 pages, 13 figures; draft report, subject to further revision.