588 research outputs found

    고성능 컴퓨팅 시스템에서 버스트 버퍼를 위한 I/O 분리 기법의 실증적 구현

    Get PDF
    학위논문(석사)--서울대학교 대학원 :공과대학 컴퓨터공학부,2019. 8. 엄현상.To meet the exascale I/O requirements in the High-Performance Computing (HPC), a new I/O subsystem, named Burst Buffer, based on non-volatile memory, has been developed. However, the diverse HPC workloads and the bursty I/O pattern cause severe data fragmentation to SSDs, which creates the need for expensive garbage collection (GC) and also increase the number of bytes actually written to SSD. The new multi-stream feature in SSDs offers an option to reduce the cost of garbage collection. In this paper, we leverage this multi-stream feature to group the I/O streams based on the user IDs and implement this strategy in a burst buffer we call BIOS, short for Burst Buffer with an I/O Separation scheme. Furthermore, to optimize the I/O separation scheme in burst buffer environments, we propose a stream-aware scheduling policy based on burst buffer pools in workload manager and implement the real burst buffer system, BIOS framework, by integrating the BIOS with workload manager. We evaluate the BIOS and framework with a burst buffer I/O traces from Cori Supercomputer including a diverse set of applications. We also disclose and analyze the benefits and limitations of using I/O separation scheme in HPC systems. Experimental results show that the BIOS could improve the performance by 1.44× on average and reduce the Write Amplification Factor (WAF) by up to 1.20×, and prove that the framework can keep on the benefits of the I/O separation scheme in the HPC environment.Abstract Introduction 1 Background and Challenges 5 Burst Buffer 5 Write Amplification in SSDs 6 Multi-streamed SSD 7 Challenges of Multi-stream Feature in Burst Buffers 7 I/O Separation Scheme in Burst Buffer 10 Stream Allocation Criteria 10 Implementation 12 Limitations of User ID-based Stream Allocation 14 BIOS Framework 15 Support in Workload Manager 15 Burst Buffer Pools 16 Stream-Aware Scheduling Policy 18 Workflow of BIOS Framework 20 Evaluation 21 Experiment Setup 21 Evaluation with Synthetic Workload 21 Evaluation with HPC Applications 25 Evaluation with Emulated Workload 27 Evaluation with Different Striping Configuration 29 Evaluation on BIOS Framework 30 Summary and Lessons Learned 33 An I/O Separation Scheme in Burst Buffer 33 Evaluation with Synthetic Workload 33 Evaluation with HPC Applications 33 Evaluation with Emulated Workload 34 Evaluation with Striping Configurations 34 A BIOS Framework 34 Evaluation with Real Burst Buffer Environments 34 Discussion 36 Limited Number of Nodes 36 Advanced BIOS Framework 37 Related work 38 Conclusions 40 Bibliography 42 초록 48Maste

    GekkoFS: A temporary burst buffer file system for HPC applications

    Get PDF
    Many scientific fields increasingly use high-performance computing (HPC) to process and analyze massive amounts of experimental data while storage systems in today’s HPC environments have to cope with new access patterns. These patterns include many metadata operations, small I/O requests, or randomized file I/O, while general-purpose parallel file systems have been optimized for sequential shared access to large files. Burst buffer file systems create a separate file system that applications can use to store temporary data. They aggregate node-local storage available within the compute nodes or use dedicated SSD clusters and offer a peak bandwidth higher than that of the backend parallel file system without interfering with it. However, burst buffer file systems typically offer many features that a scientific application, running in isolation for a limited amount of time, does not require. We present GekkoFS, a temporary, highly-scalable file system which has been specifically optimized for the aforementioned use cases. GekkoFS provides relaxed POSIX semantics which only offers features which are actually required by most (not all) applications. GekkoFS is, therefore, able to provide scalable I/O performance and reaches millions of metadata operations already for a small number of nodes, significantly outperforming the capabilities of common parallel file systems.Peer ReviewedPostprint (author's final draft

    Programming model abstractions for optimizing I/O intensive applications

    Get PDF
    This thesis contributes from the perspective of task-based programming models to the efforts of optimizing I/O intensive applications. Throughout this thesis, we propose programming model abstractions and mechanisms that target a twofold objective: from the one hand, improve the I/O and total performance of applications on nowadays complex storage infrastructures. From the other hand, achieve such performance improvement without increasing the complexity of applications programming. The following paragraphs briefly summarize each of our contributions. First, towards exploiting compute-I/O patterns of I/O intensive applications and transparently improving I/O and total performance, we propose a number of abstractions that we refer to as I/O Awareness abstractions. An I/O aware task-based programming model is able to separate the handling of I/O and computations by supporting I/O Tasks. The execution of such tasks can overlap with compute tasks execution. Moreover, we provide programming model support to improve I/O performance by addressing the issue of I/O congestion. This is achieved by using Storage Bandwidth Constraints to control the level of task parallelism. We support two types of such constraints: (i) Static storage bandwidth constraints that are manually set by application developers. (ii) Auto-tunable constraints that are automatically set and tuned throughout the execution of application. Second, in order to exploit the heterogeneity of modern storage systems to improve performance in a transparent manner, we propose a set of capabilities that we refer to as Storage heterogeneity Awareness. A storage-heterogeneity aware task-based programming model builds on the concepts and abstractions that are introduced in the first contribution to improve the I/O performance of applications on heterogeneous storage systems. More specifically, such programming models support the following features: (i) abstracting the heterogeneity of the storage devices and exposing them as one hierarchical storage resource. (ii) supporting dedicated I/O scheduling. (iii) Finally, we introduce a mechanism that automatically and periodically flushes obsolete data from higher storage layers to lower storage layers. Third, targeting increasing parallelism levels of applications, we propose a Hybrid Programming Model that combines task-based programming models and MPI. In this programming model, tasks are used to achieve coarse-grained parallelism on large-scale distributed infrastructures, whereas MPI is used to gain fine-grained parallelism by parallelizing tasks execution. Such a hybrid programming model offers the possibility to enable parallel I/O and high-level I/O libraries in tasks. We enable such a hybrid programming model by supporting Native MPI Tasks. These tasks are native to the programming model for two reasons: they execute task code as opposed to calling external MPI binaries or scripts. Also, the data transfers and input/output handling is done in a completely transparent manner to application developers. Therefore, increasing parallelism levels while easing the design and programming of applications. Finally, to exploit the inherent parallelism opportunities in applications and overlap computation with I/O, we propose an Eager mechanism for releasing data dependencies. Unlike the traditional approach for releasing dependencies, eagerly releasing data dependencies allows successor tasks to be released for execution as soon as their data dependencies are ready, without having to wait for predecessor task(s) to completely finish execution. In order to support the eager-release of data dependencies, we describe the following core modifications to the design of task-based programming models: (i) defining and managing data dependency relationships as parameter-aware dependencies (ii) a mechanism for notifying the programming model that an output data has been generated before the execution of the producer task ends.Aquesta tesi contribueix des de la perspectiva dels models de programació basats en tasques als esforços d’optimitzar les aplicacions intensives de I/O. Al llarg d'aquesta tesi, proposem abstraccions i mecanismes del model de programació que persegueixen un doble objectiu: per una banda, millorar la I/O i el rendiment total de les aplicacions a les complexes infraestructures d'emmagatzematge de l'actualitat. D'altra banda, aconsegueixi aquesta millora del rendiment sense augmentar la complexitat de la programació d'aplicacions. Els paràgrafs següents resumeixen cadascuna de les nostres contribucions. En primer lloc, proposem una sèrie d'abstraccions a què ens referim com a abstraccions de consciència de I/O. Un model de programació basat en tasques amb reconeixement d'I/O pot separar el maneig d'I/O i els càlculs en admetre Tasques d'I/O. L'execució d'aquestes tasques es pot superposar amb l'execució de tasques de càlcul. A més, proporcionem suport de model de programació per millorar el rendiment d'I/O en abordar el problema de la congestió d'I/O. Això s'aconsegueix mitjançant l'ús de restriccions d'amplada de banda d'emmagatzematge per controlar el nivell de paral·lelisme de tasques. Admetem dos tipus d'aquestes restriccions: estàtic i autoajustable. En segon lloc, proposem un conjunt de capacitats a què ens referim com a Consciència d'heterogeneïtat d'emmagatzematge. Un model de programació basat en tasques conscient de l'heterogeneïtat de l'emmagatzematge es basa en els conceptes i les abstraccions que s'introdueixen en la primera contribució per millorar el rendiment d'I/O de les aplicacions en sistemes d'emmagatzematge heterogenis. Més específicament, aquests models de programació admeten les característiques següents: (i) abstreure l'heterogeneïtat dels dispositius d'emmagatzematge i exposar-los com a recurs d'emmagatzematge jeràrquic. (ii) admetre la programació d'I/O dedicada. (iii) Finalment, presentem un mecanisme que descarrega automàticament i periòdicament les dades obsoletes de les capes d'emmagatzematge superiors a les capes d'emmagatzematge inferiors. En tercer lloc, proposem un model de programació híbrid que combina models de programació basats en tasques i MPI. En aquest model de programació, les tasques s'utilitzen per aconseguir un paral·lelisme de gra gruixut en infraestructures distribuïdes a gran escala, mentre que MPI es fa servir per obtenir un paral·lelisme de gra fi en paral·lelitzar l'execució de tasques. Un model d'aquest tipus de programació híbrid ofereix la possibilitat d'habilitar I/O paral·leles i biblioteques d'I/O d'alt nivell en tasques. Habilitem un model de programació híbrid d'aquest tipus en admetre tasques MPI natives que executen codi de tasca en lloc de trucar a binaris o scripts MPI externs. A més, la transferència de dades i el maneig d’entrada / sortida es realitza d’una manera completament transparent per als desenvolupadors d’aplicacions. Per tant, augmenta els nivells de paral·lelisme alhora que se'n facilita el disseny i la programació d'aplicacions. Finalment proposem un mecanisme Eager per alliberar dependències de dades. A diferència de l'enfocament tradicional per alliberar dependències, alliberar amb entusiasme les dependències de dades permet que les tasques successores s'alliberin per a la seva execució tan aviat com les dependències de dades estiguin llestes, sense haver d'esperar que les tasques predecessores acabin completament l'execució. Per tal de donar suport a l'alliberament ansiós de les dependències de dades, descrivim les següents modificacions centrals al disseny de models de programació basats en tasques: (i) definir i administrar les relacions de dependència de dades com a dependències conscients de paràmetres (ii ) un mecanisme per notificar la model de programació que s'ha generat una dada de sortida abans que finalitzi l'execució de la tasca de productor.Postprint (published version
    corecore