19 research outputs found

    ALCF I/O Data Repository

    This report describes the ALCF I/O Data Repository.

    ECHOFS: a scheduler-guided temporary filesystem to leverage node-local NVMS

    © 2018 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
    The growth in data-intensive scientific applications poses strong demands on the HPC storage subsystem, as data needs to be copied from compute nodes to I/O nodes and vice versa for jobs to run. The emerging trend of adding denser, NVM-based burst buffers to compute nodes, however, offers the possibility of using these resources to build temporary file systems with specific I/O optimizations for a batch job. In this work, we present echofs, a temporary filesystem that coordinates with the job scheduler to preload a job's input files into node-local burst buffers. We present the results measured with NVM emulation, and different FS backends with DAX/FUSE on a local node, to show the benefits of our proposal and such coordination.
    This work was partially supported by the Spanish Ministry of Science and Innovation under the TIN2015–65316 grant, the Generalitat de Catalunya under contract 2014–SGR–1051, as well as the European Union’s Horizon 2020 Research and Innovation Programme, under Grant Agreement no. 671951 (NEXTGenIO). Source code available at https://github.com/bsc-ssrg/echofs. Peer Reviewed. Postprint (author's final draft).

    tf-Darshan: Understanding Fine-grained I/O Performance in Machine Learning Workloads

    Machine Learning applications on HPC systems have been gaining popularity in recent years. The upcoming large scale systems will offer tremendous parallelism for training through GPUs. However, another heavy aspect of Machine Learning is I/O, and this can potentially be a performance bottleneck. TensorFlow, one of the most popular Deep-Learning platforms, now offers a new profiler interface and allows instrumentation of TensorFlow operations. However, the current profiler only enables analysis at the TensorFlow platform level and does not provide system-level information. In this paper, we extend TensorFlow Profiler and introduce tf-Darshan, both a profiler and tracer, that performs instrumentation through Darshan. We use the same Darshan shared instrumentation library and implement a runtime attachment without using a system preload. We can extract Darshan profiling data structures during TensorFlow execution to enable analysis through the TensorFlow profiler. We visualize the performance results through TensorBoard, the web-based TensorFlow visualization tool. At the same time, we do not alter Darshan's existing implementation. We illustrate tf-Darshan by performing two case studies on ImageNet image and Malware classification. We show that by guiding optimization using data from tf-Darshan, we increase POSIX I/O bandwidth by up to 19% by selecting data for staging on fast tier storage. We also show that Darshan has the potential of being used as a runtime library for profiling and providing information for future optimization.
    Comment: Accepted for publication at the 2020 International Conference on Cluster Computing (CLUSTER 2020).

    Storage QoS provisioning for execution programming of data-intensive applications

    In this paper, a method for execution programming of data-intensive applications is presented. The method is based on storage Quality of Service (SQoS) provisioning. SQoS provisioning uses semantic-based storage monitoring built on a storage resources model and storage performance management. Test results show improved execution time when using the QStorMan toolkit, which implements the presented method. Taking into account the SQoS provisioning opportunity on the one hand, and the ever-growing user demands on the other hand, we believe that the execution programming of data-intensive applications can bring a new quality into the application execution.

    Estimation du impact de l'utilisation d'I/O forwarding sur les performances des applications

    In high performance computing architectures, the I/O forwarding technique is often used to alleviate contention in the access to the shared parallel file system servers. Intermediate I/O nodes are placed between compute nodes and these servers, and are responsible for forwarding requests. In this scenario, it is important to properly distribute the available I/O nodes among the running jobs to promote an efficient usage of these resources and improve I/O performance. However, the impact that different numbers of I/O nodes have on an application's bandwidth depends on its characteristics. In this report, we explore the idea of predicting application performance by extracting information from a coarse-grained aggregated trace from a previous execution, and then using this information to match each of the application's I/O phases to an equivalent benchmark, for which we could have performance results. We test this idea by applying it to five different applications over three case studies, and find a mean error of approximately 20%. We extensively discuss the obtained results and the limitations of the approach, pointing at future work opportunities.

    Adaptive Request Scheduling for the I/O Forwarding Layer using Reinforcement Learning

    I/O optimization techniques such as request scheduling can improve performance mainly for the access patterns they target, or they depend on the precise tuning of parameters. In this paper, we propose an approach to adapt the I/O forwarding layer of HPC systems to the application access patterns by tuning a request scheduler. Our case study is the TWINS scheduling algorithm, where performance improvements depend on the time-window parameter, whose best value depends on the current workload. Our approach uses a reinforcement learning technique (contextual bandits) to make the system capable of learning the best parameter value for each access pattern during its execution, without a previous training phase. We evaluate our proposal and demonstrate it can achieve a precision of 88% on the parameter selection in the first hundreds of observations of an access pattern. After having observed an access pattern for a few minutes (not necessarily contiguously), we demonstrate that the system will be able to optimize its performance for the rest of the life of the system (years).
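
    The idea of learning a scheduler parameter per access pattern can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not say which bandit algorithm is used, so epsilon-greedy is assumed here, and the candidate window values, the context names, and the synthetic reward function are all hypothetical.

```python
import random
from collections import defaultdict


class EpsilonGreedyBandit:
    """Per-context epsilon-greedy bandit over a fixed set of candidate values."""

    def __init__(self, arms, epsilon=0.1, seed=0):
        self.arms = list(arms)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        # One table of pull counts and running mean rewards per context.
        self.counts = defaultdict(lambda: [0] * len(self.arms))
        self.means = defaultdict(lambda: [0.0] * len(self.arms))

    def select(self, context):
        """Return the index of the arm to try next for this context."""
        counts = self.counts[context]
        for i, c in enumerate(counts):
            if c == 0:  # try every arm at least once
                return i
        if self.rng.random() < self.epsilon:  # explore
            return self.rng.randrange(len(self.arms))
        means = self.means[context]  # exploit
        return max(range(len(self.arms)), key=means.__getitem__)

    def update(self, context, arm, reward):
        """Fold one observed reward into the running mean of (context, arm)."""
        self.counts[context][arm] += 1
        n = self.counts[context][arm]
        self.means[context][arm] += (reward - self.means[context][arm]) / n

    def best(self, context):
        """Return the arm value with the highest mean reward so far."""
        means = self.means[context]
        return self.arms[max(range(len(self.arms)), key=means.__getitem__)]


def simulated_bandwidth(context, window_us, rng):
    """Hypothetical reward: peaks at a made-up ideal window per access pattern."""
    ideal = {"contiguous": 1000, "strided": 250}[context]
    return 100.0 - abs(window_us - ideal) / 10.0 + rng.gauss(0.0, 1.0)


def train(rounds=300, seed=0):
    """Run the bandit online against the synthetic reward, with no pre-training."""
    rng = random.Random(seed)
    bandit = EpsilonGreedyBandit(arms=[125, 250, 500, 1000, 2000], seed=seed)
    for _ in range(rounds):
        for context in ("contiguous", "strided"):
            arm = bandit.select(context)
            reward = simulated_bandwidth(context, bandit.arms[arm], rng)
            bandit.update(context, arm, reward)
    return bandit


if __name__ == "__main__":
    learned = train()
    for pattern in ("contiguous", "strided"):
        print(pattern, learned.best(pattern))
```

    In a real forwarding layer the context would be derived from the observed access pattern and the reward from measured bandwidth; both are replaced by stand-ins above.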

    Arbitration Policies for On-Demand User-Level I/O Forwarding on HPC Platforms

    I/O forwarding is a well-established and widely adopted technique in HPC to reduce contention in the access to storage servers and transparently improve I/O performance. Rather than having applications directly access the shared parallel file system, the forwarding technique defines a set of I/O nodes responsible for receiving application requests and forwarding them to the file system, thus reshaping the flow of requests. The typical approach is to statically assign I/O nodes to applications depending on the number of compute nodes they use, which is not necessarily related to their I/O requirements. Thus, this approach leads to inefficient usage of these resources. This paper investigates arbitration policies based on the applications' I/O demands, represented by their access patterns. We propose a policy based on the Multiple-Choice Knapsack problem that seeks to maximize global bandwidth by giving more I/O nodes to applications that will benefit the most. Furthermore, we propose a user-level I/O forwarding solution as an on-demand service capable of applying different allocation policies at runtime for machines where this layer is not present. We demonstrate our approach's applicability through extensive experimentation and show it can transparently improve global I/O bandwidth by up to 85% in a live setup compared to the default static policy.
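
    A Multiple-Choice Knapsack formulation of this allocation can be sketched as follows. This is a minimal dynamic-programming illustration, not the paper's policy: each application is assumed to expose a table mapping a candidate number of I/O nodes to a predicted bandwidth, exactly one entry is picked per application, and the total must fit the I/O node budget. The function name, the application names, and the bandwidth figures are hypothetical.

```python
def allocate_io_nodes(apps, total_nodes):
    """Pick one (nodes -> bandwidth) option per application.

    apps: dict mapping an application name to a dict {io_nodes: bandwidth}.
    Returns (best_total_bandwidth, {app: io_nodes}) using at most
    total_nodes I/O nodes, or (-inf, None) if no feasible choice exists.
    """
    neg_inf = float("-inf")
    # best[k] = best (bandwidth, allocation) for the applications processed
    # so far, using at most k I/O nodes. With no applications it is empty.
    best = [(0.0, {})] * (total_nodes + 1)
    for name, options in apps.items():
        new = [(neg_inf, None)] * (total_nodes + 1)
        for k in range(total_nodes + 1):
            for nodes, bandwidth in options.items():
                if nodes > k:
                    continue
                prev_bw, prev_alloc = best[k - nodes]
                if prev_alloc is None:  # remaining budget is infeasible
                    continue
                candidate = prev_bw + bandwidth
                if candidate > new[k][0]:
                    new[k] = (candidate, {**prev_alloc, name: nodes})
        best = new
    return best[total_nodes]


if __name__ == "__main__":
    # Hypothetical predicted bandwidth (MB/s) per number of I/O nodes;
    # 0 nodes is assumed to mean direct access to the file system.
    demo = {
        "A": {0: 100, 1: 300, 2: 350, 4: 400},
        "B": {0: 50, 1: 80, 2: 400, 4: 450},
        "C": {0: 200, 1: 210, 2: 220, 4: 230},
    }
    print(allocate_io_nodes(demo, total_nodes=4))
```

    With these made-up figures, application C gains little from extra I/O nodes, so the budget goes to A and B instead, which is the behaviour the abstract describes as giving more I/O nodes to the applications that benefit the most.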

    Dimensionnement de Burst-Buffers pour réduire la contention Entrées-Sorties

    Burst-Buffers are high-throughput, small-capacity storage devices used as an intermediate layer between the Parallel File System (PFS) and the computational nodes of modern HPC systems. They can help reduce contention on the Parallel File System, a shared resource whose read and write performance increases more slowly than processing power in HPC systems. A second usage is to accelerate data transfers and to hide the latency of the Parallel File System. In this paper, we concentrate on the first usage. We propose a model for Burst-Buffers and application transfers. We consider the problem of dimensioning and sharing the Burst-Buffers between several applications. This dimensioning can be done either dynamically or statically. The dynamic allocation considers that any application can use any available portion of the Burst-Buffers. The static allocation considers that when a new application enters the system, it is assigned some portion of the Burst-Buffers which cannot be used by the other applications until that application leaves the system and its data is purged from it. We show that the general sharing problem to guarantee fair performance for all applications is an NP-Complete problem. We give polynomial-time algorithms for the special case of finding the optimal buffer size such that no application is slowed down due to Parallel File System contention, both in the static and dynamic cases. Finally, we provide evaluations of our algorithms in realistic settings. We use those to discuss how to minimize the overhead of the static allocation of buffers compared to the dynamic allocation.
    We study the use of Burst-Buffers as intermediate storage between the compute nodes and the Parallel File System (PFS). The sizing of the buffers can be static (decided when an application enters the system) or dynamic (driven by I/O requests). We show that the general problem of sharing the buffers fairly among applications is NP-complete. We show that the special case of minimizing the total buffer size so that no application is slowed down can be solved in polynomial time, and we propose a linear program to solve it. Finally, we provide evaluations at fixed buffer size to show the performance of some common naive algorithms.