16 research outputs found

    Direct-pNFS: Scalable, transparent, and versatile access to parallel file systems

    Grid computations require global access to massive data stores. To meet this need, the GridNFS project aims to provide scalable, high-performance, transparent, and secure wide-area data management as well as a scalable and agile name space. While parallel file systems give high I/O throughput, they are highly specialized, have limited operating system and hardware platform support, and often lack strong security mechanisms. Remote data access tools such as NFS and GridFTP overcome some of these limitations, but fail to provide universal, transparent, and scalable remote data access. As part of GridNFS, this paper introduces Direct-pNFS, which builds on the NFSv4.1 protocol to meet a key challenge in accessing remote parallel file systems: high-performance and scalable data access without sacrificing transparency, security, or portability. Experiments with Direct-pNFS demonstrate I/O throughput that equals or outperforms the exported parallel file system across a range of workloads. http://deepblue.lib.umich.edu/bitstream/2027.42/107917/1/citi-tr-07-2.pd

    Parallel NFS Block Layout Module for Linux

    This position statement presents CITI's Linux prototype of the NFSv4.1 pNFS client block layout module and reviews our implementation approach. CITI's prototype implements the IETF draft specification draft-ietf-nfsv4-pnfs-block and is one of three layout modules being developed along with the Linux pNFS generic client, which implements the draft-ietf-nfsv4-minorversion1 specification. The block layout module provides an I/O data path over iSCSI directly to client SCSI devices identified by the pNFS block server. http://deepblue.lib.umich.edu/bitstream/2027.42/107895/1/citi-tr-08-1.pd

    Object-Based Parallel NFS (pNFS) Operations


    Comparison of Parallel File Systems

    The goal of this thesis was to compare several parallel file systems and to evaluate their performance under various conditions. The assessment focuses on read and write performance under different types of workload and on the reliability of each system, measured as its resilience to outages and data loss. The first part of the thesis surveys eight of the most widely used parallel file systems. From these, three were selected for closer examination: Lustre, GlusterFS, and CephFS. To evaluate them, a suite of automated test jobs was designed and implemented. The selected systems were installed in turn on the test hardware and run through this suite, and the measured results are described and compared. The final part of the thesis evaluates the features of each file system and its suitability for particular types of workload.

    Service-oriented models for audiovisual content storage

    What are the important topics to understand for anyone involved with storage services that hold digital audiovisual content? This report looks at how content is created and moves into and out of storage; the storage service value networks and architectures found now and expected in the future; what sort of data transfer is expected to and from an audiovisual archive; what transfer protocols to use; and a summary of security and interface issues.

    HMC-Based Accelerator Design For Compressed Deep Neural Networks

    Deep Neural Networks (DNNs) offer remarkable performance in classification and regression for many high-dimensional problems and have been widely used in real-world cognitive applications. However, the high computational cost of DNNs greatly hinders their deployment in resource-constrained applications, real-time systems, and edge computing platforms. Moreover, the energy and performance costs of moving data between the memory hierarchy and computational units are higher than those of the computation itself. To overcome the memory bottleneck, accelerator designs improve data locality and temporal data reuse. In an attempt to further improve data locality, memory manufacturers have invented 3D-stacked memory, in which multiple layers of memory arrays are stacked on top of each other. Inheriting from the concept of Processing-in-Memory (PIM), some 3D-stacked memory architectures also include a logic layer that can integrate general-purpose computational logic directly within main memory to take advantage of high internal bandwidth during computation. In this dissertation, we investigate hardware/software co-design for neural network accelerators. Specifically, we introduce a two-phase filter pruning framework for model compression and an accelerator tailored for efficient DNN execution on the Hybrid Memory Cube (HMC), which can dynamically offload primitives and functions to the PIM logic layer through a latency-aware scheduling controller. In our compression framework, we formulate the filter pruning process as an optimization problem and propose a filter selection criterion measured by conditional entropy. The key idea of our proposed approach is to establish a quantitative connection between filters and model accuracy. We define the connection as conditional entropy over filters in a convolutional layer, i.e., the distribution of entropy conditioned on network loss. 
Based on this definition, the pruning efficiencies of global and layer-wise pruning strategies are compared, and a two-phase pruning method is proposed. The proposed method removes 88% of filters and reduces inference time by 46% on VGG16 within 2% accuracy degradation.

    Deployment of NFV and SFC scenarios

    This item contains the original work, publicly defended on 24 February 2017, as well as an improved version dated 28 February 2017. The changes introduced in the second version are 1) correction of errata and 2) the procedure in the last annex. Telecommunications services have traditionally been designed by linking hardware devices and providing mechanisms so that they can interoperate. Those devices are usually specific to a single service and are based on proprietary technology. On the other hand, the current model works by defining standards and strict protocols to achieve the high levels of quality and reliability that have defined the carrier-class provider environment. Provisioning new services poses challenges at different levels because inserting the required devices involves changes in the network topology. This leads to slow deployment times and increased operational costs. To overcome the current burdens, network function installation and insertion into the current service topology need to be streamlined to allow greater flexibility. The current service provider model has been disrupted by the over-the-top Internet content providers (Facebook, Netflix, etc.), with their short product cycles and fast pace of new service development. The arrival of content providers has brought competition and put stress on service providers' infrastructure, and has forced telco companies to research new technologies to recover market share with flexible and revenue-generating services. Network Function Virtualization (NFV) and Service Function Chaining (SFC) are some of the initiatives led by the Communication Service Providers to regain the lost leadership. This project focuses on experimenting with some of these already available new technologies, which are expected to be the foundation of the new network paradigms (5G, IoT) and to support new value-added services over cost-efficient telecommunication infrastructures. 
Specifically, SFC scenarios have been deployed with Open Platform for NFV (OPNFV), a Linux Foundation project. Some use cases of NFV technology are demonstrated as applied to teaching laboratories. Although the current implementation does not achieve production-grade reliability, it provides a suitable environment for developing new functional improvements and evaluating the performance of virtualized network infrastructures.

    Analyse des performances de stockage, en mémoire et sur les périphériques d'entrée/sortie, à partir d'une trace d'exécution

    Data storage is an essential resource for the computer industry. Storage devices must be fast and reliable to meet the growing demands of the data-driven economy. Storage technologies can be classified into two main categories: mass storage and main memory storage. Mass storage can store large amounts of data persistently. Data is saved locally on input/output devices, such as Hard Disk Drives (HDD) and Solid-State Drives (SSD), or remotely on distributed storage systems. Main memory storage temporarily holds the necessary data for running programs. Main memory is characterized by its high access speed, essential to quickly provide data to the Central Processing Unit (CPU). Operating systems use several mechanisms to manage storage devices, such as disk schedulers and memory allocators. The processing time of a storage request is affected by the interaction between several subsystems, which complicates the debugging task. Existing tools, such as benchmarking tools, provide a general idea of the overall system performance, but do not accurately identify the causes of poor performance. Dynamic analysis through execution tracing is a solution for the detailed runtime analysis of storage systems. Tracing collects precise data about the internal behavior of the system, which helps in detecting performance problems that are difficult to identify. The goal of this thesis is to provide a tool to analyze storage performance, in memory and on input/output devices, based on low-level trace events. 
The main challenges addressed by this tool are: collecting the required data using kernel and userspace tracing, limiting the overhead of tracing and the size of the generated traces, synchronizing the traces collected from different sources, providing multi-level analyses covering several aspects of storage performance, and lastly proposing abstractions allowing users to easily understand the traces. We carefully designed and inserted the instrumentation needed for the analyses. The tracepoints provide full visibility into the system and track the lifecycle of storage requests, from creation to processing. The Linux Trace Toolkit Next Generation (LTTng), a free and low-overhead tracer, is used for data collection. This tracer is characterized by its stability and its efficiency with highly parallel applications, thanks to the lock-free synchronization mechanisms used to update the content of the trace buffers. We also contributed to the creation of a patch that allows LTTng to capture the call stacks of userspace events.

    Benchmarking Hadoop performance on different distributed storage systems

    Distributed storage systems have been in place for years, and have undergone significant changes in architecture to ensure reliable storage of data in a cost-effective manner. With the demand for data increasing, there has been a shift from disk-centric to memory-centric computing: the focus is on keeping data in memory rather than on disk. The primary motivation for this is the increased speed of data processing. This could, however, mean a change in the approach to providing the necessary fault tolerance: instead of data replication, other techniques may be considered. One example of an in-memory distributed storage system is Tachyon. Instead of replicating data files in memory, Tachyon provides fault tolerance by maintaining a record of the operations needed to generate the data files. These operations are replayed if the files are lost. This approach is termed lineage. Tachyon is already deployed by many well-known companies. This thesis compares the storage performance of Tachyon with that of the on-disk storage systems HDFS and Ceph. After studying the architectures of well-known distributed storage systems, the major contribution of the work is to integrate Tachyon with Ceph as an underlayer storage system, to understand how this affects its performance, and to determine how to tune Tachyon for maximum performance.