
    Founsure 1.0: An erasure code library with efficient repair and update features

    Founsure is an open-source software library that implements multi-dimensional, graph-based erasure coding based entirely on fast exclusive-OR (XOR) logic. Its implementation uses compiler optimizations and multi-threading to generate the right assembly code for the given multi-core CPU architecture with vector-processing capabilities. Founsure possesses important features that should find various applications in modern data storage, communication, and networked computer systems, in which data needs protection against device, hardware, and node failures. As data sizes have reached unprecedented levels, these systems have become hungry for network bandwidth, computational resources, and power. To address this, the library provides a three-dimensional design space that trades off computational complexity, coding overhead, and data/node repair bandwidth to meet the different requirements of modern distributed data storage and processing systems. Founsure enables efficient encoding, decoding, repair/rebuild, and updates while all the required data storage and computation are distributed across the network nodes. Supported by Turkiye Bilimsel ve Teknolojik Arastirma Kurumu (TUBITAK), Grant Number: 115C111 - 119E235.
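    A minimal sketch of the XOR principle such codes build on (illustrative Python, not Founsure's actual API): a parity block is the XOR of k data blocks, and any single lost block is rebuilt by XOR-ing the k survivors.

        def xor_blocks(blocks):
            """XOR equal-length byte blocks together."""
            out = bytearray(len(blocks[0]))
            for block in blocks:
                for i, byte in enumerate(block):
                    out[i] ^= byte
            return bytes(out)

        data = [b"abcd", b"efgh", b"ijkl"]   # k = 3 data blocks
        parity = xor_blocks(data)            # one XOR parity block

        # Lose data[1]; rebuild it from the two survivors plus parity.
        assert xor_blocks([data[0], data[2], parity]) == data[1]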

    CORE: Augmenting Regenerating-Coding-Based Recovery for Single and Concurrent Failures in Distributed Storage Systems

    Data availability is critical in distributed storage systems, especially when node failures are prevalent in real life. A key requirement is to minimize the amount of data transferred among nodes when recovering the lost or unavailable data of failed nodes. This paper explores recovery solutions based on regenerating codes, which are shown to provide fault-tolerant storage with minimum recovery bandwidth. Existing optimal regenerating codes are designed for single node failures. We build a system called CORE, which augments existing optimal regenerating codes to support a general number of failures, both single and concurrent. We theoretically show that CORE achieves the minimum possible recovery bandwidth in most cases. We implement CORE and evaluate our prototype atop a Hadoop HDFS cluster testbed with up to 20 storage nodes. We demonstrate that our CORE prototype conforms to our theoretical findings and achieves recovery bandwidth savings compared to the conventional recovery approach based on erasure codes. Comment: 25 pages
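    To make the bandwidth claim concrete, the standard formulas from the regenerating-codes literature (not CORE-specific code) compare a conventional erasure-code repair, which downloads k blocks of size M/k, against a minimum-storage regenerating (MSR) repair, which contacts d helpers for M/(k(d-k+1)) each:

        def conventional_repair_bw(M, k):
            # Classic erasure-code repair: download k blocks of size M/k,
            # i.e. the whole file.
            return k * (M / k)

        def msr_repair_bw(M, k, d):
            # MSR point: each of d helpers sends beta = M / (k * (d - k + 1)).
            return d * M / (k * (d - k + 1))

        M, k, d = 1024, 4, 9   # e.g. 1024 MB file, k = 4, d = 9 helpers
        print(conventional_repair_bw(M, k))  # 1024.0 (the whole file)
        print(msr_repair_bw(M, k, d))        # 384.0  (~2.7x less traffic)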

    GPUs as Storage System Accelerators

    Massively multicore processors, such as Graphics Processing Units (GPUs), provide, at a comparable price, one order of magnitude higher peak performance than traditional CPUs. This drop in the cost of computation, like any order-of-magnitude drop in the cost per unit of performance for a class of system components, creates the opportunity to redesign systems and to explore new ways to engineer them that recalibrate the cost-to-performance relation. This project explores the feasibility of harnessing GPUs' computational power to improve the performance, reliability, or security of distributed storage systems. In this context, we present the design of a storage system prototype that uses GPU offloading to accelerate a number of computationally intensive primitives based on hashing, and we introduce techniques to efficiently leverage the processing power of GPUs. We evaluate the performance of this prototype under two configurations: as a content-addressable storage system that facilitates online similarity detection between successive versions of the same file, and as a traditional system that uses hashing to preserve data integrity. Further, we evaluate the impact of GPU offloading on the performance of competing applications. Our results show that this technique can bring tangible performance gains without negatively impacting the performance of concurrently running applications. Comment: IEEE Transactions on Parallel and Distributed Systems, 201
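    The hashing primitive being offloaded can be illustrated with a small CPU-side sketch (Python's hashlib here; the paper's prototype computes such hashes on the GPU): chunk two file versions, hash each chunk, and compare the hash sets to detect similarity.

        import hashlib

        def chunk_hashes(data, chunk_size=4096):
            return {hashlib.sha1(data[i:i + chunk_size]).digest()
                    for i in range(0, len(data), chunk_size)}

        def similarity(old, new):
            a, b = chunk_hashes(old), chunk_hashes(new)
            return len(a & b) / max(len(a | b), 1)   # Jaccard over chunk hashes

        v1 = b"".join(bytes([i]) * 4096 for i in range(4))  # 4 distinct chunks
        v2 = v1[:12288] + b"\xff" * 4096                    # last chunk changed
        print(similarity(v1, v2))   # 0.6 -- 3 of 5 distinct chunks are shared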

    Data management for cloud supported cooperative driving

    Master's thesis, Informatics Engineering (Architecture, Systems and Computer Networks), Universidade de Lisboa, Faculdade de Ciências, 2020.

    The increasing number of technologies built into vehicles has given the common user access to a broad range of utilities that make driving easier, safer, and more economical. ABS, GPS, Bluetooth, and the onboard computer are some of the technologies found in a recent vehicle. More experimental vehicles add obstacle detection, automatic braking, and self-driving technologies, which can be supported by a wireless network connection to further improve their capabilities. That connection turns each independent vehicle into a node in an ad-hoc network. The current challenge is to connect all those vehicles and provide the data needed for their correct functioning in a timely manner. That is the challenge this dissertation analyses: the possibility of creating a reliable, cloud-based vehicular information system for cooperative driving.

    Cloud-based storage can support an ever-changing number of vehicles while satisfying scalability requirements and maintaining ease of access, without the need to maintain a physical infrastructure, as that responsibility lies with the provider. To decide which service is best suited to host the vehicular information system, three services from Amazon Web Services (AWS) were analysed: S3, EC2, and DynamoDB. Ease of use, latency, scalability, and cost were the main requirements tested, as they are the most important aspects of a real-time vehicular information system for autonomous vehicles.

    After deciding which cloud service would be the most appropriate, two client models were created that fulfil a set of requirements. They are based on an existing algorithm named Two-Step Full Replication, which uses a group of key-value store services from various clouds to simulate a shared memory based on multi-writer, multi-reader (MWMR) registers. This algorithm tolerates Byzantine faults by using Byzantine quorum techniques together with integrity and authenticity checks. The changes necessary to turn the algorithm into a usable client for a vehicular information system were defined and implemented. The first model, the "Atomic Snapshot Client", combines the modified Two-Step Full Replication interface with the Atomic Snapshot algorithm. This model guarantees that a read of the system (a snapshot) is done atomically, without being corrupted by concurrent writes, at the cost of execution latency. The second model, the "Fast Snapshot Client", is a faster version of the first, aimed at obtaining faster responses from the system without overly sacrificing data consistency. Its main change is relaxing the atomic registers to regular ones, which makes reads (scan) and writes (update) simpler and faster, although it removes the atomic snapshot feature. Analysis of the data collected from experiments with this model revealed a relation between the growth of scan latency and the total time spent on read and write operations in an application with many clients. To solve this problem, a simple garbage collector was implemented, which cleans each register when the number of outdated writes it contains goes over a specified threshold. This solution, although simple, proved effective at reducing scan times.

    Finally, a vehicular information system based on the AWS S3 service was implemented. It is composed of two types of client based on the Fast Snapshot Client: the vehicular client and the calculator client. The two work together: the calculator client scans the registers of the vehicular clients and writes the processed data for each vehicular client into its own registers, while the vehicular clients write all the relevant data they gather, read the register of their respective calculator client, and act according to the data read. Each client was tested separately and analysed in order to discuss the viability of this system in a real-world application, as well as possible changes to further improve it.
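    A hedged sketch of the "Fast Snapshot" behaviour described above, with an in-memory dict standing in for the cloud key-value stores; names such as GC_THRESHOLD are illustrative, not the thesis's actual code.

        GC_THRESHOLD = 8       # outdated versions tolerated before cleaning

        class FastRegisterClient:
            def __init__(self, store):
                self.store = store             # register_id -> {ts: value}

            def update(self, reg, ts, value):
                versions = self.store.setdefault(reg, {})
                versions[ts] = value
                if len(versions) - 1 > GC_THRESHOLD:
                    # Simple garbage collector: keep only the newest write.
                    newest = max(versions)
                    self.store[reg] = {newest: versions[newest]}

            def scan(self, regs):
                # Regular-register read: highest-timestamped value per
                # register; no atomic-snapshot guarantee across registers.
                return {r: self.store[r][max(self.store[r])]
                        for r in regs if self.store.get(r)}

        kv = {}
        client = FastRegisterClient(kv)
        for ts in range(12):
            client.update("vehicle-42", ts, {"speed": 50 + ts})
        print(client.scan(["vehicle-42"]))   # {'vehicle-42': {'speed': 61}}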

    Mesh-based content routing using XML


    Doctor of Philosophy

    As the base of the software stack, system-level software is expected to provide efficient and scalable storage, communication, security, and resource-management functionality. However, many functionalities at the system level, such as encryption, packet inspection, and error correction, are computationally expensive and require substantial computing power. What is more, today's application workloads have entered gigabyte and terabyte scales, which demand even more computing power. To meet the rapidly growing demand for computing power at the system level, this dissertation proposes using parallel graphics processing units (GPUs) in system software. GPUs excel at parallel computing and also show a much faster growth trend in parallel performance than central processing units (CPUs). However, system-level software was originally designed to be latency-oriented, whereas GPUs are designed for long-running computation and large-scale data processing, which are throughput-oriented. This mismatch makes it difficult to fit system-level software to GPUs. This dissertation presents generic principles of system-level GPU computing developed during the process of creating two general frameworks for integrating GPU computing into storage and network packet processing. The principles are generic design techniques and abstractions that address common system-level GPU computing challenges. They have been evaluated in concrete cases, including storage and network packet processing applications augmented with GPU computing. The significant performance improvements found in the evaluation show the effectiveness and efficiency of the proposed techniques and abstractions. The dissertation also presents a literature survey of the relatively young field of system-level GPU computing, introducing the state of the art in both applications and techniques, as well as their future potential.
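    One generic technique of this kind, batching latency-oriented requests into throughput-oriented offloads, can be made concrete with a hypothetical cost model (illustrative numbers, not the dissertation's measurements):

        LAUNCH_COST = 50   # fixed cost per GPU offload (e.g. microseconds)
        PER_ITEM = 1       # marginal cost per request on the device

        def cost_unbatched(n):
            # One kernel launch per request: pays the fixed cost n times.
            return n * (LAUNCH_COST + PER_ITEM)

        def cost_batched(n, batch=64):
            # Amortize the fixed cost over batches of requests.
            launches = -(-n // batch)            # ceiling division
            return launches * LAUNCH_COST + n * PER_ITEM

        n = 4096
        print(cost_unbatched(n))   # 208896
        print(cost_batched(n))     # 7296 -> roughly 29x cheaper when batched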

    The Design and Implementation of a Bytecode for Optimization on Heterogeneous Systems

    Get PDF
    As hardware architectures shift towards more heterogeneous platforms with different varieties of multi- and many-core processors and graphics processing units (GPUs) by various manufacturers, programmers need a way to write simple and highly optimized code without worrying about the specifics of the underlying hardware. To meet this need, I have designed a virtual machine and bytecode around the goal of optimized execution on highly variable, heterogeneous hardware, rather than goals such as small bytecodes, as was the objective of the Java® Virtual Machine. The approach used here combines elements of the Dalvik® virtual machine with concepts from the OpenCL® heterogeneous computing platform, along with an annotation system so that the results of complex compile-time analysis can be available to the Just-In-Time compiler. The annotation format is flexible, so the set of annotations can be expanded as the field of heterogeneous computing continues to grow. An initial implementation of this virtual machine was written in the Scala programming language and makes use of the Java bindings for OpenCL to execute code segments on a GPU. The implementation consists of an assembler that converts an assembly version of the bytecode into its binary representation and an interpreter that runs programs from the assembled binary. Because the bytecode contains valuable optimization information, decisions can be made at runtime about how best to execute code segments. To demonstrate this concept, the interpreter uses this information to produce OpenCL kernel code for specified bytecode blocks and then builds and executes these kernels to improve performance. This hybrid interpreter/Just-In-Time compiler serves as an initial implementation of a virtual machine that provides optimized code tailored to the hardware on which the application is running.
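    A hedged sketch of that runtime decision (hypothetical names and annotation keys, not the thesis's actual bytecode format): compile-time analysis leaves annotations on a bytecode block, and the runtime uses them to choose between the interpreter and a generated GPU kernel.

        from dataclasses import dataclass, field

        @dataclass
        class BytecodeBlock:
            ops: list
            annotations: dict = field(default_factory=dict)  # compile-time analysis results

        def emit_opencl(block):
            # Stand-in: a real implementation would translate block.ops
            # into OpenCL C kernel source.
            return "__kernel void k(__global float *d) { /* ... */ }"

        def launch_on_gpu(kernel_src, data):
            return [x * 2 for x in data]   # pretend the GPU ran the kernel

        def interpret(block, data):
            return [x * 2 for x in data]   # sequential interpreter path

        def execute(block, data):
            # Offload only blocks the analyzer marked data-parallel and
            # large enough to amortize the kernel-launch overhead.
            if block.annotations.get("parallelizable") and len(data) >= 1024:
                return launch_on_gpu(emit_opencl(block), data)
            return interpret(block, data)

        block = BytecodeBlock(ops=["mul2"], annotations={"parallelizable": True})
        print(execute(block, list(range(2048)))[:3])   # [0, 2, 4] via the "GPU" path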