
    Improving Data Management and Data Movement Efficiency in Hybrid Storage Systems

    University of Minnesota Ph.D. dissertation. July 2017. Major: Computer Science. Advisor: David Du. 1 computer file (PDF); ix, 116 pages.

    In the big data era, the large volumes of data being continuously generated drive the emergence of high-performance, large-capacity storage systems. To reduce the total cost of ownership, storage systems are built in a more composite way with many different types of emerging storage technologies/devices, including Storage Class Memory (SCM), Solid State Drives (SSD), Shingled Magnetic Recording (SMR), Hard Disk Drives (HDD), and even off-premise cloud storage. To make better use of each type of storage, industry has provided multi-tier storage that dynamically places hot data in the faster tiers and cold data in the slower tiers. Data movement happens between tiers within a single system as well as between devices connected via various networks. Toward improving data management and data movement efficiency in such hybrid storage systems, this work makes the following contributions.

    To bridge the giant semantic gap between applications and modern storage systems, passing a piece of tiny but useful information (I/O access hints) from upper layers to the block storage layer can greatly improve application performance or ease data management in heterogeneous storage systems. We present and develop a generic and flexible framework, called HintStor, to execute and evaluate various I/O access hints on heterogeneous storage systems with minor modifications to the kernel and applications. The design of HintStor contains a new application/user-level interface, a file system plugin, and a block storage data manager. With HintStor, storage systems composed of various storage devices can perform pre-devised data placement, space reallocation, and data migration policies assisted by the added access hints.

    Each storage device/technology has its own unique price-performance tradeoffs and idiosyncrasies with respect to the workload characteristics it best supports. To explore the internal access patterns and thus efficiently place data on storage systems with fully connected (i.e., data can move from one device to any other device instead of moving tier by tier) differential pools (each pool consists of storage devices of a particular type), we propose a chunk-level, storage-aware workload analyzer framework called ChewAnalyzer. With ChewAnalyzer, the storage manager can adequately distribute and move data chunks across different storage pools.

    To reduce the duplicate content transferred between local storage devices and devices in remote data centers, an inline Network Redundancy Elimination (NRE) process with a Content-Defined Chunking (CDC) policy can achieve a higher Redundancy Elimination (RE) ratio, but may suffer from considerably higher computational requirements than fixed-size chunking. We build an inline NRE appliance that incorporates an improved FPGA-based scheme to speed up CDC processing. To efficiently utilize the hardware resources, the whole NRE process is handled by a Virtualized NRE (VNRE) controller. The uniqueness of the VNRE we developed lies in its ability to exploit the redundancy patterns of different TCP flows and customize the chunking process to achieve a higher RE ratio.
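    To make the CDC step concrete, the following is a minimal software sketch of content-defined chunking with a Gear-style rolling hash, the general technique that FPGA schemes of this kind accelerate. The gear table, cut mask, and chunk-size limits here are illustrative assumptions, not the appliance's actual parameters. Because boundaries depend on content rather than offsets, identical data produces identical chunks even after insertions, which is what raises the RE ratio over fixed-size chunking.

        #include <stdint.h>
        #include <stdio.h>
        #include <stdlib.h>

        #define MIN_CHUNK  2048
        #define MAX_CHUNK  65536
        #define CUT_MASK   0x1FFF          /* ~8 KiB average chunk size */

        static uint64_t gear[256];         /* one random value per byte */

        static void init_gear(void)
        {
            srand(42);                     /* fixed seed: reproducible cuts */
            for (int i = 0; i < 256; i++)
                gear[i] = ((uint64_t)rand() << 32) ^ (uint64_t)rand();
        }

        /* Length of the next chunk starting at data[0]. */
        static size_t next_chunk(const uint8_t *data, size_t len)
        {
            uint64_t h = 0;

            if (len <= MIN_CHUNK)
                return len;
            for (size_t i = 0; i < len && i < MAX_CHUNK; i++) {
                h = (h << 1) + gear[data[i]];
                if (i >= MIN_CHUNK && (h & CUT_MASK) == 0)
                    return i + 1;          /* content-defined boundary */
            }
            return len < MAX_CHUNK ? len : MAX_CHUNK;
        }

        int main(void)
        {
            static uint8_t buf[1 << 20];   /* 1 MiB of synthetic data */
            for (size_t i = 0; i < sizeof(buf); i++)
                buf[i] = (uint8_t)((i * 2654435761u) >> 24);

            init_gear();
            for (size_t off = 0; off < sizeof(buf); ) {
                size_t n = next_chunk(buf + off, sizeof(buf) - off);
                printf("chunk @%zu len %zu\n", off, n);
                off += n;
            }
            return 0;
        }

    In a deduplication pipeline each chunk would then be fingerprinted (e.g., hashed) and only previously unseen chunks transferred over the network.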

    Improving Storage with Stackable Extensions

    Storage is a central part of computing. Driven by an exponentially increasing content generation rate and a widening performance gap between memory and secondary storage, researchers are on a perennial quest for further innovation. This has resulted in novel ways to “squeeze” more capacity and performance out of current and emerging storage technology. Adding intelligence and leveraging new types of storage devices has opened the door to a whole new class of optimizations to save cost, improve performance, and reduce energy consumption.

    In this dissertation, we first develop, analyze, and evaluate three storage extensions. Our first extension tracks application access patterns and writes data in the way individual applications most commonly access it, to benefit from the sequential throughput of disks. Our second extension uses a lower-power flash device as a cache to save energy and turn off the disk during idle periods. Our third extension is designed to leverage the characteristics of both disks and solid state devices by placing data on the most appropriate device to improve performance and save power.

    In developing these systems, we learned that extending the storage stack is a complex process. Implementing new ideas incurs a prolonged and cumbersome development process and requires developers to have advanced knowledge of the entire system to ensure that extensions accomplish their goals without compromising data recoverability. Furthermore, storage administrators are often reluctant to deploy specific storage extensions without understanding how they interact with other extensions and whether the extension ultimately achieves the intended goal.

    We address these challenges with a combination of approaches. First, we simplify the storage extension development process with system-level infrastructure that implements core functionality commonly needed for storage extension development. Second, we develop a formal theory to assist administrators in deploying storage extensions while guaranteeing that the given high-level goals are satisfied. There are, however, some cases for which our theory is inconclusive. For such scenarios we present an experimental methodology that allows administrators to pick the extension that performs best for a given workload. Our evaluation demonstrates the benefits of both the infrastructure and the formal theory.
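    As a purely hypothetical illustration of the stacking idea, the sketch below composes storage extensions through a shared operations table, with each layer transforming or forwarding requests to the layer beneath it. All names and the toy cache-hit predicate are invented for illustration; this is not the dissertation's actual infrastructure API.

        #include <stdint.h>
        #include <stdio.h>

        struct io_req { uint64_t lba; uint32_t len; int write; };

        struct ext {
            const char *name;
            int (*submit)(struct ext *self, struct io_req *rq);
            struct ext *lower;                   /* next extension in the stack */
        };

        /* Base device: terminates the stack. */
        static int base_submit(struct ext *self, struct io_req *rq)
        {
            printf("[%s] %s lba=%llu len=%u\n", self->name,
                   rq->write ? "write" : "read",
                   (unsigned long long)rq->lba, (unsigned)rq->len);
            return 0;
        }

        /* Cache extension: serve hot reads itself, pass misses down. */
        static int cache_submit(struct ext *self, struct io_req *rq)
        {
            if (!rq->write && (rq->lba & 1) == 0) {      /* toy hit predicate */
                printf("[%s] hit lba=%llu\n", self->name,
                       (unsigned long long)rq->lba);
                return 0;
            }
            return self->lower->submit(self->lower, rq); /* forward down */
        }

        int main(void)
        {
            struct ext disk  = { "disk",  base_submit,  NULL  };
            struct ext cache = { "cache", cache_submit, &disk };

            struct io_req r1 = { 8, 4096, 0 };           /* read, toy hit */
            struct io_req r2 = { 9, 4096, 0 };           /* read, miss    */
            cache.submit(&cache, &r1);
            cache.submit(&cache, &r2);
            return 0;
        }

    The appeal of this structure is that layers can be reordered or removed without code changes, which is exactly what makes reasoning about extension interactions both necessary and tractable.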

    Analysis of Storage Performance, in Memory and on Input/Output Devices, from an Execution Trace

    Data storage is an essential resource for the computer industry. Storage devices must be fast and reliable to meet the growing demands of the data-driven economy. Storage technologies can be classified into two main categories: mass storage and main memory storage. Mass storage holds large amounts of data persistently. Data is saved locally on input/output devices, such as Hard Disk Drives (HDD) and Solid-State Drives (SSD), or remotely on distributed storage systems. Main memory storage temporarily holds the data needed by running programs. Main memory is characterized by its high access speed, essential to quickly provide data to the Central Processing Unit (CPU).

    Operating systems use several mechanisms to manage storage devices, such as disk schedulers and memory allocators. The processing time of a storage request is affected by the interaction between several subsystems, which complicates the debugging task. Existing tools, such as benchmarking tools, provide a general idea of overall system performance but do not accurately identify the causes of poor performance. Dynamic analysis through execution tracing is a solution for the detailed runtime analysis of storage systems. Tracing collects precise data about the internal behavior of the system, which helps in detecting performance problems that are otherwise difficult to identify.

    The goal of this thesis is to provide a tool to analyze storage performance based on low-level trace events. The main challenges addressed by this tool are: collecting the required data using kernel and userspace tracing, limiting the overhead of tracing and the size of the generated traces, synchronizing the traces collected from different sources, providing multi-level analyses covering several aspects of storage performance, and lastly proposing abstractions allowing users to easily understand the traces. We carefully designed and inserted the instrumentation needed for the analyses. The tracepoints provide full visibility into the system and track the lifecycle of storage requests, from creation to processing. The Linux Trace Toolkit Next Generation (LTTng), a free and low-overhead tracer, is used for data collection. This tracer is characterized by its stability and its efficiency with highly parallel applications, thanks to the lock-free synchronization mechanisms used to update the contents of the trace buffers. We also contributed a patch that allows LTTng to capture the call stacks of userspace events.
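    For readers unfamiliar with LTTng-UST instrumentation, the sketch below shows a minimal userspace tracepoint provider and an application emitting one event per storage request, following the standard LTTng-UST pattern. The provider and event names are illustrative assumptions, not the thesis' actual instrumentation points.

        /* tp.h -- minimal LTTng-UST tracepoint provider */
        #undef TRACEPOINT_PROVIDER
        #define TRACEPOINT_PROVIDER storage_app

        #undef TRACEPOINT_INCLUDE
        #define TRACEPOINT_INCLUDE "./tp.h"

        #if !defined(_TP_H) || defined(TRACEPOINT_HEADER_MULTI_READ)
        #define _TP_H

        #include <lttng/tracepoint.h>
        #include <stdint.h>

        /* One event recording the submission of a storage request. */
        TRACEPOINT_EVENT(
            storage_app, io_submit,
            TP_ARGS(uint64_t, lba, uint32_t, size),
            TP_FIELDS(
                ctf_integer(uint64_t, lba, lba)
                ctf_integer(uint32_t, size, size)
            )
        )

        #endif /* _TP_H */
        #include <lttng/tracepoint-event.h>

        /* app.c -- build: gcc app.c -llttng-ust -ldl
         * record with: lttng enable-event --userspace 'storage_app:*' */
        #define TRACEPOINT_CREATE_PROBES
        #define TRACEPOINT_DEFINE
        #include "tp.h"

        int main(void)
        {
            for (uint64_t lba = 0; lba < 4; lba++)
                tracepoint(storage_app, io_submit, lba, 4096);
            return 0;
        }

    When no session is recording, each tracepoint reduces to a cheap branch, which is what keeps the instrumentation overhead low enough to leave in production code.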

    Novel Architectures for Offloading and Accelerating Computations in Artificial Intelligence and Big Data

    Due to the end of Moore's Law and Dennard scaling, performance gains in general-purpose architectures have slowed significantly in recent years. While raising the number of cores has been a viable approach for further performance increases, Amdahl's Law and its implications for parallelization also limit further gains. Consequently, research has shifted towards different approaches, including domain-specific custom architectures tailored to specific workloads. This has led to a new golden age for computer architecture, as noted in the Turing Award Lecture by Hennessy and Patterson, and has spawned several new architectures and architectural advances specifically targeted at today's dominant workloads, including Machine Learning.

    This thesis introduces a hierarchy of architectural improvements ranging from minor incremental changes, such as High-Bandwidth Memory, to more complex architectural extensions that offload workloads from the general-purpose CPU to more specialized accelerators. Finally, we introduce novel architectural paradigms, namely Near-Data and In-Network Processing, as the most complex architectural improvements. This cumulative dissertation then investigates several architectural improvements to accelerate Sum-Product Networks, a novel Machine Learning approach from the class of Probabilistic Graphical Models. Furthermore, we use these improvements as case studies to discuss the impact of novel architectures, showing that both minor and major architectural changes can significantly increase performance in Machine Learning applications.

    In addition, this thesis presents recent work on Near-Data Processing, which introduces smart storage devices as a novel architectural paradigm that is especially interesting in the context of Big Data. We discuss how Near-Data Processing can be applied to improve performance in different database settings by offloading database operations to smart storage devices. Offloading data-reductive operations, such as selections, reduces the amount of data transferred, thus improving performance and alleviating bandwidth-related bottlenecks. Using Near-Data Processing as a use case, we also discuss how Machine Learning approaches like Sum-Product Networks can improve novel architectures. Specifically, we introduce an approach for offloading Cardinality Estimation using Sum-Product Networks that could enable more intelligent decision-making in smart storage devices. Overall, we show that Machine Learning can benefit from novel architectures while also showing that Machine Learning can be applied to improve the applications of novel architectures.
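    As background for why Sum-Product Networks map well onto custom hardware, the sketch below shows the core inference computation: a bottom-up pass in which leaves supply likelihoods, product nodes multiply their children, and sum nodes take weighted mixtures. The structure and weights are toy assumptions; real SPNs are learned from data and far larger, but the fixed feed-forward dataflow is what makes them attractive for pipelined accelerators.

        #include <stdio.h>

        enum kind { LEAF, SUM, PROD };

        struct node {
            enum kind kind;
            double value;                   /* LEAF: likelihood of the evidence */
            int nchild;
            const struct node *child[4];
            double weight[4];               /* SUM only; should sum to 1 */
        };

        static double eval(const struct node *n)
        {
            double v;
            switch (n->kind) {
            case LEAF:
                return n->value;
            case PROD:                      /* product over children */
                v = 1.0;
                for (int i = 0; i < n->nchild; i++)
                    v *= eval(n->child[i]);
                return v;
            case SUM:                       /* weighted mixture of children */
                v = 0.0;
                for (int i = 0; i < n->nchild; i++)
                    v += n->weight[i] * eval(n->child[i]);
                return v;
            }
            return 0.0;
        }

        int main(void)
        {
            /* Toy SPN: a two-component mixture over products of X1 and X2. */
            struct node x1a = { LEAF, 0.8 }, x1b = { LEAF, 0.3 };
            struct node x2a = { LEAF, 0.6 }, x2b = { LEAF, 0.9 };
            struct node p1 = { PROD, 0, 2, { &x1a, &x2a } };
            struct node p2 = { PROD, 0, 2, { &x1b, &x2b } };
            struct node root = { SUM, 0, 2, { &p1, &p2 }, { 0.4, 0.6 } };
            printf("P(evidence) = %f\n", eval(&root));   /* 0.354 */
            return 0;
        }

    Since every node performs only multiplications and weighted additions over a static graph, the same network unrolls naturally into an arithmetic pipeline on an FPGA or similar accelerator.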

    Data-intensive Systems on Modern Hardware : Leveraging Near-Data Processing to Counter the Growth of Data

    Over the last decades, a tremendous shift toward using information technology in almost every daily routine of our lives can be perceived in our society, entailing an incredible growth of data collected day by day by Web, IoT, and AI applications. At the same time, magneto-mechanical HDDs are being replaced by semiconductor storage such as SSDs, equipped with modern Non-Volatile Memories like Flash, which yield significantly faster access latencies and higher levels of parallelism. Likewise, the execution speed of processing units has increased considerably, as server architectures nowadays comprise up to multiple hundreds of independently working CPU cores along with a variety of specialized computing co-processors such as GPUs or FPGAs. However, the burden of moving the continuously growing data to the best-fitting processing unit is inherent to today's computer architecture, which is based on the data-to-code paradigm. In light of Amdahl's Law, this leads to the conclusion that even with today's powerful processing units, the speedup of systems is limited, since the fraction of parallel work is largely I/O-bound. Throughout this cumulative dissertation, we therefore investigate the paradigm shift toward code-to-data, formally known as Near-Data Processing (NDP), which relieves the contention on the I/O bus by offloading processing to intelligent computational storage devices, where the data is originally located.

    Firstly, we identified Native Storage Management as the essential foundation for NDP due to its direct control of physical storage management within the database. Upon this, the interface is extended to propagate address-mapping information and to invoke NDP functionality on the storage device. As the former can become very large, we introduce Physical Page Pointers as one novel NDP abstraction for self-contained, immutable database objects.

    Secondly, the on-device navigation and interpretation of data are elaborated. To this end, we introduce cross-layer Parsers and Accessors as another NDP abstraction that can be executed on the heterogeneous processing capabilities of modern computational storage devices. Thereby, the compute placement and resource configuration per NDP request are identified as major performance criteria. Our experimental evaluation shows an improvement in execution duration of 1.4x to 2.7x compared to traditional systems. Moreover, we propose a framework for the automatic generation of Parsers and Accessors on FPGAs to ease their application in NDP.

    Thirdly, we investigate the interplay of NDP and modern workload characteristics like HTAP. We present different offloading models and focus on intervention-free execution. By propagating the Shared State with the latest modifications of the database to the computational storage device, it is able to process data with transactional guarantees. We thus extend the design space of HTAP with NDP by providing a solution that optimizes for performance isolation, data freshness, and the reduction of data transfers. In contrast to traditional systems, we observe no significant drop in performance when an OLAP query is invoked, but rather a steady and 30% faster throughput.

    Lastly, in-situ result-set management and consumption as well as NDP pipelines are proposed to achieve flexibility in processing data on heterogeneous hardware. As these produce final and intermediary results, we investigate their management and find that on-device materialization comes at low cost while enabling novel consumption modes and reuse semantics. Thereby, we achieve significant performance improvements of up to 400x by reusing once-materialized results multiple times.
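    To illustrate the data-reduction argument behind offloading, the following hypothetical sketch pushes a selection down to where a page of records already resides, so that only matching records cross the I/O bus. The record layout, predicate encoding, and all names are invented for illustration; they are not the dissertation's actual interfaces.

        #include <stdint.h>
        #include <stdio.h>

        struct record { uint32_t key; int32_t value; };

        struct predicate { int32_t min_value; };     /* value >= min_value */

        /* Runs "on the device": filter one page of records into out[]. */
        static size_t ndp_select(const struct record *page, size_t n,
                                 const struct predicate *p,
                                 struct record *out)
        {
            size_t m = 0;
            for (size_t i = 0; i < n; i++)
                if (page[i].value >= p->min_value)
                    out[m++] = page[i];
            return m;    /* only m records cross the I/O bus, not n */
        }

        int main(void)
        {
            struct record page[8], out[8];
            for (uint32_t i = 0; i < 8; i++)
                page[i] = (struct record){ i, (int32_t)(i * 10) };

            struct predicate p = { 40 };
            size_t m = ndp_select(page, 8, &p, out);
            printf("transferred %zu of 8 records (%.0f%% saved)\n",
                   m, 100.0 * (8 - m) / 8);
            return 0;
        }

    The saving scales with selectivity: the more restrictive the predicate, the less data leaves the device, which is precisely why data-reductive operators are the first candidates for pushdown.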

    Application-aware optimization of Artificial Intelligence for deployment on resource constrained devices

    Artificial intelligence (AI) is changing people's everyday lives. AI techniques such as Deep Neural Networks (DNN) rely on heavy computational models, which are in principle designed to be executed on powerful hardware platforms, such as desktop or server environments. However, the increasing need to apply such solutions in everyday life has encouraged research into methods that allow their deployment on embedded, portable, and stand-alone devices, such as mobile phones, which have relatively little memory and computational power. Such methods target both the development of lightweight AI algorithms and their acceleration through dedicated hardware. This thesis focuses on the development of lightweight AI solutions, with attention to deep neural networks, to facilitate their deployment on resource-constrained devices. Focusing on the computer vision field, we show how combining the self-learning ability of deep neural networks with application-specific knowledge, in the form of feature engineering, makes it possible to dramatically reduce the total memory and computational burden, thus allowing deployment on edge devices. The proposed approach aims to be complementary to existing application-independent network compression solutions. In this work, three main DNN optimization goals have been considered: increasing speed and accuracy, allowing training at the edge, and allowing execution on a microcontroller. For each of these, we deployed the resulting algorithm to the target embedded device and measured its performance.
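    As a toy illustration of the kind of reduction that makes microcontroller deployment feasible, the sketch below implements a dense layer in 8-bit fixed point, requiring only integer multiply-accumulates and a few hundred bytes of static storage. The layer sizes, weights, and q7 scaling scheme are assumptions for illustration, not the thesis' actual models or optimization method.

        #include <stdint.h>
        #include <stdio.h>

        #define IN  8
        #define OUT 4

        /* q7 weights: real_w = w / 128.0 (toy values; remaining rows zero) */
        static const int8_t W[OUT][IN] = {
            { 12, -3, 45, 0, 7, -9, 20, 5 },
        };
        static const int32_t B[OUT] = { 64 };   /* toy bias, accumulator scale */

        static void dense_q7(const int8_t x[IN], int8_t y[OUT])
        {
            for (int o = 0; o < OUT; o++) {
                int32_t acc = B[o];
                for (int i = 0; i < IN; i++)
                    acc += (int32_t)W[o][i] * x[i];  /* integer MAC, no FPU */
                acc >>= 7;                           /* rescale back to q7 */
                if (acc > 127)  acc = 127;           /* saturate to int8 */
                if (acc < -128) acc = -128;
                y[o] = (int8_t)acc;
            }
        }

        int main(void)
        {
            int8_t x[IN] = { 10, -5, 30, 0, 7, -20, 64, 1 }, y[OUT];
            dense_q7(x, y);
            for (int o = 0; o < OUT; o++)
                printf("y[%d] = %d\n", o, y[o]);
            return 0;
        }

    Compared with 32-bit floating point, this cuts weight storage by 4x and replaces FPU operations with plain integer arithmetic, which is what allows inference to fit the memory and compute budget of a typical microcontroller.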

    Internet of Things: From Hype to Reality

    The Internet of Things (IoT) has gained significant mindshare and attention in academia and industry, especially over the past few years. The reasons behind this interest are the potential capabilities that IoT promises to offer. On a personal level, it paints a picture of a future world where all the things in our ambient environment are connected to the Internet and seamlessly communicate with each other to operate intelligently. The ultimate goal is to enable objects around us to efficiently sense our surroundings, communicate inexpensively, and ultimately create a better environment for us: one where everyday objects act based on what we need and like, without explicit instructions.