242 research outputs found

    A survey and classification of storage deduplication systems

    The automatic elimination of duplicate data in a storage system, commonly known as deduplication, is increasingly accepted as an effective technique to reduce storage costs. It has therefore been applied to different storage types, including archives and backups, primary storage, solid state disks, and even random access memory. Although the general approach to deduplication is shared by all storage types, each poses specific challenges and leads to different trade-offs and solutions. This diversity is often misunderstood, leading to an underestimation of the relevance of new research and development. The first contribution of this paper is a classification of deduplication systems according to six criteria that correspond to key design decisions: granularity, locality, timing, indexing, technique, and scope. This classification identifies and describes the different approaches used for each of them. As a second contribution, we describe which combinations of these design decisions have been proposed and found most useful for the challenges of each storage type. Finally, outstanding research challenges and unexplored design points are identified and discussed. This work is funded by the European Regional Development Fund (ERDF) through the COMPETE Programme (operational programme for competitiveness) and by National Funds through the Fundação para a Ciência e a Tecnologia (FCT; Portuguese Foundation for Science and Technology) within project RED FCOMP-01-0124-FEDER-010156 and by the FCT PhD scholarship SFRH-BD-71372-2010
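
    As a concrete illustration of several design decisions the survey classifies (fixed-size granularity, inline timing, a full in-memory index), the sketch below shows the core mechanism all deduplication systems share. It is our illustration, not code from the paper; all names are ours.

```python
import hashlib

class DedupStore:
    """Minimal inline deduplication: fixed-size chunking, full in-memory index.

    Illustrative only -- real systems vary the granularity (whole file, fixed
    or variable-size chunks), timing (inline vs. offline), and indexing (full,
    sparse, or locality-based) that the survey classifies.
    """

    def __init__(self, chunk_size=4096):
        self.chunk_size = chunk_size
        self.index = {}    # fingerprint -> chunk id
        self.chunks = []   # chunk id -> bytes

    def write(self, data):
        """Store data, returning the list of chunk ids (the file recipe)."""
        recipe = []
        for off in range(0, len(data), self.chunk_size):
            chunk = data[off:off + self.chunk_size]
            fp = hashlib.sha256(chunk).digest()   # cryptographic fingerprint
            if fp not in self.index:              # unseen content: store it
                self.index[fp] = len(self.chunks)
                self.chunks.append(chunk)
            recipe.append(self.index[fp])         # duplicate: reference only
        return recipe

    def read(self, recipe):
        """Reassemble a file from its recipe of chunk ids."""
        return b"".join(self.chunks[cid] for cid in recipe)
```

    A duplicate chunk costs only an index lookup and a reference in the file recipe, which is where the storage savings come from.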

    Letter from the Special Issue Editor

    Editorial work for DEBULL: a special issue on data management on Storage Class Memory (SCM) technologies.

    Architecture Strategies for Cyber-Foraging: Preliminary Results from a Systematic Literature Review

    Mobile devices have become, for many, the preferred way of interacting with the Internet, social media, and the enterprise. However, mobile devices still lack the computing power and battery life to perform effectively over long periods of time, or to execute applications that require extensive communication, heavy computation, or low latency. Cyber-foraging is a technique that enables mobile devices to extend their computing power and storage by offloading computation or data to more powerful servers located in the cloud or in single-hop proximity. This paper presents the preliminary results of a systematic literature review (SLR) on architectures that support cyber-foraging. The preliminary results show that this is an area with many opportunities for research that will enable cyber-foraging solutions to become widely adopted as a way to support the mobile applications of the present and the future.
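
    The offloading decision at the heart of cyber-foraging can be made concrete with a simple cost model: offload only when shipping the input and computing remotely beats computing locally. The sketch below is our illustration; the function and its parameters are hypothetical and not taken from any surveyed architecture.

```python
def should_offload(input_bytes, cycles, local_mips, remote_mips,
                   bandwidth_bps, rtt_s):
    """Offload if estimated remote latency (round trip + transfer + remote
    compute) is lower than local compute time. All parameters are
    illustrative; real cyber-foraging systems also weigh energy cost,
    battery state, and surrogate availability."""
    local_time = cycles / (local_mips * 1e6)
    transfer_time = (input_bytes * 8) / bandwidth_bps
    remote_time = rtt_s + transfer_time + cycles / (remote_mips * 1e6)
    return remote_time < local_time

# Example: 1 MB input, 5 billion cycles, slow device, fast nearby surrogate.
print(should_offload(1_000_000, 5e9, 1_000, 20_000, 50e6, 0.02))
```

    Much of the architectural variety the review surveys lies in how systems gather these estimates and where the decision logic runs.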

    Structured P2P Technologies for Distributed Command and Control

    The utility of Peer-to-Peer (P2P) systems extends far beyond traditional file sharing. This paper provides an overview of how P2P systems can provide robust command and control for Distributed Multi-Agent Systems (DMASs). Specifically, it presents the evolution of P2P architectures to date, discussing the supporting technologies and applicability of each generation of P2P systems. It provides a detailed survey of the fundamental design approaches found in modern large-scale P2P systems, highlighting design considerations for building and deploying scalable P2P applications. The survey covers unstructured P2P systems, content retrieval systems, communications structured P2P systems, flat structured P2P systems, and finally Hierarchical Peer-to-Peer (HP2P) overlays. It concludes with a presentation of design tradeoffs and opportunities for future research into P2P overlay systems.
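
    To make the "structured" part concrete: flat structured overlays such as Chord assign each key to the first node clockwise on a hashed identifier ring, so any peer can locate the owner of a key without a central index. The sketch below is a generic illustration of that placement rule, not code from any surveyed system.

```python
import bisect
import hashlib

def ring_id(name, bits=32):
    """Hash a node name or key onto a 2^bits identifier ring."""
    digest = hashlib.sha1(name.encode()).digest()
    return int.from_bytes(digest, "big") % (1 << bits)

class Ring:
    """Consistent-hashing ring: each key belongs to its clockwise successor."""

    def __init__(self, nodes):
        self.ids = sorted(ring_id(n) for n in nodes)
        self.by_id = {ring_id(n): n for n in nodes}

    def lookup(self, key):
        # First node at or after the key's position, wrapping at the top.
        i = bisect.bisect_left(self.ids, ring_id(key)) % len(self.ids)
        return self.by_id[self.ids[i]]

ring = Ring(["peer-a", "peer-b", "peer-c"])
print(ring.lookup("command-42"))  # deterministic owner of this key
```

    A real overlay keeps only O(log N) routing state per peer (e.g., finger tables) instead of the full sorted ring used here for brevity, which is what makes these systems scale.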

    Kernel-space inline deduplication file systems for virtual machine image storage

    We explore the use of deduplication to eliminate the storage of redundant data in RAID from a file-system design perspective. We propose ScaleDFS, a deduplication file system that seeks to achieve scalable read/write throughput in RAID. ScaleDFS is built on three novel design features. First, it improves write throughput by exploiting multiple CPU cores to parallelize the computation of the cryptographic fingerprints that are used to identify redundant data. Second, it improves read throughput by caching in memory the recently read blocks that have been deduplicated. Third, it reduces memory usage by enhancing the data structures that are used for fingerprint lookups. ScaleDFS is implemented as a POSIX-compliant, kernel-space driver module that can be deployed on commodity hardware configurations. We conduct microbenchmark experiments using synthetic workloads, and macrobenchmark experiments using a dataset of 42 VM images of different Linux distributions. We show that ScaleDFS achieves higher read/write throughput than existing open-source deduplication file systems in RAID.
    Ma, Mingcao. Thesis (M.Phil.), Chinese University of Hong Kong, 2013 (dated October 2012). Includes bibliographical references (leaves 39-42). Abstracts also in Chinese.
    Contents: 1. Introduction; 2. Literature Review (backup systems; use of special hardware; scalable storage; inline DFSs; VM image storage with deduplication); 3. ScaleDFS Background (spatial locality of fingerprint placement; prefetching of fingerprint stores; journaling); 4. ScaleDFS Design (parallelizing deduplication; caching read blocks; reducing memory usage); 5. Implementation (choice of hash function; OpenStack deployment); 6. Experiments (microbenchmarks; OpenStack deployment; VM image operations in a RAID setup); 7. Conclusions and Future Work; Bibliography.
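
    The first of the three design features, parallel fingerprinting, can be illustrated in user space (ScaleDFS itself is a kernel module; this Python analogue and its names are ours, not the thesis code):

```python
import hashlib
from concurrent.futures import ProcessPoolExecutor

BLOCK_SIZE = 4096

def fingerprint(block):
    """Cryptographic fingerprint identifying a block's content."""
    return hashlib.sha1(block).hexdigest()

def fingerprint_image(path, workers=4):
    """Fingerprint all blocks of a VM image in parallel across CPU cores."""
    with open(path, "rb") as f:
        blocks = iter(lambda: f.read(BLOCK_SIZE), b"")
        with ProcessPoolExecutor(max_workers=workers) as pool:
            # chunksize batches blocks per worker to amortize IPC overhead
            return list(pool.map(fingerprint, blocks, chunksize=256))

# Example (run under "if __name__ == '__main__':" on spawn-based platforms):
# fps = fingerprint_image("ubuntu.img")
```

    Because hashing is CPU-bound, spreading it over cores lets the write path keep pace with the RAID array rather than serializing on a single fingerprinting thread.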

    Live migration of user environments across wide area networks

    A complex challenge in mobile computing is to allow the user to migrate her highly customised environment while moving to a different location and to continue work without interruption. I motivate why this is a highly desirable capability, survey the current approaches towards this goal, and explain their limitations. I then propose a new architecture to support user mobility by live migration of a user's operating system instance over the network. Previous work, including the Collective and Internet Suspend/Resume projects, has addressed migration of a user's environment by suspending the running state and resuming it at a later time. In contrast, this work addresses live migration of a user's operating system instance across wide area links. Live migration performs most of the transfer while the operating system is still running, achieving very little downtime and preserving all network connectivity. I developed an initial proof of concept of this solution. It migrates whole operating systems using the Xen virtual machine monitor and provides a way to live-migrate persistent storage as well as network connections across subnets, challenges that have not previously been addressed in this scenario. In a virtual machine environment, persistent storage is provided by virtual block devices. The architecture supports decentralised virtual block device replication across wide area network links, and migrates network connections across subnetworks using the Host Identity Protocol. The proposed architecture is compared against existing solutions, and an initial performance evaluation of the prototype implementation is presented, showing that such a solution is a promising step towards true seamless mobility of fully fledged computing environments.
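
    The low downtime comes from the pre-copy pattern used by Xen-style live migration: copy memory while the guest keeps running, re-send whatever it dirties, and pause only for the small final residue. The loop below is our schematic simplification; vm and dest are hypothetical stand-ins for hypervisor interfaces, not code from this work.

```python
def live_migrate(vm, dest, max_rounds=30, stop_threshold=64):
    """Pre-copy live migration sketch: iterate while the VM keeps running,
    then stop-and-copy only the last few dirty pages. Illustrative
    pseudocode -- 'vm' and 'dest' stand in for hypervisor interfaces."""
    dirty = vm.all_pages()                  # round 0: everything is "dirty"
    for _ in range(max_rounds):
        if len(dirty) <= stop_threshold:    # residue small enough to stop
            break
        dest.receive(vm.read_pages(dirty))  # copy while the VM still runs
        dirty = vm.dirty_pages_since_last_round()
    vm.pause()                              # brief downtime starts here
    dest.receive(vm.read_pages(dirty))      # final residue and CPU state
    dest.resume()                           # storage already replicated;
    vm.destroy()                            # connections survive via HIP
```

    Over a wide area link the same idea must also cover persistent storage and the network identity, which is exactly what the proposed block-device replication and Host Identity Protocol components add.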

    Energy-Aware Data Management on NUMA Architectures

    The ever-increasing need for more computing and data processing power demands a continuous and rapid growth of power-hungry data center capacities all over the world. As a first study revealed in 2008, the energy consumption of such data centers is becoming a critical problem, since their power consumption was on track to double every five years. However, a follow-up study released in 2016 points out that this threatening trend has been dramatically slowed in recent years by the increased energy-efficiency measures taken by data center operators. The authors of that study emphasize that making and keeping data centers energy-efficient is a continuous task, because more and more computing power is demanded from the same or an even lower energy budget, and the threatening consumption trend will resume as soon as energy-efficiency research efforts and their market adoption are reduced. An important class of applications running in data centers are data management systems, which are a fundamental component of nearly every application stack. While those systems were traditionally designed as disk-based databases optimized for keeping disk accesses as low as possible, modern state-of-the-art database systems are main-memory-centric and store the entire data pool in main memory, which replaces the disk as the main bottleneck. To scale up such in-memory database systems, non-uniform memory access (NUMA) hardware architectures are employed, which face decreased bandwidth and increased latency when accessing remote memory compared to local memory. In this thesis, we investigate energy-awareness aspects of large scale-up NUMA systems in the context of in-memory data management systems. To do so, we pick up the idea of a fine-grained data-oriented architecture and improve the concept so that it keeps pace with the absolute performance of a pure in-memory DBMS and scales up on large NUMA systems. To achieve this goal, we design and build ERIS, the first scale-up in-memory data management system designed from scratch around a data-oriented architecture. With the help of the ERIS platform, we explore our novel core concept for energy awareness, Energy Awareness by Adaptivity: software, and especially database systems, must respond quickly to environmental changes (i.e., workload changes) by adapting themselves to enter a state of low energy consumption. We present the hierarchically organized Energy-Control Loop (ECL), a reactive control loop that provides two concrete implementations of our Energy Awareness by Adaptivity concept, the hardware-centric Resource Adaptivity and the software-centric Storage Adaptivity. Finally, we give an exhaustive evaluation of the scalability of ERIS as well as of our adaptivity facilities.
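
    The Energy Awareness by Adaptivity idea can be sketched as a reactive loop that grows resources under load and shrinks them when idle, so the system sits in its lowest-energy adequate state. The monitor and scaler interfaces below are hypothetical stand-ins for the thesis's resource and storage adaptivity knobs, not ERIS code.

```python
import time

def energy_control_loop(monitor, scaler, target_util=0.7, interval_s=1.0):
    """Reactive control sketch: scale resources up under pressure and down
    when idle. 'monitor' and 'scaler' are hypothetical interfaces standing
    in for hardware knobs (cores, DVFS) and storage knobs (data placement,
    compression) that a system like ERIS would adapt."""
    while True:
        util = monitor.utilization()   # observed load in this period
        if util > target_util:
            scaler.scale_up()          # e.g. wake cores, raise frequency
        elif util < target_util / 2:
            scaler.scale_down()        # e.g. park cores, consolidate data
        time.sleep(interval_s)         # react at a fixed control interval
```

    A hierarchical version of this loop, with one controller per level of the NUMA topology, is the shape the Energy-Control Loop concept describes.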

    A bottom-up approach to real-time search in large networks and clouds
