135 research outputs found

    GPUs as Storage System Accelerators

    Full text link
    Massively multicore processors, such as Graphics Processing Units (GPUs), provide, at a comparable price, a one order of magnitude higher peak performance than traditional CPUs. This drop in the cost of computation, as any order-of-magnitude drop in the cost per unit of performance for a class of system components, triggers the opportunity to redesign systems and to explore new ways to engineer them to recalibrate the cost-to-performance relation. This project explores the feasibility of harnessing GPUs' computational power to improve the performance, reliability, or security of distributed storage systems. In this context, we present the design of a storage system prototype that uses GPU offloading to accelerate a number of computationally intensive primitives based on hashing, and introduce techniques to efficiently leverage the processing power of GPUs. We evaluate the performance of this prototype under two configurations: as a content addressable storage system that facilitates online similarity detection between successive versions of the same file and as a traditional system that uses hashing to preserve data integrity. Further, we evaluate the impact of offloading to the GPU on competing applications' performance. Our results show that this technique can bring tangible performance gains without negatively impacting the performance of concurrently running applications.Comment: IEEE Transactions on Parallel and Distributed Systems, 201

    THE USE OF ROUGH CLASSIFICATION AND TWO THRESHOLD TWO DIVISORS FOR DEDUPLICATION

    Get PDF
    The data deduplication technique efficiently reduces and removes redundant data in big data storage systems. The main issue is that the data deduplication requires expensive computational effort to remove duplicate data due to the vast size of big data. The paper attempts to reduce the time and computation required for data deduplication stages. The chunking and hashing stage often requires a lot of calculations and time. This paper initially proposes an efficient new method to exploit the parallel processing of deduplication systems with the best performance. The proposed system is designed to use multicore computing efficiently. First, The proposed method removes redundant data by making a rough classification for the input into several classes using the histogram similarity and k-mean algorithm. Next, a new method for calculating the divisor list for each class was introduced to improve the chunking method and increase the data deduplication ratio. Finally, the performance of the proposed method was evaluated using three datasets as test examples. The proposed method proves that data deduplication based on classes and a multicore processor is much faster than a single-core processor. Moreover, the experimental results showed that the proposed method significantly improved the performance of Two Threshold Two Divisors (TTTD) and Basic Sliding Window BSW algorithms

    Coordinate Memory Deduplication and Partition for Improving Performance in Cloud Computing

    Full text link
    [EN] Both limited main memory size and memory interference are considered as the major bottlenecks in virtualization environments. Memory deduplication, detecting pages with same content and being shared into one single copy, reduces memory requirements; memory partition, allocating unique colors for each virtual machine according to page color, reduces memory interference among virtual machines to improve performance. In this paper, we propose a coordinate memory deduplication and partition approach named CMDP to reduce memory requirement and interference simultaneously for improving performance in virtualization. Moreover, CMDP adopts a lightweight page behavior-based memory deduplication approach named BMD to reduce futile page comparison overhead meanwhile to detect page sharing opportunities efficiently. And a virtual machine based memory partition called VMMP is added into CMDP to reduce interference among virtual machines. According to page color, VMMP allocates unique page colors to applications, virtual machines and hypervisor. The experimental results show that CMDP can efficiently improve performance (by about 15.8 percent) meanwhile accommodate more virtual machines concurrently.This work was supported by "Qing Lan Project", "the National Natural Science Foundation of China under Grants 61572172, 61401147, and 61572164", " the Natural Science Foundation of Jiangsu Province of China, Nos. BK20131137 and BK20140248", "Zhejiang provincial Natural Science Foundation Nos. LQ14F020011 and LQ12F02003", by Instituto de Telecomunicacoes, Next Generation Networks and Applications Group (NetGNA), Covilha Delegation, Portugal and by National Funding from the FCT Fundacao para a Ciencia e a Tecnologia through the UID/EEA/500008/2013 Project. Guangjie Han is the corresponding author.Jia, G.; Han, G.; Rodrigues, JJPC.; Lloret, J.; Li, W. (2019). Coordinate Memory Deduplication and Partition for Improving Performance in Cloud Computing. IEEE Transactions on Cloud Computing. 7(2):357-368. https://doi.org/10.1109/TCC.2015.25117383573687

    Shredder: GPU-Accelerated Incremental Storage and Computation

    Get PDF
    Redundancy elimination using data deduplication and incremental data processing has emerged as an important technique to minimize storage and computation requirements in data center computing. In this paper, we present the design, implementation and evaluation of Shredder, a high performance content-based chunking framework for supporting incremental storage and computation systems. Shredder exploits the massively parallel processing power of GPUs to overcome the CPU bottlenecks of content-based chunking in a cost-effective manner. Unlike previous uses of GPUs, which have focused on applications where computation costs are dominant, Shredder is designed to operate in both compute-and dataintensive environments. To allow this, Shredder provides several novel optimizations aimed at reducing the cost of transferring data between host (CPU) and GPU, fully utilizing the multicore architecture at the host, and reducing GPU memory access latencies. With our optimizations, Shredder achieves a speedup of over 5X for chunking bandwidth compared to our optimized parallel implementation without a GPU on the same host system. Furthermore, we present two real world applications of Shredder: an extension to HDFS, which serves as a basis for incremental MapReduce computations, and an incremental cloud backup system. In both contexts, Shredder detects redundancies in the input data across successive runs, leading to significant savings in storage, computation, and end-to-end completion times.

    Investigating tools and techniques for improving software performance on multiprocessor computer systems

    Get PDF
    The availability of modern commodity multicore processors and multiprocessor computer systems has resulted in the widespread adoption of parallel computers in a variety of environments, ranging from the home to workstation and server environments in particular. Unfortunately, parallel programming is harder and requires more expertise than the traditional sequential programming model. The variety of tools and parallel programming models available to the programmer further complicates the issue. The primary goal of this research was to identify and describe a selection of parallel programming tools and techniques to aid novice parallel programmers in the process of developing efficient parallel C/C++ programs for the Linux platform. This was achieved by highlighting and describing the key concepts and hardware factors that affect parallel programming, providing a brief survey of commonly available software development tools and parallel programming models and libraries, and presenting structured approaches to software performance tuning and parallel programming. Finally, the performance of several parallel programming models and libraries was investigated, along with the programming effort required to implement solutions using the respective models. A quantitative research methodology was applied to the investigation of the performance and programming effort associated with the selected parallel programming models and libraries, which included automatic parallelisation by the compiler, Boost Threads, Cilk Plus, OpenMP, POSIX threads (Pthreads), and Threading Building Blocks (TBB). Additionally, the performance of the GNU C/C++ and Intel C/C++ compilers was examined. The results revealed that the choice of parallel programming model or library is dependent on the type of problem being solved and that there is no overall best choice for all classes of problem. However, the results also indicate that parallel programming models with higher levels of abstraction require less programming effort and provide similar performance compared to explicit threading models. The principle conclusion was that the problem analysis and parallel design are an important factor in the selection of the parallel programming model and tools, but that models with higher levels of abstractions, such as OpenMP and Threading Building Blocks, are favoured
    • …
    corecore