597 research outputs found

    Scalability Analysis of Signatures in Transactional Memory Systems

    Get PDF
    Signatures have been proposed in transactional memory systems to represent read and write sets and to decouple transaction conflict detection from private caches or to accelerate it. Generally, signatures are implemented as Bloom filters that allow unbounded read/write sets to be summarized in bounded space at the cost of false conflict detection. It is known that this behavior has great impact in parallel performance. In this work, a scalability study of state-of-the-art signature designs is presented, for different orthogonal transactional characteristics, including contention, length, concurrency and spatial locality. This study was accomplished using the Stanford EigenBench benchmark. This benchmark was modified to support spatial locality analysis using a Zipf address distribution. Experimental evaluation on a hardware transactional memory simulator shows the impact of those parameters in the behavior of state-of-the-art signatures.Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech

    Towards Data Optimization in Storages and Networks

    Get PDF
    Title from PDF of title page, viewed on August 7, 2015Dissertation advisors: Sejun Song and Baek-Young ChoiVitaIncludes bibliographic references (pages 132-140)Thesis (Ph.D.)--School of Computing and Engineering. University of Missouri--Kansas City, 2015We are encountering an explosion of data volume, as a study estimates that data will amount to 40 zeta bytes by the end of 2020. This data explosion poses significant burden not only on data storage space but also access latency, manageability, and processing and network bandwidth. However, large portions of the huge data volume contain massive redundancies that are created by users, applications, systems, and communication models. Deduplication is a technique to reduce data volume by removing redundancies. Reliability will be even improved when data is replicated after deduplication. Many deduplication studies such as storage data deduplication and network redundancy elimination have been proposed to reduce storage consumption and network bandwidth consumption. However, existing solutions are not efficient enough to optimize data delivery path from clients to servers through network. Hence we propose a holistic deduplication framework to optimize data in their path. Our deduplication framework consists of three components including data sources or clients, networks, and servers. The client component removes local redundancies in clients, the network component removes redundant transfers coming from different clients, and the server component removes redundancies coming from different networks. We designed and developed components for the proposed deduplication framework. For the server component, we developed the Hybrid Email Deduplication System that achieves a trade-off of space savings and overhead for email systems. For the client component, we developed the Structure Aware File and Email Deduplication for Cloudbased Storage Systems that is very fast as well as having good space savings by using structure-based granularity. For the network component, we developed a system called Software-defined Deduplication as a Network and Storage service that is in-network deduplication, and that chains storage data deduplication and network redundancy elimination functions by using Software Defined Network to achieve both storage space and network bandwidth savings with low processing time and memory size. We also discuss mobile deduplication for image and video files in mobile devices. Through system implementations and experiments, we show that the proposed framework effectively and efficiently optimizes data volume in a holistic manner encompassing the entire data path of clients, networks and storage servers.Introduction -- Deduplication technology -- Existing deduplication approaches -- HEDS: Hybrid Email Deduplication System -- SAFE: Structure-aware File and Email Deduplication for cloud-based storage systems -- SoftDance: Software-defined Deduplication as a Network and Storage Service -- Moblie de-duplication -- Conclusion

    khmer: Working with Big Data in Bioinformatics

    Full text link
    We introduce design and optimization considerations for the 'khmer' package.Comment: Invited chapter for forthcoming book on Performance of Open Source Application

    Memory Subsystem Optimization Techniques for Modern High-Performance General-Purpose Processors

    Get PDF
    abstract: General-purpose processors propel the advances and innovations that are the subject of humanity’s many endeavors. Catering to this demand, chip-multiprocessors (CMPs) and general-purpose graphics processing units (GPGPUs) have seen many high-performance innovations in their architectures. With these advances, the memory subsystem has become the performance- and energy-limiting aspect of CMPs and GPGPUs alike. This dissertation identifies and mitigates the key performance and energy-efficiency bottlenecks in the memory subsystem of general-purpose processors via novel, practical, microarchitecture and system-architecture solutions. Addressing the important Last Level Cache (LLC) management problem in CMPs, I observe that LLC management decisions made in isolation, as in prior proposals, often lead to sub-optimal system performance. I demonstrate that in order to maximize system performance, it is essential to manage the LLCs while being cognizant of its interaction with the system main memory. I propose ReMAP, which reduces the net memory access cost by evicting cache lines that either have no reuse, or have low memory access cost. ReMAP improves the performance of the CMP system by as much as 13%, and by an average of 6.5%. Rather than the LLC, the L1 data cache has a pronounced impact on GPGPU performance by acting as the bandwidth filter for the rest of the memory subsystem. Prior work has shown that the severely constrained data cache capacity in GPGPUs leads to sub-optimal performance. In this thesis, I propose two novel techniques that address the GPGPU data cache capacity problem. I propose ID-Cache that performs effective cache bypassing and cache line size selection to improve cache capacity utilization. Next, I propose LATTE-CC that considers the GPU’s latency tolerance feature and adaptively compresses the data stored in the data cache, thereby increasing its effective capacity. ID-Cache and LATTE-CC are shown to achieve 71% and 19.2% speedup, respectively, over a wide variety of GPGPU applications. Complementing the aforementioned microarchitecture techniques, I identify the need for system architecture innovations to sustain performance scalability of GPG- PUs in the face of slowing Moore’s Law. I propose a novel GPU architecture called the Multi-Chip-Module GPU (MCM-GPU) that integrates multiple GPU modules to form a single logical GPU. With intelligent memory subsystem optimizations tailored for MCM-GPUs, it can achieve within 7% of the performance of a similar but hypothetical monolithic die GPU. Taking a step further, I present an in-depth study of the energy-efficiency characteristics of future MCM-GPUs. I demonstrate that the inherent non-uniform memory access side-effects form the key energy-efficiency bottleneck in the future. In summary, this thesis offers key insights into the performance and energy-efficiency bottlenecks in CMPs and GPGPUs, which can guide future architects towards developing high-performance and energy-efficient general-purpose processors.Dissertation/ThesisDoctoral Dissertation Computer Science 201
    • …
    corecore