    Improving the Performance and Endurance of Persistent Memory with Loose-Ordering Consistency

    Persistent memory provides high-performance data persistence at the main memory level. Memory writes need to be performed in strict order to satisfy storage consistency requirements and enable correct recovery from system crashes. Unfortunately, adhering to such a strict order significantly degrades system performance and persistent memory endurance. This paper introduces a new mechanism, Loose-Ordering Consistency (LOC), that satisfies the ordering requirements at significantly lower performance and endurance loss. LOC consists of two key techniques. First, Eager Commit eliminates the need to perform a persistent commit record write within a transaction. We do so by ensuring that we can determine the status of all committed transactions during recovery by storing necessary metadata information statically with blocks of data written to memory. Second, Speculative Persistence relaxes the write ordering between transactions by allowing writes to be speculatively written to persistent memory. A speculative write is made visible to software only after its associated transaction commits. To enable this, our mechanism supports the tracking of committed transaction IDs and multi-versioning in the CPU cache. Our evaluations show that LOC reduces the average performance overhead of memory persistence from 66.9% to 34.9% and the memory write traffic overhead from 17.1% to 3.4% on a variety of workloads. Comment: This paper has been accepted by IEEE Transactions on Parallel and Distributed Systems.

    The Potential of Synergistic Static, Dynamic and Speculative Loop Nest Optimizations for Automatic Parallelization

    Research in automatic parallelization of loop-centric programs started with static analysis, then broadened its arsenal to include dynamic inspection-execution and speculative execution, with the best results involving hybrid static-dynamic schemes. Beyond the detection of parallelism in a sequential program, scalable parallelization on many-core processors involves hard and interesting parallelism adaptation and mapping challenges. These challenges include tailoring data locality to the memory hierarchy, structuring independent tasks hierarchically to exploit multiple levels of parallelism, tuning the synchronization grain, balancing the execution load, decoupling the execution into thread-level pipelines, and leveraging heterogeneous hardware with specialized accelerators. The polyhedral framework makes it possible to model, construct and apply very complex loop nest transformations addressing most of the parallelism adaptation and mapping challenges. But apart from hardware-specific, back-end oriented transformations (if-conversion, trace scheduling, value prediction), loop nest optimization has essentially ignored dynamic and speculative techniques. Research in polyhedral compilation recently reached a significant milestone towards the support of dynamic, data-dependent control flow. This opens a large avenue for blending dynamic analyses and speculative techniques with advanced loop nest optimizations. Selecting real-world examples from SPEC benchmarks and numerical kernels, we make a case for the design of synergistic static, dynamic and speculative loop transformation techniques. We also sketch the embedding of dynamic information, including speculative assumptions, in the heart of affine transformation search spaces.

    08241 Abstracts Collection -- Transactional Memory : From Implementation to Application

    From 08.06. to 13.06.2008, the Dagstuhl Seminar 08241 "Transactional Memory: From Implementation to Application" was held in Schloss Dagstuhl -- Leibniz Center for Informatics. During the seminar, several participants presented their current research, and ongoing work and open problems were discussed. Abstracts of the presentations given during the seminar as well as abstracts of seminar results and ideas are put together in this paper. The first section describes the seminar topics and goals in general. Links to extended abstracts or full papers are provided, if available.

    New hardware support transactional memory and parallel debugging in multicore processors

    This thesis contributes to the area of hardware support for parallel programming by introducing new hardware elements in multicore processors, with the aim of improving the performance of, and optimizing, new tools, abstractions and applications related to parallel programming, such as transactional memory and data race detectors. Specifically, we configure a hardware transactional memory system with signatures as part of the hardware support, and we develop a new hardware filter for reducing the signature size. We also develop the first hardware asymmetric data race detector (which is also able to tolerate data races), likewise based on hardware signatures. Finally, we propose a new hardware signature module that solves some of the problems we found in the previous tools, which stem from the lack of flexibility in hardware signatures.

    A selective logging mechanism for hardware transactional memory systems

    Log-based Hardware Transactional Memory (HTM) systems offer an elegant solution to handle speculative data that overflow transactional L1 caches. By keeping the pre-transactional values on a software-resident log, speculative values can be safely moved across the memory hierarchy, without requiring expensive searches on L1 misses or commits. Postprint (author's final draft)

    Value creation through HR shared services: towards a conceptual framework

    Purpose – The purpose of this paper is to derive a measure for the performance of human resource shared service providers (HR SSPs) and then to develop a theoretical framework that conceptualises their performance. Design/methodology/approach – This conceptual paper starts from the HR shared services argument and integrates this with the knowledge-based view of the firm and the concept of intellectual capital. Findings – We recommend measuring HR SSP performance as HR value, referring to the ratio between use value and exchange value, which together reflect both transactional and transformational HR value. We argue that transactional HR value directly flows from the organisational capital in HR SSPs, whereas human and social capitals enable them to leverage their organisational capital for HR value creation. We argue that the human capital of HR SSPs has a direct effect on transformational HR value creation, while their social and organisational capitals positively moderate this relationship. Originality/value – The suggested measure paves the way for operationalising and measuring the performance of HR shared service providers. This paper offers testable propositions for the relationships between intellectual capital and the performance of HR shared service providers. These contributions could assist future research to move beyond the descriptive nature that characterises the existing literature.

    Configurable Version Management Hardware Transactional Memory for Multi-processor Platform

    Programming shared-memory multi-processor platforms efficiently is difficult because lock-based synchronization limits efficiency. Transactional memory (TM) is a promising approach to creating an abstraction layer for multi-threaded programming. However, the performance of TM is application-specific. In general, the configuration of a TM is divided into version management and conflict management. Each scheme has its strengths and weaknesses depending on the application being executed. Previous TM implementations for embedded systems were built on a fixed version management configuration, which results in significant performance loss when transaction behaviour changes. In this paper, we propose a hardware transactional memory (HTM) with interchangeable version management. Random requests at different contention levels are used to verify the performance of the proposed TM. The proposed architecture is targeted at embedded applications and is area-efficient compared to current implementations that apply cache coherence protocols.