7 research outputs found

    Mechanisms for Unbounded, Conflict-Robust Hardware Transactional Memory

    Conventional lock implementations serialize access to critical sections guarded by the same lock, presenting programmers with a difficult tradeoff between granularity of synchronization and amount of parallelism realized. Recently, researchers have been investigating an emerging synchronization mechanism called transactional memory as an alternative to such conventional lock-based synchronization. Memory transactions have the semantics of executing in isolation from one another while in reality executing speculatively in parallel, aborting when necessary to maintain the appearance of isolation. This combination of coarse-grained isolation and optimistic parallelism has the potential to ease the tradeoff presented by lock-based programming. This dissertation studies the hardware implementation of transactional memory, making three main contributions. First, we propose the permissions-only cache, a mechanism that efficiently increases the size of transactions that can be handled in the local cache hierarchy to optimize performance. Second, we propose OneTM, an unbounded hardware transactional memory system that serializes transactions that escape the local cache hierarchy. Finally, we propose RetCon, a novel conflict-detection mechanism that reduces conflicts by allowing transactions to commit with different values than those with which they executed, as long as dataflow and control-flow constraints are maintained.

    Scalable selective re-execution for edge architectures

    Pipeline flushes are becoming increasingly expensive in modern microprocessors with large instruction windows and deep pipelines. Selective re-execution is a technique that can reduce the penalty of mis-speculations by re-executing only instructions affected by the mis-speculation, instead of all instructions. In this paper we introduce a new selective re-execution mechanism that exploits the properties of a dataflow-like Explicit Data Graph Execution (EDGE) architecture to support efficient mis-speculation recovery, while scaling to window sizes of thousands of instructions with high performance. This distributed selective re-execution (DSRE) protocol permits multiple speculative waves of computation to be traversing a dataflow graph simultaneously, with a commit wave propagating behind them to ensure correct execution. We evaluate one application of this protocol to provide efficient recovery for load-store dependence speculation. Unlike traditional dataflow architectures, which resorted to single-assignment memory semantics, the DSRE protocol combines dataflow execution with speculation to enable high performance and conventional sequential memory semantics. Our experiments show that the DSRE protocol results in an average 17% speedup over the best dependence predictor proposed to date, and obtains 82% of the performance possible with a perfect oracle directing the issue of loads.

    Heterogeneous processor composition: metrics and methods

    Heterogeneous processors intended for mobile devices are composed of a number of different CPU cores that enable the processor to optimize performance under strict power limits that vary over time. Design space exploration techniques can be used to discover a candidate set of potential cores that could be implemented on a heterogeneous processor. However, candidate sets contain far more cores than can feasibly be implemented. Heterogeneous processor composition therefore requires solutions to the selection problem and the evaluation problem. Cores must be selected from the candidate set, and these cores must be shown to be quantitatively superior to alternative selections. The qualitative criterion for a selection of cores is diversity. A diverse set of heterogeneous cores allows a processor to execute tasks with varying dynamic behaviors at a range of power and performance levels that are appropriate for conditions during runtime. This thesis presents a detailed description of the selection and evaluation problems, and establishes a theoretical framework for reasoning about the runtime behavior of power-limited, heterogeneous processors. The evaluation problem is specifically concerned with evaluating the collective attributes of selections of cores rather than evaluating the features of individual cores. A suite of metrics is defined to address the evaluation problem. The metrics quantify considerations that could otherwise only be evaluated subjectively. The selection problem is addressed with an iterative, diversity-preserving algorithm that emphasizes the flexibility available to programs at runtime. The algorithm includes facilities for guiding the selection process with information from an expert, when available. Three variations on the selection algorithm are defined. A thorough analysis of the proposed selection algorithm is presented using data from a large-scale simulation involving 33 benchmarks and 3000 core types. The three variations of the algorithm are compared to each other and to current, state-of-the-art selection techniques. The analysis serves as both an evaluation of the proposed algorithm and a case study of the metrics.