160 research outputs found

    Mechanisms for Unbounded, Conflict-Robust Hardware Transactional Memory

    Get PDF
    Conventional lock implementations serialize access to critical sections guarded by the same lock, presenting programmers with a difficult tradeoff between granularity of synchronization and amount of parallelism realized. Recently, researchers have been investigating an emerging synchronization mechanism called transactional memory as an alternative to such conventional lock-based synchronization. Memory transactions have the semantics of executing in isolation from one another while in reality executing speculatively in parallel, aborting when necessary to maintain the appearance of isolation. This combination of coarse-grained isolation and optimistic parallelism has the potential to ease the tradeoff presented by lock-based programming. This dissertation studies the hardware implementation of transactional memory, making three main contributions. First, we propose the permissions-only cache, a mechanism that efficiently increases the size of transactions that can be handled in the local cache hierarchy to optimize performance. Second, we propose OneTM, an unbounded hardware transactional memory system that serializes transactions that escape the local cache hierarchy. Finally, we propose RetCon, a novel mechanism for detecting conflicts that reduces conflicts by allowing transactions to commit with different values than those with which they executed as long as dataflow and control-flow constraints are maintained

    The Junk Palace

    Get PDF
    This is an account of the introduction of an idea into a school curriculum, its problems, success and eventual development. The idea was to make a building, a process which might have the effect of integrating such things as art, music, drama, philosophy-the possibilities in such an open-ended situation being endless-a building constructed by pupils into and through which they could develop over the two summers of 1969/70.The school is a co-educational comprehensive unit of 1300 pupils. The school has no 6th Form, students continuing their education do so at the 6th Form College or College of Technology. It does, however, have a 'non-academic' 6th Form made up of students gaining further qualifications, the 6th Form college of the comprehensive system being selective at 4 '0' Levels; some of these students. took an active part in this work. The fully comprehensive stream had during 1969/70 risen to only the 3rd Form of the school, the 4th and 5th and 6th years were still classified as ";secondary modern";

    Token Tenure and PATCH: A Predictive/Adaptive Token-Counting Hybrid

    Get PDF
    Traditional coherence protocols present a set of difficult trade-offs: the reliance of snoopy protocols on broadcast and ordered interconnects limits their scalability, while directory protocols incur a performance penalty on sharing misses due to indirection. This work introduces PATCH (Predictive/Adaptive Token-Counting Hybrid), a coherence protocol that provides the scalability of directory protocols while opportunistically sending direct requests to reduce sharing latency. PATCH extends a standard directory protocol to track tokens and use token-counting rules for enforcing coherence permissions. Token counting allows PATCH to support direct requests on an unordered interconnect, while a mechanism called token tenure provides broadcast-free forward progress using the directory protocol’s per-block point of ordering at the home along with either timeouts at requesters or explicit race notification messages. PATCH makes three main contributions. First, PATCH introduces token tenure, which provides broadcast-free forward progress for token-counting protocols. Second, PATCH deprioritizes best-effort direct requests to match or exceed the performance of directory protocols without restricting scalability. Finally, PATCH provides greater scalability than directory protocols when using inexact encodings of sharers because only processors holding tokens need to acknowledge requests. Overall, PATCH is a “one-size-fits-all” coherence protocol that dynamically adapts to work well for small systems, large systems, and anywhere in between

    Adding Token Counting to Directory-Based Cache Coherence

    Get PDF
    The coherence protocol is a first-order design concern in multicore designs. Directory protocols are naturally scalable, as they place no restrictions on the interconnect and have minimal bandwidth requirements; however, this scalability comes at the cost of increased sharing latency due to indirection. In contrast, broadcast-based systems such as snooping protocols and token coherence reduce latency of sharing misses by sending requests directly to other processors. Unfortunately, their reliance on totally ordered interconnects and/or broadcast limits their scalability. This work introduces PATCH (Predictive/Adaptive Token Counting Hybrid), a coherence protocol that provides the scalability of directory protocols while opportunistically using available bandwidth to reduce sharing latency. PATCH extends a standard directory protocol to track tokens and use token counting rules for enforcing coherence permissions. Token counting allows PATCH to support direct requests on an unordered interconnect, while a novel mechanism called token tenure uses local processor timeouts and the directory’s per-block point of ordering at the home node to guarantee forward progress without relying on broadcast. PATCH makes three main contributions. First, PATCH uses direct request prioritization to match the performance of broadcast-based protocols without restricting scalability. Second, PATCH introduces token tenure, which provides broadcast-free forward progress for token counting protocols. Finally, PATCH provides greater scalability than directory protocols when using inexact encodings of sharers because only processors holding tokens need to acknowledge requests. Overall, PATCH is a “one-size-fits-all” coherence protocol that dynamically adapts to work well for small systems, large systems, and anywhere in betwee

    Unrestricted Transactional Memory: Supporting I/O and System Calls Within Transactions

    Get PDF
    Hardware transactional memory has great potential to simplify the creation of correct and efficient multithreaded programs, enabling programmers to exploit the soon-to-be-ubiquitous multi-core designs. Transactions are simply segments of code that are guaranteed to execute without interference from other concurrently-executing threads. The hardware executes transactions in parallel, ensuring non-interference via abort/rollback/restart when conflicts are detected. Transactions thus provide both a simple programming interface and a highly-concurrent implementation that serializes only on data conflicts. A progression of recent work has broadened the utility of transactional memory by lifting the bound on the size and duration of transactions, called unbounded transactions. Nevertheless, two key challenges remain: (i) I/O and system calls cannot appear in transactions and (ii) existing unbounded transactional memory proposals require complex implementations. We describe a system for fully unrestricted transactions (i.e., they can contain I/O and system calls in addition to being unbounded in size and duration). We achieve this via two modes of transaction execution: restricted (which limits transaction size, duration, and content but is highly concurrent) and unrestricted (which is unbounded and can contain I/O and system calls but has limited concurrency because there can be only one unrestricted transaction executing at a time). Transactions transition to unrestricted mode only when necessary. We introduce unoptimized and optimized implementations in order to balance performance and design complexity

    Subtleties of Transactional Memory Atomicity Semantics

    Get PDF
    Transactional memory has great potential for simplifying multithreaded programming by allowing programmers to specify regions of the program that must appear to execute atomically. Transactional memory implementations then optimistically execute these transactions concurrently to obtain high performance. This work shows that the same atomic guarantees that give transactions their power also have unexpected and potentially serious negative effects on programs that were written assuming narrower scopes of atomicity. We make four contributions: (1) we show that a direct translation of lock-based critical sections into transactions can introduce deadlock into otherwise correct programs, (2) we introduce the terms strong atomicity and weak atomicity to describe the interaction of transactional and non-transactional code, (3) we show that code that is correct under weak atomicity can deadlock under strong atomicity, and (4) we demonstrate that sequentially composing transactional code can also introduce deadlocks. These observations invalidate the intuition that transactions are strictly safer than lock-based critical sections, that strong atomicity is strictly safer than weak atomicity, and that transactions are always composable

    Improved Sequence-Based Speculation Techniques for Implementing Memory Consistency

    Get PDF
    This work presents BMW, a new design for speculative implementations of memory consistency models in shared-memory multiprocessors. BMW obtains the same performance as prior proposals, but achieves this performance while avoiding several undesirable attributes of prior proposals: non-scalable structures, per-word valid bits in the data cache, modifications to the cache coherence protocol, and global arbitration. BMW uses a read and write bit per cache block and a standard invalidation-based cache coherence protocol to perform conflict detection while speculating. While speculating, stores to block not in the cache are placed into a coalescing store buffer until those misses return. Stores are written speculatively to the primary cache, and non-speculative state is maintained by cleaning dirty blocks before being written speculatively. Speculative blocks are invalidated on abort and marked as non-speculative on commit. This organization allows for fast, local commits while avoiding a non-scalable store queue

    Assume-Guarantee Testing

    Get PDF
    Verification techniques for component-based systems should ideally be able to predict properties of the assembled system through analysis of individual components before assembly. This work introduces such a modular technique in the context of testing. Assume-guarantee testing relies on the (automated) decomposition of key system-level requirements into local component requirements at design time. Developers can verify the local requirements by checking components in isolation; failed checks may indicate violations of system requirements, while valid traces from different components compose via the assume-guarantee proof rule to potentially provide system coverage. These local requirements also form the foundation of a technique for efficient predictive testing of assembled systems: given a correct system run, this technique can predict violations by alternative system runs without constructing those runs. We discuss the application of our approach to testing a multi-threaded NASA application, where we treat threads as components

    HardBound: Architectural Support for Spatial Safety of the C Programming Language

    Get PDF
    The C programming language is at least as well known for its absence of spatial memory safety guarantees (i.e., lack of bounds checking) as it is for its high performance. C\u27s unchecked pointer arithmetic and array indexing allow simple programming mistakes to lead to erroneous executions, silent data corruption, and security vulnerabilities. Many prior proposals have tackled enforcing spatial safety in C programs by checking pointer and array accesses. However, existing software-only proposals have significant drawbacks that may prevent wide adoption, including: unacceptably high runtime overheads, lack of completeness, incompatible pointer representations, or need for non-trivial changes to existing C source code and compiler infrastructure. Inspired by the promise of these software-only approaches, this paper proposes a hardware bounded pointer architectural primitive that supports cooperative hardware/software enforcement of spatial memory safety for C programs. This bounded pointer is a new hardware primitive datatype for pointers that leaves the standard C pointer representation intact, but augments it with bounds information maintained separately and invisibly by the hardware. The bounds are initialized by the software, and they are then propagated and enforced transparently by the hardware, which automatically checks a pointer\u27s bounds before it is dereferenced. One mode of use requires instrumenting only malloc, which enables enforcement of per-allocation spatial safety for heap-allocated objects for existing binaries. When combined with simple intra-procedural compiler instrumentation, hardware bounded pointers enable a low-overhead approach for enforcing complete spatial memory safety in unmodified C programs
    • …
    corecore