160 research outputs found
Mechanisms for Unbounded, Conflict-Robust Hardware Transactional Memory
Conventional lock implementations serialize access to critical sections guarded by the same lock, presenting programmers with a difficult tradeoff between granularity of synchronization and amount of parallelism realized. Recently, researchers have been investigating an emerging synchronization mechanism called transactional memory as an alternative to such conventional lock-based synchronization. Memory transactions have the semantics of executing in isolation from one another while in reality executing speculatively in parallel, aborting when necessary to maintain the appearance of isolation. This combination of coarse-grained isolation and optimistic parallelism has the potential to ease the tradeoff presented by lock-based programming.
This dissertation studies the hardware implementation of transactional memory, making three main contributions. First, we propose the permissions-only cache, a mechanism that efficiently increases the size of transactions that can be handled in the local cache hierarchy to optimize performance. Second, we propose OneTM, an unbounded hardware transactional memory system that serializes transactions that escape the local cache hierarchy. Finally, we propose RetCon, a novel mechanism for detecting conflicts that reduces conflicts by allowing transactions to commit with different values than those with which they executed as long as dataflow and control-flow constraints are maintained
The Junk Palace
This is an account of the introduction of an idea into a school curriculum, its problems, success and eventual development. The idea was to make a building, a process which might have the effect of integrating such things as art, music, drama, philosophy-the possibilities in such an open-ended situation being endless-a building constructed by pupils into and through which they could develop over the two summers of 1969/70.The school is a co-educational comprehensive unit of 1300 pupils. The school has no 6th Form, students continuing their education do so at the 6th Form College or College of Technology. It does, however, have a 'non-academic' 6th Form made up of students gaining further qualifications, the 6th Form college of the comprehensive system being selective at 4 '0' Levels; some of these students. took an active part in this work. The fully comprehensive stream had during 1969/70 risen to only the 3rd Form of the school, the 4th and 5th and 6th years were still classified as ";secondary modern";
Recommended from our members
Blended Tutorials: Blended Synchronous Learning in Mathematics
A blended tutorial is a single learning event which gives students the opportunity of attending face-to-face or online. This article reports the findings of a scholarship of teaching and learning project conducted at The Open University, and considers the barriers and opportunities to using blended tutorials to support distance learning. Two pilot blended tutorials were carried out on the honours mathematics module M337 Complex Analysis, and the results of the evaluation are presented. Using qualitative data from practitioner reflections, lesson observations and semi-structured student interviews, this project uses thematic analysis to identify barriers and opportunities to using blended tutorials. Particular emphasis is given to the unique challenges in learning mathematics, and in the distance learning context of The Open University. The report concludes with recommendations for the design of blended tutorials, and recommendations for future research.
Token Tenure and PATCH: A Predictive/Adaptive Token-Counting Hybrid
Traditional coherence protocols present a set of difficult trade-offs: the reliance of snoopy protocols on broadcast and ordered interconnects limits their scalability, while directory protocols incur a performance penalty on sharing misses due to indirection. This work introduces PATCH (Predictive/Adaptive Token-Counting Hybrid), a coherence protocol that provides the scalability of directory protocols while opportunistically sending direct requests to reduce sharing latency. PATCH extends a standard directory protocol to track tokens and use token-counting rules for enforcing coherence permissions. Token counting allows PATCH to support direct requests on an unordered interconnect, while a mechanism called token tenure provides broadcast-free forward progress using the directory protocol’s per-block point of ordering at the home along with either timeouts at requesters or explicit race notification messages. PATCH makes three main contributions. First, PATCH introduces token tenure, which provides broadcast-free forward progress for token-counting protocols. Second, PATCH deprioritizes best-effort direct requests to match or exceed the performance of directory protocols without restricting scalability. Finally, PATCH provides greater scalability than directory protocols when using inexact encodings of sharers because only processors holding tokens need to acknowledge requests. Overall, PATCH is a “one-size-fits-all” coherence protocol that dynamically adapts to work well for small systems, large systems, and anywhere in between
Adding Token Counting to Directory-Based Cache Coherence
The coherence protocol is a first-order design concern in multicore designs. Directory protocols are naturally scalable, as they place no restrictions on the interconnect and have minimal bandwidth requirements; however, this scalability comes at the cost of increased sharing latency due to indirection. In contrast, broadcast-based systems such as snooping protocols and token coherence reduce latency of sharing misses by sending requests directly to other processors. Unfortunately, their reliance on totally ordered interconnects and/or broadcast limits their scalability.
This work introduces PATCH (Predictive/Adaptive Token Counting Hybrid), a coherence protocol that provides the scalability of directory protocols while opportunistically using available bandwidth to reduce sharing latency. PATCH extends a standard directory protocol to track tokens and use token counting rules for enforcing coherence permissions. Token counting allows PATCH to support direct requests on an unordered interconnect, while a novel mechanism called token tenure uses local processor timeouts and the directory’s per-block point of ordering at the home node to guarantee forward progress without relying on broadcast.
PATCH makes three main contributions. First, PATCH uses direct request prioritization to match the performance of broadcast-based protocols without restricting scalability. Second, PATCH introduces token tenure, which provides broadcast-free forward progress for token counting protocols. Finally, PATCH provides greater scalability than directory protocols when using inexact encodings of sharers because only processors holding tokens need to acknowledge requests. Overall, PATCH is a “one-size-fits-all” coherence protocol that dynamically adapts to work well for small systems, large systems, and anywhere in betwee
Unrestricted Transactional Memory: Supporting I/O and System Calls Within Transactions
Hardware transactional memory has great potential to simplify the creation of correct and efficient multithreaded programs, enabling programmers to exploit the soon-to-be-ubiquitous multi-core designs. Transactions are simply segments of code that are guaranteed to execute without interference from other concurrently-executing threads. The hardware executes transactions in parallel, ensuring non-interference via abort/rollback/restart when conflicts are detected. Transactions thus provide both a simple programming interface and a highly-concurrent implementation that serializes only on data conflicts. A progression of recent work has broadened the utility of transactional memory by lifting the bound on the size and duration of transactions, called unbounded transactions. Nevertheless, two key challenges remain: (i) I/O and system calls cannot appear in transactions and (ii) existing unbounded transactional memory proposals require complex implementations.
We describe a system for fully unrestricted transactions (i.e., they can contain I/O and system calls in addition to being unbounded in size and duration). We achieve this via two modes of transaction execution: restricted (which limits transaction size, duration, and content but is highly concurrent) and unrestricted (which is unbounded and can contain I/O and system calls but has limited concurrency because there can be only one unrestricted transaction executing at a time). Transactions transition to unrestricted mode only when necessary. We introduce unoptimized and optimized implementations in order to balance performance and design complexity
Subtleties of Transactional Memory Atomicity Semantics
Transactional memory has great potential for simplifying multithreaded programming by allowing programmers to specify regions of the program that must appear to execute atomically. Transactional memory implementations then optimistically execute these transactions concurrently to obtain high performance. This work shows that the same atomic guarantees that give transactions their power also have unexpected and potentially serious negative effects on programs that were written assuming narrower scopes of atomicity. We make four contributions: (1) we show that a direct translation of lock-based critical sections into transactions can introduce deadlock into otherwise correct programs, (2) we introduce the terms strong atomicity and weak atomicity to describe the interaction of transactional and non-transactional code, (3) we show that code that is correct under weak atomicity can deadlock under strong atomicity, and (4) we demonstrate that sequentially composing transactional code can also introduce deadlocks. These observations invalidate the intuition that transactions are strictly safer than lock-based critical sections, that strong atomicity is strictly safer than weak atomicity, and that transactions are always composable
Improved Sequence-Based Speculation Techniques for Implementing Memory Consistency
This work presents BMW, a new design for speculative implementations of memory consistency models in shared-memory multiprocessors. BMW obtains the same performance as prior proposals, but achieves this performance while avoiding several undesirable attributes of prior proposals: non-scalable structures, per-word valid bits in the data cache, modifications to the cache coherence protocol, and global arbitration.
BMW uses a read and write bit per cache block and a standard invalidation-based cache coherence protocol to perform conflict detection while speculating. While speculating, stores to block not in the cache are placed into a coalescing store buffer until those misses return. Stores are written speculatively to the primary cache, and non-speculative state is maintained by cleaning dirty blocks before being written speculatively. Speculative blocks are invalidated on abort and marked as non-speculative on commit. This organization allows for fast, local commits while avoiding a non-scalable store queue
Assume-Guarantee Testing
Verification techniques for component-based systems should ideally be able to predict properties of the assembled system through analysis of individual components before assembly. This work introduces such a modular technique in the context of testing. Assume-guarantee testing relies on the (automated) decomposition of key system-level requirements into local component requirements at design time. Developers can verify the local requirements by checking components in isolation; failed checks may indicate violations of system requirements, while valid traces from different components compose via the assume-guarantee proof rule to potentially provide system coverage. These local requirements also form the foundation of a technique for efficient predictive testing of assembled systems: given a correct system run, this technique can predict violations by alternative system runs without constructing those runs. We discuss the application of our approach to testing a multi-threaded NASA application, where we treat threads as components
HardBound: Architectural Support for Spatial Safety of the C Programming Language
The C programming language is at least as well known for its absence of spatial memory safety guarantees (i.e., lack of bounds checking) as it is for its high performance. C\u27s unchecked pointer arithmetic and array indexing allow simple programming mistakes to lead to erroneous executions, silent data corruption, and security vulnerabilities. Many prior proposals have tackled enforcing spatial safety in C programs by checking pointer and array accesses. However, existing software-only proposals have significant drawbacks that may prevent wide adoption, including: unacceptably high runtime overheads, lack of completeness, incompatible pointer representations, or need for non-trivial changes to existing C source code and compiler infrastructure.
Inspired by the promise of these software-only approaches, this paper proposes a hardware bounded pointer architectural primitive that supports cooperative hardware/software enforcement of spatial memory safety for C programs. This bounded pointer is a new hardware primitive datatype for pointers that leaves the standard C pointer representation intact, but augments it with bounds information maintained separately and invisibly by the hardware. The bounds are initialized by the software, and they are then propagated and enforced transparently by the hardware, which automatically checks a pointer\u27s bounds before it is dereferenced. One mode of use requires instrumenting only malloc, which enables enforcement of per-allocation spatial safety for heap-allocated objects for existing binaries. When combined with simple intra-procedural compiler instrumentation, hardware bounded pointers enable a low-overhead approach for enforcing complete spatial memory safety in unmodified C programs
- …