
    Accounting for Memory Bank Contention and Delay in High-Bandwidth Multiprocessors

    This paper considers issues of memory performance in shared memory multiprocessors that provide a high-bandwidth network and in which the memory banks are slower than the processors. We are concerned with the effects of memory bank contention, memory bank delay, and the bank expansion factor (the ratio of the number of banks to the number of processors) on performance, particularly for irregular memory access patterns. This work was motivated by observed discrepancies between predicted and actual performance in a number of irregular algorithms implemented for the Cray C90 when the memory contention at a particular location is high. We develop a formal framework for studying memory bank contention and delay, and show several results, both experimental and theoretical. We first show experimentally that our framework is a good predictor of performance on the Cray C90 and J90, providing a good accounting of bank contention and delay. Second, we show that it often improves performance to have addi..
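    To make the interplay of bank delay, hot spots, and the bank expansion factor concrete, here is a deliberately simple toy estimate. It is not the paper's framework; it assumes (hypothetically) low-order interleaving of addresses across banks, a network that accepts one request per processor per cycle, and banks that serve one request at a time, each for d cycles. A batch can finish no earlier than the issue time and no earlier than the busiest bank's total service time.

```python
import math
from collections import Counter


def estimated_batch_time(addresses, p, banks, d):
    """Toy lower-bound estimate (cycles) for servicing one batch of requests.

    Hypothetical assumptions, for illustration only:
      * address a is stored in bank a % banks (low-order interleaving);
      * the network accepts p requests per cycle (one per processor);
      * each request occupies its bank for d cycles, one request at a time.
    Network latency and queueing effects are ignored.
    """
    load = Counter(a % banks for a in addresses)   # requests per bank
    issue_time = math.ceil(len(addresses) / p)     # cycles to inject the batch
    bank_time = d * max(load.values()) if load else 0  # busiest bank's work
    return max(issue_time, bank_time)


p, d = 16, 4
spread = list(range(1024))   # consecutive addresses: evenly spread over banks
hot = [0] * 1024             # every request targets the same location

for banks in (16, 64):       # bank expansion factor banks / p = 1 and 4
    print(banks // p,
          estimated_batch_time(spread, p, banks, d),
          estimated_batch_time(hot, p, banks, d))
# 1 256 4096
# 4 64 4096
```

    In this toy model, raising the expansion factor from 1 to 4 lets the spread-out batch become issue-bound rather than bank-bound, while a single hot location stays bank-bound no matter how many banks are added.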

    Contention-Free Complexity of Shared Memory Algorithms

    Worst-case time complexity is a measure of the maximum time needed to solve a problem over all runs. Contention-free time complexity indicates the maximum time needed when a process executes by itself, without competition from other processes. Since contention is rare in well-designed systems, it is important to design algorithms which perform well in the absence of contention. We study the contention-free time complexity of shared memory algorithms using two measures: step complexity, which counts the number of accesses to shared registers; and register complexity, which measures the number of different registers accessed. Depending on the system architecture, one of the two measures more accurately reflects the elapsed time. We provide lower and upper bounds for the contention-free step and register complexity of solving the mutual exclusion problem as a function of the number of processes and the size of the largest register that can be accessed in one atomic step. We also present bounds on the worst-case and contention-free step and register complexities of solving the naming problem. These bounds illustrate that the proposed complexity measures are useful in differentiating among the computational powers of different primitive
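    As an illustration of the two measures (not an example taken from the paper), the sketch below runs Lamport's fast mutual exclusion algorithm for a single process against a simulated shared memory that counts accesses. Running one process alone is exactly the contention-free case: the busy-wait branches are never entered, and the counter reports the step complexity (shared accesses) and the register complexity (distinct registers touched). `CountingMemory` and `contention_free_cost` are hypothetical helpers; this is a step counter, not a usable Python lock.

```python
class CountingMemory:
    """Simulated shared registers that count every read and write."""

    def __init__(self, n):
        self.regs = {"x": 0, "y": 0}
        for j in range(1, n + 1):
            self.regs[("b", j)] = False
        self.steps = 0          # step complexity: number of shared accesses
        self.touched = set()    # register complexity: distinct registers used

    def read(self, name):
        self.steps += 1
        self.touched.add(name)
        return self.regs[name]

    def write(self, name, value):
        self.steps += 1
        self.touched.add(name)
        self.regs[name] = value


def lamport_fast_lock(mem, i, n):
    """Entry code of Lamport's fast mutual exclusion (process ids 1..n)."""
    while True:
        mem.write(("b", i), True)
        mem.write("x", i)
        if mem.read("y") != 0:              # someone else is active: back off
            mem.write(("b", i), False)
            while mem.read("y") != 0:
                pass
            continue
        mem.write("y", i)
        if mem.read("x") != i:              # contention detected: slow path
            mem.write(("b", i), False)
            for j in range(1, n + 1):
                while mem.read(("b", j)):
                    pass
            if mem.read("y") != i:
                while mem.read("y") != 0:
                    pass
                continue
        return                              # critical section may start


def lamport_fast_unlock(mem, i):
    mem.write("y", 0)
    mem.write(("b", i), False)


def contention_free_cost(n, i=1):
    """(steps, registers) for one solo lock/unlock by process i."""
    mem = CountingMemory(n)
    lamport_fast_lock(mem, i, n)
    lamport_fast_unlock(mem, i)
    return mem.steps, len(mem.touched)


print(contention_free_cost(8))    # (7, 3)
print(contention_free_cost(100))  # (7, 3): independent of the number of processes
```

    The solo run performs five writes and two reads on three registers (`b[i]`, `x`, `y`), regardless of n, which is why this algorithm is the standard example of constant contention-free cost despite an unbounded worst case.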

    Brief Announcement: Fast and Scalable Group Mutual Exclusion

    The group mutual exclusion (GME) problem is a generalization of the classical mutual exclusion problem in which every critical section is associated with a type or session. Critical sections belonging to the same session can execute concurrently, whereas critical sections belonging to different sessions must be executed serially. The well-known read-write mutual exclusion problem is a special case of the group mutual exclusion problem. In a shared memory system, locks based on traditional mutual exclusion or its variants are commonly used to manage contention among processes. In concurrent algorithms based on fine-grained synchronization, a single lock is used to protect access to a small number of shared objects (e.g., a lock for every tree node) so as to minimize the contention window. Consequently, a large number of shared objects in the system translates into a large number of locks. Also, when fine-grained synchronization is used, most lock accesses are expected to be uncontended in practice. Most existing algorithms for solving the GME problem have high space complexity per lock. Further, all algorithms except one have high step complexity in the uncontended case. This makes them unsuitable for use in concurrent algorithms based on fine-grained synchronization. In this work, we present a novel GME algorithm for an asynchronous shared-memory system that has O(1) space complexity per GME lock when the system contains a large number of GME locks, as well as O(1) step complexity when the system contains no conflicting requests.
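    The GME safety rule itself (same session may overlap, different sessions may not) can be shown with a small blocking lock built on `threading.Condition`. This is a minimal sketch of the problem's semantics only, not the paper's algorithm: it relies on an underlying mutex, makes no attempt at the O(1) space and step bounds, and offers no fairness, so a steady stream of same-session requests can starve other sessions. `SimpleGroupLock` is a hypothetical name.

```python
import threading


class SimpleGroupLock:
    """Blocking group mutual exclusion lock (illustrative sketch only).

    Any number of threads may be inside concurrently if they requested the
    same session; a request for a different session waits until the lock
    is completely free.
    """

    def __init__(self):
        self._cond = threading.Condition()
        self._session = None   # session currently inside, if any
        self._holders = 0      # threads currently inside the critical section

    def enter(self, session):
        with self._cond:
            # Wait only while a *different* session occupies the lock.
            while self._holders > 0 and self._session != session:
                self._cond.wait()
            self._session = session
            self._holders += 1

    def exit(self):
        with self._cond:
            self._holders -= 1
            if self._holders == 0:
                # Lock is free: let waiters of any session re-check.
                self._session = None
                self._cond.notify_all()
```

    The read-write special case mentioned in the abstract maps onto this interface by having every reader call `enter("read")` and every writer pass a fresh session such as `object()`, so writers exclude readers and each other.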

    A Fast Lock-Free Application Cache

    When compared to blocking concurrency, non-blocking concurrency can provide higher performance in parallel shared-memory contexts, especially in high-contention scenarios. This paper proposes FLeeC, an application-level cache system based on Memcached, which leverages redesigned data structures and non-blocking (or lock-free) concurrency to improve performance by allowing any number of concurrent writes and reads to its main data structures, even in high-contention scenarios. We discuss and evaluate its new algorithms, which allow a lock-free eviction policy and lock-free fast lookups. FLeeC can be used as a plug-in replacement for the original Memcached, and its new algorithms and concurrency control strategies result in considerable performance improvements (up to 6×).

    Memory-processor co-scheduling in fixed priority systems

    A major obstacle to the adoption of multi-core platforms for real-time systems is the difficulty of characterizing the interference due to memory contention. The simple fact that multiple cores may simultaneously access shared memory and communication resources introduces significant pessimism into the timing and schedulability analysis. To counter this problem, predictable execution models have been proposed that split task execution into two consecutive phases: a memory phase (M-phase), in which the required instructions and data are prefetched to local memory, and an execution phase (C-phase), in which the task executes with no memory contention. Decoupling memory and execution phases not only simplifies the timing analysis, but also allows a more efficient (and predictable) pipelining of memory and execution phases through proper co-scheduling algorithms. In this paper, we take a further step towards the design of smart co-scheduling algorithms for sporadic real-time tasks complying with the M/C (memory-computation) model. We provide a theoretical framework that aims at tightly characterizing the schedulability improvement obtainable with the adopted M/C task model on single-core systems. We identify a tight critical instant for M/C tasks scheduled with fixed priority, providing an exact response-time analysis with pseudo-polynomial complexity. We show in our experiments that a significant schedulability improvement may be obtained with respect to classic execution models, placing an important building block towards the design of more efficient partitioned multi-core systems.
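    For reference, the classic baseline that such work is compared against can be sketched with the standard fixed-priority response-time recurrence R_i = C_i + Σ_{j<i} ⌈R_i / T_j⌉ C_j, iterated to a fixed point (pseudo-polynomial, like the analysis the abstract describes). This is not the paper's exact M/C analysis: it simply charges each job its memory and compute phases back to back, with no pipelining between them. The task set is hypothetical, with implicit deadlines (D = T) and tasks listed highest priority first.

```python
import math


def response_times(tasks):
    """Classic fixed-priority preemptive response-time analysis.

    tasks: list of (wcet, period), highest priority first, deadline = period.
    Returns each task's worst-case response time, or None if the recurrence
    exceeds the deadline (task deemed unschedulable).
    """
    results = []
    for i, (c_i, t_i) in enumerate(tasks):
        r = c_i
        while True:
            # Own demand plus interference from higher-priority jobs in [0, r).
            r_next = c_i + sum(math.ceil(r / t_j) * c_j for c_j, t_j in tasks[:i])
            if r_next > t_i:
                results.append(None)
                break
            if r_next == r:          # fixed point reached
                results.append(r)
                break
            r = r_next
    return results


# Hypothetical task set: (memory phase M, compute phase C, period T).
# The classic model charges each job M + C, with no overlap between phases.
mc_tasks = [(1, 2, 10), (2, 3, 20), (1, 4, 40)]
classic = [(m + c, t) for m, c, t in mc_tasks]
print(response_times(classic))  # [3, 8, 16]
```

    An M/C-aware analysis can only tighten these bounds by exploiting overlap of one task's M-phase with another's C-phase; the recurrence above is the pessimistic reference point.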