
    Nested, but Separate: Isolating Unrelated Critical Sections in Real-Time Nested Locking

    Prior work has produced multiprocessor real-time locking protocols that ensure asymptotically optimal bounds on priority inversion, that support fine-grained nesting of critical sections, or that are independence-preserving under clustered scheduling. However, while several protocols offer two of these three desirable properties, no protocol to date accomplishes all three. Motivated by this gap, this paper introduces the Group Independence-Preserving Protocol (GIPP), the first protocol to support fine-grained nested locking, guarantee a notion of independence preservation for fine-grained nested locking, and ensure asymptotically optimal priority-inversion bounds. As a stepping stone, the paper further presents the Clustered k-Exclusion Independence-Preserving Protocol (CKIP), the first asymptotically optimal independence-preserving k-exclusion lock for clustered scheduling. Both the GIPP and the CKIP rely on allocation inheritance (a.k.a. migratory priority inheritance) as the key mechanism for accomplishing independence preservation.
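
    The core idea behind allocation inheritance is that a lock holder which blocks a higher-priority waiter from another cluster may temporarily migrate and run on that waiter's processor allocation. The sketch below models only that bookkeeping; the task and lock structures, field names, and single-waiter scenario are illustrative assumptions, not the GIPP/CKIP rules.

```c
/* Minimal model of allocation inheritance (migratory priority inheritance):
 * while a lock holder blocks a waiter from another cluster, the holder may
 * run on the waiter's processor allocation.  All names are illustrative. */
#include <stdio.h>

struct task {
    const char *name;
    int home_cluster;      /* cluster the task is normally scheduled on */
    int run_cluster;       /* cluster whose allocation it currently uses */
    int priority;          /* base priority (higher value = higher priority) */
    int eff_priority;      /* priority it currently executes with */
};

struct lock {
    struct task *holder;   /* NULL if free */
};

/* A waiter blocks on a held lock: the holder inherits the waiter's allocation
 * so it can make progress even if the holder's own cluster is busy. */
static void block_on(struct lock *l, struct task *waiter)
{
    struct task *h = l->holder;
    if (h && waiter->eff_priority > h->eff_priority) {
        h->run_cluster = waiter->run_cluster;   /* migrate to waiter's cluster */
        h->eff_priority = waiter->eff_priority; /* and run with its priority  */
    }
}

/* On release, the previous holder returns to its own allocation. */
static void release(struct lock *l)
{
    struct task *h = l->holder;
    h->run_cluster = h->home_cluster;
    h->eff_priority = h->priority;
    l->holder = NULL;
}

int main(void)
{
    struct task holder = { "H", 0, 0, 1, 1 };
    struct task waiter = { "W", 1, 1, 5, 5 };
    struct lock l = { &holder };

    block_on(&l, &waiter);
    printf("%s now runs on cluster %d at priority %d\n",
           holder.name, holder.run_cluster, holder.eff_priority);
    release(&l);
    printf("%s back on cluster %d at priority %d\n",
           holder.name, holder.run_cluster, holder.eff_priority);
    return 0;
}
```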

    Using Lock Servers to Scale Real-Time Locking Protocols: Chasing Ever-Increasing Core Counts

    During the past decade, parallelism-related issues have been at the forefront of real-time systems research due to the advent of multicore technologies. In the coming years, such issues will loom ever larger due to increasing core counts. Having more cores means a greater potential for platform capacity loss when the available parallelism cannot be fully exploited. In this paper, such capacity loss is considered in the context of real-time locking protocols. Here, lock nesting becomes a key concern, as it can result in transitive blocking chains that unnecessarily force tasks to execute sequentially; such chains can be quite long on a larger machine. Contention-sensitive real-time locking protocols have been proposed as a means of "breaking" transitive blocking chains, but they tend to have high overhead due to more complicated lock/unlock logic. To ease such overhead, the use of lock servers is considered herein. In particular, four specific lock-server paradigms are proposed and many nuances concerning their deployment are explored. Experiments are presented showing that, by executing cache-hot, lock servers can enable reductions in lock/unlock overhead of up to 86%. Such reductions make contention-sensitive protocols a viable approach in practice.
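
    A lock server centralizes lock/unlock (or whole critical-section) processing on a dedicated core so that the protocol state and protected data stay cache-hot. The sketch below is a generic delegation scheme along those lines, not any of the paper's four paradigms; the slot layout, spinning clients, and fixed client count are assumptions made for brevity.

```c
/* Sketch of a lock-server (delegation) scheme: clients post critical-section
 * requests to per-client slots and a dedicated server thread executes them,
 * keeping the shared state cache-hot on the server's core. */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

#define NCLIENTS 4

typedef void (*cs_fn)(void *arg);      /* critical-section body */

struct slot {
    _Atomic(cs_fn) fn;                 /* non-NULL: request pending */
    void *arg;
};

static struct slot slots[NCLIENTS];
static atomic_int stop;
static long shared_counter;            /* resource only the server touches */

/* Server loop: scan client slots and run pending critical sections in turn. */
static void *server(void *unused)
{
    (void)unused;
    while (!atomic_load(&stop)) {
        for (int i = 0; i < NCLIENTS; i++) {
            cs_fn fn = atomic_load(&slots[i].fn);
            if (fn) {
                fn(slots[i].arg);                 /* execute on server core */
                atomic_store(&slots[i].fn, NULL); /* signal completion */
            }
        }
    }
    return NULL;
}

static void increment(void *arg) { shared_counter += *(long *)arg; }

/* Client side: publish a request and spin until the server has run it. */
static void delegate(int id, cs_fn fn, void *arg)
{
    slots[id].arg = arg;
    atomic_store(&slots[id].fn, fn);
    while (atomic_load(&slots[id].fn) != NULL)
        ;                                         /* wait for completion */
}

static void *client(void *idp)
{
    int id = *(int *)idp;
    long one = 1;
    for (int k = 0; k < 1000; k++)
        delegate(id, increment, &one);
    return NULL;
}

int main(void)
{
    pthread_t srv, cl[NCLIENTS];
    int ids[NCLIENTS];

    pthread_create(&srv, NULL, server, NULL);
    for (int i = 0; i < NCLIENTS; i++) {
        ids[i] = i;
        pthread_create(&cl[i], NULL, client, &ids[i]);
    }
    for (int i = 0; i < NCLIENTS; i++)
        pthread_join(cl[i], NULL);
    atomic_store(&stop, 1);
    pthread_join(srv, NULL);
    printf("counter = %ld (expected %d)\n", shared_counter, NCLIENTS * 1000);
    return 0;
}
```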

    Sharing Non-Processor Resources in Multiprocessor Real-Time Systems

    Computing devices are increasingly being leveraged in cyber-physical systems, in which computing devices sense, control, and interact with the physical world. Many such real-world interactions carry strict timing constraints which, if unsatisfied, can lead to catastrophic consequences. Modern examples of such timing constraints are prevalent in automotive systems, such as airbag controllers, anti-lock brakes, and new autonomous features. In all of these examples, a failure to respond correctly to an event in a timely fashion could lead to a crash, damage, injury, or even loss of life. Systems with such strict timing constraints are called real-time systems, and they are broadly the subject of this dissertation. Much previous work on real-time systems and scheduling theory assumes that computing tasks are independent, i.e., the only resource they share is the platform upon which they are executed. In practice, however, tasks share many resources, ranging from more overt resources such as shared memory objects to less overt ones, including data buses and other hardware and I/O devices. Accesses to some such resources must be synchronized to ensure safety, i.e., logical correctness, while other resources may exhibit better run-time performance if accesses are explicitly synchronized. The goal of this dissertation was to develop new synchronization algorithms and associated analysis techniques that can be used to synchronize access to many classes of resources while improving overall resource utilization, specifically as measured by real-time schedulability. Towards that goal, the Real-Time Nested Locking Protocol (RNLP), the first multiprocessor real-time locking protocol that supports lock nesting, or fine-grained locking, is proposed and analyzed. The RNLP is then extended to support reader/writer locking as well as k-exclusion locking. All presented RNLP variants are proven optimal, and experimental results demonstrate their schedulability-related benefits. Additionally, three new synchronization algorithms are presented, specifically motivated by the need to manage shared hardware resources to improve real-time predictability. Two new classes of shared resources are also defined, and the first synchronization algorithms for them are proposed. To analyze these new algorithms, a novel analysis technique called idleness analysis is presented, which can be used to incorporate the effects of blocking into schedulability analysis. (Doctor of Philosophy dissertation)

    Efficient Synchronization for Real-Time Systems with Nested Resource Access

    Real-time systems consist of tasks, each of which must be guaranteed to meet its timing requirements. These tasks may request access to shared system components, called resources, and each such request may experience delays before resource access is granted. These delays fall into two categories: (i) those caused by the order in which tasks are granted resource access, and (ii) those caused by the time it takes to coordinate this ordering. If these delays are too large, a task may be unable to meet its timing requirements. Tasks can require access to multiple resources concurrently, acquiring these resources in a nested fashion. Such nested resource access can cause significant delays; these delays can far exceed those incurred when only a single resource is required at a time, as certain request orderings cause delays between tasks that do not share any resources. This dissertation presents locking protocols and a protocol-independent approach that mitigate resource-access delays. Nested resource access can increase delays for all requests, including non-nested ones. The first protocol eliminates these additional delays by separating requests by type and creating a fast-path mechanism for non-nested requests. The next two protocols both reduce delays by reordering requests: one reorders requests as they are issued, and the other uses an offline process to determine which requests may execute concurrently. These three protocols were compared to prior approaches in an evaluation across a range of task systems; all three resulted in more task systems being guaranteed to meet their timing requirements. Finally, a protocol-independent approach reduces delays by using a designated task to execute the locking protocol on behalf of other tasks; when applied to two protocol variants, this approach significantly reduced delays. (Doctor of Philosophy dissertation)
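
    One way to keep non-nested requests from paying for nested ones is to give them a fast path that touches only the requested resource's own lock, while nested requests go through an extra coordination step. The sketch below illustrates that separation in its simplest form; the coordination lock, the fixed acquisition order, and all identifiers are assumptions for illustration, not the dissertation's actual protocol or its blocking analysis.

```c
/* Simplified fast-path/slow-path split for nested resource access. */
#include <pthread.h>
#include <stdio.h>

#define NRES 8

static pthread_mutex_t resource[NRES];
static pthread_mutex_t nested_path = PTHREAD_MUTEX_INITIALIZER;

/* Initialize the per-resource locks once at startup. */
static void init_locks(void)
{
    for (int i = 0; i < NRES; i++)
        pthread_mutex_init(&resource[i], NULL);
}

/* Fast path: a non-nested request pays for exactly one lock acquisition. */
static void lock_single(int r)   { pthread_mutex_lock(&resource[r]); }
static void unlock_single(int r) { pthread_mutex_unlock(&resource[r]); }

/* Slow path: a coordination lock serializes the acquisition phase of nested
 * requests, which take their resources in ascending index order so that
 * nested requests cannot deadlock with each other. */
static void lock_nested(const int *rs, int n)   /* rs[] sorted ascending */
{
    pthread_mutex_lock(&nested_path);
    for (int i = 0; i < n; i++)
        pthread_mutex_lock(&resource[rs[i]]);
    pthread_mutex_unlock(&nested_path);
}

static void unlock_nested(const int *rs, int n)
{
    for (int i = n - 1; i >= 0; i--)
        pthread_mutex_unlock(&resource[rs[i]]);
}

int main(void)
{
    init_locks();

    lock_single(3);                 /* non-nested request: fast path */
    unlock_single(3);

    int rs[] = { 1, 4, 6 };         /* nested request: slow path */
    lock_nested(rs, 3);
    unlock_nested(rs, 3);

    printf("fast and slow paths exercised\n");
    return 0;
}
```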

    Boosting performance of transactional memory through transactional read tracking and set associative locks

    Multi-core processors have become so prevalent in server, desktop, and even embedded systems that they are considered the norm for modern computing systems. The trend is toward many-core processors with many more than just 2, 4, or 8 cores per CPU. To benefit from the increasing number of cores per chip, application developers have to develop parallel programs [1]. Traditional lock-based programming is too difficult and error-prone for most programmers and remains the domain of experts; deadlocks, races, and other synchronization bugs are among its challenges. To make parallel programming mainstream, it must be adopted by the majority of programmers, not just experts, and thus simplifying parallel programming has become an important challenge. Transactional Memory (TM) is a promising programming model for managing concurrent accesses to shared memory locations. Transactional memory allows a programmer to mark a section of code as "transactional", and the underlying system guarantees atomic execution of that code. This simplifies parallel programming and reduces the possibility of synchronization bugs. This thesis develops several software- and hardware-based techniques to improve the performance of existing transactional memory systems. The first technique is Transactional Read Tracking (TRT), a software-based approach that employs a locking mechanism for transactional read and write operations. The performance of TRT depends on the memory access patterns of applications, and in some cases it falls behind the baseline scheme. To further improve TRT, we introduce two hybrid methods that dynamically switch between TRT and the baseline scheme based on application behavior. The second optimization technique is Set Associative Lock (SAL). Memory locations are mapped to a lock table in order to synchronize accesses to shared memory locations; direct-mapped lock tables often suffer collisions, which lead to false aborts. In SAL, we increase the associativity of the lock table to reduce false aborts. While SAL improves performance in most applications, in some cases it increases execution time due to the overhead of maintaining lock tables in software. To cope with this problem, we propose Hardware-SAL (HW-SAL), which moves the set-associative lock table into hardware. As such, the full power of set associativity is harnessed without sacrificing performance.
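
    The SAL idea above is easiest to see as a lookup structure: an address hashes to a set, and several ways per set let distinct addresses that collide on the set index avoid sharing a lock entry. The sketch below is a minimal, single-threaded illustration of that mapping, assuming invented sizes, field names, and a trivial hash; it omits the versioning and ownership logic a real STM lock table needs.

```c
/* Set-associative lock table sketch: with WAYS == 1 this degenerates to the
 * usual direct-mapped table, where unrelated addresses that collide share a
 * lock entry and can cause false aborts. */
#include <stdint.h>
#include <stdio.h>

#define SETS 1024
#define WAYS 4

struct lock_entry {
    uintptr_t tag;        /* address guarded by this entry (0 = unused) */
    int locked;           /* held flag; a real STM would store owner/version */
};

static struct lock_entry table[SETS][WAYS];

static unsigned set_index(uintptr_t addr)
{
    return (unsigned)((addr >> 3) & (SETS - 1));   /* ignore low word bits */
}

/* Return the entry guarding addr: reuse a matching way if present, otherwise
 * claim a free way; if the set is full, fall back to way 0 (a false conflict,
 * exactly what higher associativity makes rarer). */
static struct lock_entry *lookup(uintptr_t addr)
{
    struct lock_entry *set = table[set_index(addr)];
    for (int w = 0; w < WAYS; w++)
        if (set[w].tag == addr)
            return &set[w];
    for (int w = 0; w < WAYS; w++)
        if (set[w].tag == 0) {
            set[w].tag = addr;
            return &set[w];
        }
    return &set[0];
}

int main(void)
{
    int x, y;
    struct lock_entry *lx = lookup((uintptr_t)&x);
    struct lock_entry *ly = lookup((uintptr_t)&y);
    printf("x and y %s a lock entry\n", lx == ly ? "share" : "do not share");
    return 0;
}
```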

    Multicore XINU

    Multicore architectures employ multiple processing cores that work together for greater processing power. Shared-memory, symmetric multiprocessor (SMP) systems are ubiquitous, and all software, including operating systems, must be explicitly designed to support SMP processing. XINU is a simple, lightweight, elegant operating system that has existed for several decades and has been ported to many platforms; however, XINU has never been extended to support multicore processing. This project incrementally adds SMP support to the XINU operating system. Core kernel modules, including the scheduler and memory manager, have been successfully extended without overhauling or completely redesigning XINU, and a multicore methodology is laid out for the remaining kernel modules.
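
    The kind of change SMP support entails can be seen in how a kernel guards its ready list: disabling interrupts suffices on one core, but other cores also need a lock. The sketch below pairs a spinlock with interrupt masking for that purpose; the routine names and stubs are invented placeholders, not XINU's actual identifiers or this project's code.

```c
/* Illustrative SMP guard for a kernel ready list: interrupts off for the
 * local core, spinlock so no other core walks the list at the same time. */
#include <stdatomic.h>
#include <stdio.h>

static atomic_flag readylist_lock = ATOMIC_FLAG_INIT;

static void spin_lock(atomic_flag *l)
{
    while (atomic_flag_test_and_set_explicit(l, memory_order_acquire))
        ;                                  /* busy-wait until the flag clears */
}

static void spin_unlock(atomic_flag *l)
{
    atomic_flag_clear_explicit(l, memory_order_release);
}

/* Illustrative stubs standing in for the kernel's own routines. */
static int  disable_interrupts(void)     { return 0; }
static void restore_interrupts(int mask) { (void)mask; }
static void insert_ready(int pid)        { (void)pid; }

/* Make a process ready on an SMP kernel. */
static void ready_smp(int pid)
{
    int mask = disable_interrupts();
    spin_lock(&readylist_lock);
    insert_ready(pid);
    spin_unlock(&readylist_lock);
    restore_interrupts(mask);
}

int main(void)
{
    ready_smp(1);
    puts("ready list updated under spinlock");
    return 0;
}
```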

    Reducing consistency traffic and cache misses in the Avalanche multiprocessor

    For a parallel architecture to scale effectively, communication latency between processors must be minimized. We have found that a large number of avoidable cache misses stem from hardwired write-invalidate coherency protocols, which often exhibit high miss rates due to excessive invalidations and the subsequent reloading of shared data. In the Avalanche project at the University of Utah, we are building a 64-node multiprocessor designed to reduce the end-to-end communication latency of both shared memory and message passing programs. As part of our design efforts, we are evaluating the potential performance benefits and implementation complexity of providing hardware support for multiple coherency protocols. Using a detailed architecture simulation of Avalanche, we have found that support for multiple consistency protocols can reduce the time parallel applications spend stalled on memory operations by up to 66% and overall execution time by up to 31%. Most of this reduction in memory stall time is due to a novel release-consistent multiple-writer write-update protocol implemented using a write state buffer.
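
    The write-update idea can be pictured in software terms: between an acquire and a release, a core records which words of a shared line it wrote, and at the release it pushes just those words to the other cached copies instead of invalidating them. The sketch below is a purely illustrative single-threaded model of that bookkeeping; the structure names, sizes, and per-word granularity are assumptions, and the actual Avalanche mechanism is implemented in hardware.

```c
/* Toy model of a release-consistent, multiple-writer write-update protocol
 * driven by a write state buffer that records which words each CPU wrote. */
#include <string.h>
#include <stdio.h>

#define LINE_WORDS 8
#define NCPUS      4

struct cache_line {
    int words[LINE_WORDS];
};

struct write_state_buffer {
    int dirty[LINE_WORDS];       /* words this CPU wrote since its acquire */
};

static struct cache_line copy[NCPUS];           /* each CPU's cached copy */
static struct write_state_buffer wsb[NCPUS];

/* Local write: update own copy and remember the word in the buffer. */
static void store(int cpu, int word, int value)
{
    copy[cpu].words[word] = value;
    wsb[cpu].dirty[word] = 1;
}

/* Release: propagate only the dirty words to every other copy (update rather
 * than invalidate), then clear the buffer. */
static void release(int cpu)
{
    for (int w = 0; w < LINE_WORDS; w++) {
        if (!wsb[cpu].dirty[w])
            continue;
        for (int c = 0; c < NCPUS; c++)
            if (c != cpu)
                copy[c].words[w] = copy[cpu].words[w];
    }
    memset(&wsb[cpu], 0, sizeof wsb[cpu]);
}

int main(void)
{
    store(0, 2, 42);     /* CPU 0 writes word 2 */
    store(1, 5, 7);      /* CPU 1 writes word 5: multiple writers, no conflict */
    release(0);
    release(1);
    printf("CPU 3 sees word 2 = %d, word 5 = %d\n",
           copy[3].words[2], copy[3].words[5]);
    return 0;
}
```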