
    Managing contention of software transactional memory in real-time systems

    The foreseen evolution of chip architectures towards a higher number of heterogeneous cores, with non-uniform memory and non-coherent caches, brings renewed attention to the use of Software Transactional Memory (STM) as an alternative to lock-based synchronisation. However, STM relies on the possibility of aborting conflicting transactions to maintain data consistency, which impacts the responsiveness and timing guarantees required by real-time systems. In these systems, contention delays must be (efficiently) limited so that the response times of tasks executing transactions are upper-bounded and task sets can be feasibly scheduled. In this paper we defend the role of the transaction contention manager in reducing the number of transaction retries and in helping the real-time scheduler to ensure schedulability. For this purpose, the contention management policy should be aware of on-line scheduling information.
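
As a rough illustration of what a scheduling-aware contention policy could look like, the sketch below resolves a conflict in favour of the transaction whose task has the earlier absolute deadline. This is only one possible policy under stated assumptions, not the one proposed in the paper: the `Transaction` record, its fields, and `resolve_conflict` are hypothetical names, and the deadline is assumed to be supplied by the scheduler.

```python
from dataclasses import dataclass


@dataclass
class Transaction:
    # All fields are illustrative assumptions, not taken from the paper.
    task_id: int
    deadline: float      # absolute deadline of the owning task, assumed known
    retries: int = 0     # how many times this transaction has been aborted


def resolve_conflict(a: Transaction, b: Transaction) -> Transaction:
    """Return the transaction that must abort and retry.

    The earlier deadline wins. Ties go to the transaction that has
    already retried more often, then to the lower task id.
    """
    winner = min(a, b, key=lambda t: (t.deadline, -t.retries, t.task_id))
    loser = b if winner is a else a
    loser.retries += 1
    return loser
```

Counting retries in the tie-break is a simple way to keep one transaction from being aborted indefinitely when deadlines are equal.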

    A Survey of Research into Mixed Criticality Systems

    This survey covers research into mixed criticality systems that has been published since Vestal’s seminal paper in 2007, up until the end of 2016. The survey is organised along the lines of the major research areas within this topic. These include single processor analysis (including fixed priority and EDF scheduling, shared resources and static and synchronous scheduling), multiprocessor analysis, realistic models, and systems issues. The survey also explores the relationship between research into mixed criticality systems and other topics such as hard and soft time constraints, fault tolerant scheduling, hierarchical scheduling, cyber physical systems, probabilistic real-time systems, and industrial safety standards.

    Mechanisms for Unbounded, Conflict-Robust Hardware Transactional Memory

    Conventional lock implementations serialize access to critical sections guarded by the same lock, presenting programmers with a difficult tradeoff between granularity of synchronization and amount of parallelism realized. Recently, researchers have been investigating an emerging synchronization mechanism called transactional memory as an alternative to such conventional lock-based synchronization. Memory transactions have the semantics of executing in isolation from one another while in reality executing speculatively in parallel, aborting when necessary to maintain the appearance of isolation. This combination of coarse-grained isolation and optimistic parallelism has the potential to ease the tradeoff presented by lock-based programming. This dissertation studies the hardware implementation of transactional memory, making three main contributions. First, we propose the permissions-only cache, a mechanism that efficiently increases the size of transactions that can be handled in the local cache hierarchy to optimize performance. Second, we propose OneTM, an unbounded hardware transactional memory system that serializes transactions that escape the local cache hierarchy. Finally, we propose RetCon, a novel mechanism for detecting conflicts that reduces conflicts by allowing transactions to commit with different values than those with which they executed as long as dataflow and control-flow constraints are maintained

    Energy-Efficient Hardware-Accelerated Synchronization for Shared-L1-Memory Multiprocessor Clusters

    The steeply growing performance demands for highly power- and energy-constrained processing systems such as end-nodes of the Internet-of-Things (IoT) have led to parallel near-threshold computing (NTC), joining the energy-efficiency benefits of low-voltage operation with the performance typical of parallel systems. Shared-L1-memory multiprocessor clusters are a promising architecture, delivering performance on the order of GOPS and over 100 GOPS/W of energy efficiency. However, this level of computational efficiency can only be reached by maximizing the effective utilization of the processing elements (PEs) available in the clusters. Along with this effort, the optimization of PE-to-PE synchronization and communication is a critical factor for performance. In this article, we describe a light-weight hardware-accelerated synchronization and communication unit (SCU) for tightly-coupled clusters of processors. We detail the architecture, which enables fine-grain per-PE power management, and its integration into an eight-core cluster of RISC-V processors. To validate the effectiveness of the proposed solution, we implemented the eight-core cluster in advanced 22 nm FDX technology and evaluated performance and energy efficiency with tunable microbenchmarks and a set of real-life applications and kernels. The proposed solution allows synchronization-free regions as small as 42 cycles, over 41 times smaller than the baseline implementation based on fast test-and-set access to L1 memory, when constraining the microbenchmarks to 10 percent synchronization overhead. When evaluated on the real-life DSP applications, the proposed SCU improves performance by up to 92 percent (23 percent on average) and energy efficiency by up to 98 percent (39 percent on average).

    Improving Responsiveness of Time-Sensitive Applications by Exploiting Dynamic Task Dependencies

    In this paper, a mechanism is presented for reducing priority inversion in multi-programmed computing systems. In contrast to well-known approaches from the literature, this paper tackles cases where the dependency relationships among tasks cannot be known in advance by the operating system (OS). The presented mechanism allows tasks to explicitly declare these relationships, enabling the OS scheduler to take advantage of such information and trigger priority inheritance, resulting in reduced priority inversion. We present a prototype implementation of the concept within the Linux kernel, in the form of modifications to the standard POSIX condition variables code, along with an extensive evaluation including a quantitative assessment of the benefits for applications making use of the technique, as well as comprehensive overhead measurements. We also present an associated technique for theoretical schedulability analysis of a system using the new mechanism, which is useful to determine whether all tasks can meet their deadlines, in the specific scenario of tasks interacting only through remote procedure calls and under partitioned scheduling.

    Speculative Barriers with Transactional Memory

    Transactional Memory (TM) is a synchronization model for parallel programming which provides optimistic concurrency control. Transactions can run in parallel and are only serialized in case of conflict. In this work we use hardware TM (HTM) to implement an optimistic speculative barrier (SB) to replace the lock-based solution. SBs leverage HTM support to elide barriers speculatively. When a thread reaches an SB, a new SB transaction is started, keeping the updates private to the thread, and letting the HTM system detect potential conflicts. Once the last thread reaches the corresponding SB, the speculative threads can commit their changes. The main contributions of this work are: an API for SBs implemented with HTM extensions; a procedure to check the speculation state in between barriers to enable SBs with non-transactional code; an SB-aware HTM conflict resolution enhancement where SB transactions stall on a conflict with a standard transaction; and a set of SB use guidelines derived from our experience of using SBs in a variety of applications. We evaluated our proposals on two different architectures, with a full-system simulator and an IBM Power8 server. Results show an overall performance improvement of SBs over traditional barriers.

    Multiprocessor-safe Wait-free Queue in RTSJ

    Currently, most computer systems run on multiprocessors (or multicores). Moreover, the number of cores inside the processor is expected to increase. To be able to utilise the increased computational power in these systems, developers are forced to expose more parallelism within their applications. Multi-threading is one of the common techniques used to introduce parallelism within computer applications. Shared data structures are at the core of multi-threaded applications; these data structures facilitate communication between the different threads to help in completing the designed tasks within the application. A control mechanism should be provided such that access by any thread will not compromise the consistency and correctness of the data structure contents. An increased number of threads results in increased competition, which makes the interleaving scenarios at runtime difficult to understand; hence, timing analysis becomes a very complex task. The Real-Time Specification for Java (RTSJ) introduces different shared queues that can facilitate communication between different threads within the application. However, these queues are uni-directional, enabling communication between standard Java threads and the real-time thread classes that the RTSJ introduces. The work presented in this thesis introduces a novel algorithm for concurrently accessing shared data structures in shared-memory multiprocessor (or multicore) systems. The proposed algorithm is implemented as an array-based First-In-First-Out (FIFO) queue, which improves scalability and time predictability in multi-threaded applications. The algorithm utilises the different features that the RTSJ introduces to ensure time predictability.
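
To make the idea of an array-based FIFO with bounded-time operations concrete, here is a minimal sketch of a single-producer, single-consumer ring buffer. It is not the thesis's algorithm and does not use any RTSJ feature; `BoundedFifo`, `offer`, and `poll` are hypothetical names chosen for illustration. Each operation runs a fixed number of steps with no locks and no retry loops, and each index is written by only one side.

```python
class BoundedFifo:
    """Array-based FIFO for one producer and one consumer (illustrative only)."""

    def __init__(self, capacity):
        # One slot is kept empty so that "full" and "empty" can be told apart.
        self._buf = [None] * (capacity + 1)
        self._head = 0  # next slot to read; written only by the consumer
        self._tail = 0  # next slot to write; written only by the producer

    def offer(self, item):
        """Append item; return False instead of blocking when the queue is full."""
        nxt = (self._tail + 1) % len(self._buf)
        if nxt == self._head:
            return False
        self._buf[self._tail] = item
        self._tail = nxt  # publish the slot only after it has been filled
        return True

    def poll(self):
        """Remove and return the oldest item, or None when the queue is empty."""
        if self._head == self._tail:
            return None
        item = self._buf[self._head]
        self._buf[self._head] = None
        self._head = (self._head + 1) % len(self._buf)
        return item
```

Because `poll` uses `None` to signal an empty queue, this sketch cannot carry `None` as a payload. A real multiprocessor implementation would also have to rely on the memory-ordering guarantees of its platform, which this sketch does not address.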

    Supporting Time-Based QoS Requirements in Software Transactional Memory

    Software Transactional Memory (STM) is an optimistic concurrency control mechanism that simplifies parallel programming. Still, there has been little interest in its applicability for reactive applications in which there is a required response time for certain operations. We propose supporting such applications by allowing programmers to associate time with atomic blocks in the form of deadlines and QoS requirements. Based on statistics of past executions, we adjust the execution mode of transactions by decreasing the level of optimism as the deadline approaches. In the presence of concurrent deadlines, we propose different conflict resolution policies. Execution mode switching mechanisms allow meeting multiple deadlines in a consistent manner, with potential QoS degradations being split fairly among several threads as contention increases, and avoiding starvation. Our implementation consists of extensions to an STM runtime that allow gathering statistics and switching execution modes. We also propose novel contention managers adapted to transactional workloads subject to deadlines. The experimental evaluation shows that our approaches significantly improve the likelihood of a transaction meeting its deadline and QoS requirement, even in cases where progress is hampered by conflicts and other concurrent transactions with deadlines.
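
The idea of decreasing optimism as a deadline approaches can be sketched as a mode selector driven by remaining slack and by the mean duration of past executions. Everything below is an assumption made for illustration: the three mode names, the `choose_mode` function, and the thresholds are hypothetical and are not taken from the paper or from any STM runtime.

```python
from enum import Enum
from statistics import mean


class Mode(Enum):
    # Hypothetical execution modes, ordered from most to least optimistic.
    OPTIMISTIC = 1    # speculate freely; aborts and retries are acceptable
    PESSIMISTIC = 2   # reduce speculation to lower the chance of an abort
    IRREVOCABLE = 3   # run in a mode that cannot be aborted


def choose_mode(now, deadline, past_durations, retry_budget=4.0, safe_budget=2.0):
    """Pick an execution mode from the slack left before the deadline.

    `past_durations` holds durations of earlier executions of the same
    atomic block. The budgets are illustrative multiples of the mean.
    """
    if not past_durations:
        return Mode.OPTIMISTIC  # no statistics yet, so stay optimistic
    slack = deadline - now
    expected = mean(past_durations)
    if slack >= retry_budget * expected:
        return Mode.OPTIMISTIC
    if slack >= safe_budget * expected:
        return Mode.PESSIMISTIC
    return Mode.IRREVOCABLE
```

For example, with a mean past duration of 10 time units and a deadline at 100, this selector stays optimistic at time 0, becomes pessimistic at time 70, and switches to the irrevocable mode at time 85.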