8 research outputs found

    WCET Optimizations and Architectural Support for Hard Real-Time Systems

    Get PDF
    As time predictability is critical to hard real-time systems, it is not only necessary to accurately estimate the worst-case execution time (WCET) of the real-time tasks but also desirable to improve either the WCET of the tasks or time predictability of the system, because the real-time tasks with lower WCETs are easy to schedule and more likely to meat their deadlines. As a real-time system is an integration of software and hardware, the optimization can be achieved through two ways: software optimization and time-predictable architectural support. In terms of software optimization, we fi rst propose a loop-based instruction prefetching approach to further improve the WCET comparing with simple prefetching techniques such as Next-N-Line prefetching which can enhance both the average-case performance and the worst-case performance. Our prefetching approach can exploit the program controlow information to intelligently prefetch instructions that are most likely needed. Second, as inter-thread interferences in shared caches can signi cantly a ect the WCET of real-time tasks running on multicore processors, we study three multicore-aware code positioning methods to reduce the inter-core L2 cache interferences between co-running real-time threads. One strategy focuses on decreasing the longest WCET among the co-running threads, and two other methods aim at achieving fairness in terms of the amount or percentage of WCET reduction among co-running threads. In the aspect of time-predictable architectural support, we introduce the concept of architectural time predictability (ATP) to separate timing uncertainty concerns caused by hardware from software, which greatly facilitates the advancement of time-predictable processor design. We also propose a metric called Architectural Time-predictability Factor (ATF) to measure architectural time predictability quantitatively. Furthermore, while cache memories can generally improve average-case performance, they are harmful to time predictability and thus are not desirable for hard real-time and safety-critical systems. In contrast, Scratch-Pad Memories (SPMs) are time predictable, but they may lead to inferior performance. Guided by ATF, we propose and evaluate a variety of hybrid on-chip memory architectures to combine both caches and SPMs intelligently to achieve good time predictability and high performance. Detailed implementation and experimental results discussion are presented in this dissertation

    Exploring Hybrid SPM-Cache Architectures to Improve Performance and Energy Efficiency for Real-time Computing

    Get PDF
    Real-time computing is not just fast computing but time-predictable computing. Many tasks in safety-critical embedded real-time systems have hard real-time characteristics. Failure to meet deadlines may result in the loss of life or in large damages. Known of Worst Case Execution Time (WCET) is important for reliability or correct functional behavior of the system. As multi-core processors are increasingly adopted in industry, it has become a great challenge to accurately bound the worst-case execution time (WCET) for real-time systems running on multi-core chips. This is particularly true because of the inter-thread interferences in accessing shared resources on multi-cores, such as shared L2 caches, which can significantly affect the performance but are very difficult to be estimate statically. We propose an approach to analyzing Worst Case Execution Time (WCET) for multi-core processors with shared L2 instruction caches by using a model checking based method. Our experiments indicate that compared to the static analysis technique based on extended ILP (Integer Linear Programming), our approach improves the tightness of WCET estimation more than 31.1% for the benchmarks we studied. However, due to the inherent complexity of multi-core timing analysis and the state explosion problem, the model checking based approach currently can only work with small real-time kernels for dual-core processors. At the same time, improving the average-case performance and energy efficiency has also been important for real-time systems. Recently, Hybrid SPM-Cache (HSC) architectures by combining caches and Scratch-Pad Memories (SPMs) have been increasingly used in commercial processors and research prototypes. Our research explores HSC architectures for real-time systems to reconcile time predictability, performance, and energy consumption. We study the energy dissipation of a number of hybrid on-chip memory architectures by combining both caches and Scratch-Pad Memories (SPM) without increasing the total on-chip memory size. Our experimental results indicate that with the equivalent total on-chip memory size, several hybrid SPM-Cache architectures are more energy-efficient than either pure software controlled SPMs or pure hardware-controlled caches. In particular, using the hybrid SPM-cache to store both instructions and data can achieve the best energy efficiency. However, the SPM allocation for the HSC architecture must be aware of the cache to harness the full potential of the HSC architecture. First, we propose and evaluate four SPM allocation strategies to reduce WCET for hybrid SPM-Caches with different complexities. These algorithms differ by whether or not they can cooperate with the cache or be aware of the WCET. Our evaluation shows that the cache aware and WCET-oriented SPM allocation can maximally reduce the WCET with minimum or even positive impact on the average-case execution time (ACET). Moreover, we explore four SPM allocation algorithms to maximize performance on the HSC architecture, including three heuristic-based algorithms, and an optimal algorithm based on model checking. Our experiments indicate that the Greedy Stack Distance based Allocation (GSDA) can run efficiently while achieving performance either the same as or close to the optimal results got by the Optimal Stack Distance based Allocation (OSDA). Last but not the least, we extend the two stack distance based allocation algorithms to GSDA-E and OSDA-E to minimize the energy consumption of the HSC architecture. Our experimental results show that the GSDA-E can also reduce the energy either the same as or close to the optimal results attained by the OSDA-E, while achieving performance close to the OSDA and GSDA

    Design methodologies for instruction-set extensible processors

    Get PDF
    Ph.DDOCTOR OF PHILOSOPH

    Memory Optimizations for Time-Predictable Embedded Software

    Get PDF
    Ph.DDOCTOR OF PHILOSOPH

    Cache Related Pre-emption Delays in Embedded Real-Time Systems

    Get PDF
    Real-time systems are subject to stringent deadlines which make their temporal behaviour just as important as their functional behaviour. In multi-tasking real-time systems, the execution time of each task must be determined, and then combined together with information about the scheduling policy to ensure that there are enough resources to schedule all of the tasks. This is usually achieved by performing timing analysis on the individual tasks, and then schedulability analysis on the system as a whole. In systems with cache, multiple tasks can share this common resource which can lead to cache-related pre-emption delays (CRPD) being introduced. CRPD is the additional cost incurred from resuming a pre-empted task that no longer has the instructions or data it was using in cache, because the pre-empting task(s) evicted them from cache. It is therefore important to be able to account for CRPD when performing schedulability analysis. This thesis focuses on the effects of CRPD on a single processor system, further expanding our understanding of CRPD and ability to analyse and optimise for it. We present new CRPD analysis for Earliest Deadline First (EDF) scheduling that significantly outperforms existing analysis, and then perform the first comparison between Fixed Priority (FP) and EDF accounting for CRPD. In this comparison, we explore the effects of CRPD across a wide range of system and taskset parameters. We introduce a new task layout optimisation technique that maximises system schedulability via reduced CRPD. Finally, we extend CRPD analysis to hierarchical systems, allowing the effects of cache when scheduling multiple independent applications on a single processor to be analysed

    TIME-PREDICTABLE EXECUTION OF EMBEDDED SOFTWARE ON MULTI-CORE PLATFORMS

    Get PDF
    Ph.DDOCTOR OF PHILOSOPH

    WCET code positioning

    No full text
    Some processors incur a pipeline delay whenever an instruction transfers control to a target that is not the next sequential instruction. Compiler writers attempt to reduce these delays by positioning the basic blocks within a function to minimize the number of unconditional jumps and taken conditional branches that occur. Such acode positioning algorithm is traditionally driven by profile data representing typical program executions where pairs of blocks are placed in contiguous order when the transitions between these blocks occur most frequently. Inthis paper we describe an approach to perform code positioning without profiling in an attempt to reduce WCET instead of ACET. Our compiler interacts with a timing analyzer to obtain WCET path information to guide the block positioning. The results show over a 9 % average reduction in WCET is achieved after code positioning is performed and our greedy WCET code positioning algorithm always achieves optimal results for our benchmark suite. 1

    Improving WCET by applying a WC code-positioning optimization

    No full text
    Applications in embedded systems often need to meet specified timing constraints. It is advantageous to not only calculate the Worst-Case Execution Time (WCET) of an application, but to also perform transformations which reduce the WCET since an application with a lower WCET will be less likely to violate its timing constraints. Some processors incur a pipeline delay whenever an instruction transfers control to a target that is not the next sequential instruction. Code positioning optimizations attempt to reduce these delays by positioning the basic blocks to minimize the number of unconditional jumps and taken conditional branches that occur. Traditional code positioning algorithms use profile data to find the frequently executed edges between basic blocks, then minimize the transfers of control along these edges to reduce the Average Case Execution Time (ACET). This paper introduces a WCET code positioning optimization, driven by the worst-case (WC) path information from a timing analyzer, to reduce the WCET instead of ACET. This WCET optimization changes the layout of the code in memory to reduce the branch penalties along the WC paths. Unlike the frequency of edges in traditional profile-driven code positioning, the WC path may change after code positioning decisions are made. Thus, WCET code positioning is inherently more challenging than ACET code positioning. The experimental results show that this optimization typically finds the optimal layout of the basic blocks with the minimal WCET. The results show over a 7 % reduction in WCET is achieved after code positioning is performed
    corecore