68 research outputs found

    Memory Optimizations for Time-Predictable Embedded Software

    Get PDF
    Ph.DDOCTOR OF PHILOSOPH

    Processor Energy Characterization for Compiler-Assisted Software Energy Reduction

    Get PDF

    Power-Efficient and Low-Latency Memory Access for CMP Systems with Heterogeneous Scratchpad On-Chip Memory

    Get PDF
    The gradually widening speed disparity of between CPU and memory has become an overwhelming bottleneck for the development of Chip Multiprocessor (CMP) systems. In addition, increasing penalties caused by frequent on-chip memory accesses have raised critical challenges in delivering high memory access performance with tight power and latency budgets. To overcome the daunting memory wall and energy wall issues, this thesis focuses on proposing a new heterogeneous scratchpad memory architecture which is configured from SRAM, MRAM, and Z-RAM. Based on this architecture, we propose two algorithms, a dynamic programming and a genetic algorithm, to perform data allocation to different memory units, therefore reducing memory access cost in terms of power consumption and latency. Extensive and intensive experiments are performed to show the merits of the heterogeneous scratchpad architecture over the traditional pure memory system and the effectiveness of the proposed algorithms

    Schedulability-driven scratchpad memory swapping for resource-constrained real-time embedded systems

    Get PDF
    In resource-constrained real-time embedded systems, scratchpad memory (SPM) is utilized in place of cache to increase performance and enforce consistent behavior of both hard and soft real-time tasks via software-controlled SPM management techniques (SPMMTs). Real-time systems depend on time critical (hard) tasks to complete execution before their deadline times. Many real-time systems also depend on the execution of soft tasks that do not have to complete by hard deadlines. This thesis evaluates a new SPMMT that increases both worst-case task slack time (TST) and soft task processing capabilities, by combining two existing SPMMTs. The schedulability-driven ACETRB / WCETRB swapping (SDAWS) SPMMT of this thesis uses task schedulability characteristics to control the selection of either the average-case execution time reduction based (ACETRB) SPMMT or the worst-case execution time reduction based (WCETRB) SPMMT. While the literature contains examples of combined management techniques, until now there have been none that combine both WCETRB and ACETRB SPMMTs. The advantage of combining them is to achieve WCET reduction comparable to what can be achieved with the WCETRB SPMMT, while achieving significantly reduced ACET relative to the WCETRB SPMMT. Using a stripped-down RTOS and an SPMMT simulator implemented for this work, evaluated resource-constrained scenarios show a reduction in task slack time from the SDAWS SPMMT relative to the WCETRB SPMMT between 20% and 45%. However, the evaluated scenarios also conservatively show that SDAWS can reduce ACET relative to the WCETRB SPMMT by up to 60%

    CROSS-LAYER CUSTOMIZATION PLATFORM FOR LOW-POWER AND REAL-TIME EMBEDDED APPLICATIONS

    Get PDF
    Modern embedded applications have become increasingly complex and diverse in their functionalities and requirements. Data processing, communication and multimedia signal processing, real-time control and various other functionalities can often need to be implemented on the same System-on-Chip(SOC) platform. The significant power constraints and real-time guarantee requirements of these applications have become significant obstacles for the traditional embedded system design methodologies. The general-purpose computing microarchitectures of these platforms are designed to achieve good performance on average, which is far from optimal for any particular application. The system must always assume worst-case scenarios, which results in significant power inefficiencies and resource under-utilization. This dissertation introduces a cross-layer application-customizable embedded platform, which dynamically exploits application information and fine-tunes system components at system software and hardware layers. This is achieved with the close cooperation and seamless integration of the compiler, the operating system, and the hardware architecture. The compiler is responsible for extracting application regularities through static and profile-based analysis. The relevant application knowledge is propagated and utilized at run-time across the system layers through the judiciously introduced reconfigurability at both OS and hardware layers. The introduced framework comprehensively covers the fundamental subsystems of memory management and multi-tasking execution control

    Resource-aware scheduling for 2D/3D multi-/many-core processor-memory systems

    Get PDF
    This dissertation addresses the complexities of 2D/3D multi-/many-core processor-memory systems, focusing on two key areas: enhancing timing predictability in real-time multi-core processors and optimizing performance within thermal constraints. The integration of an increasing number of transistors into compact chip designs, while boosting computational capacity, presents challenges in resource contention and thermal management. The first part of the thesis improves timing predictability. We enhance shared cache interference analysis for set-associative caches, advancing the calculation of Worst-Case Execution Time (WCET). This development enables accurate assessment of cache interference and the effectiveness of partitioned schedulers in real-world scenarios. We introduce TCPS, a novel task and cache-aware partitioned scheduler that optimizes cache partitioning based on task-specific WCET sensitivity, leading to improved schedulability and predictability. Our research explores various cache and scheduling configurations, providing insights into their performance trade-offs. The second part focuses on thermal management in 2D/3D many-core systems. Recognizing the limitations of Dynamic Voltage and Frequency Scaling (DVFS) in S-NUCA many-core processors, we propose synchronous thread migrations as a thermal management strategy. This approach culminates in the HotPotato scheduler, which balances performance and thermal safety. We also introduce 3D-TTP, a transient temperature-aware power budgeting strategy for 3D-stacked systems, reducing the need for Dynamic Thermal Management (DTM) activation. Finally, we present 3QUTM, a novel method for 3D-stacked systems that combines core DVFS and memory bank Low Power Modes with a learning algorithm, optimizing response times within thermal limits. This research contributes significantly to enhancing performance and thermal management in advanced processor-memory systems

    Multi-resource management in embedded real-time systems

    Get PDF
    This thesis addresses the problem of online multi-resource management in embedded real-time systems. It focuses on three research questions. The first question concentrates on how to design an efficient hierarchical scheduling framework for supporting independent development and analysis of component based systems, to provide temporal isolation between components. The second question investigates how to change the mapping of resources to tasks and components during run-time efficiently and predictably, and how to analyze the latency of such a system mode change in systems comprised of several scalable components. The third question deals with the scheduling and analysis of a set of parallel-tasks with real-time constraints which require simultaneous access to several different resources. For providing temporal isolation we chose a reservation-based approach. We first focused on processor reservations, where timed events play an important role. Common examples are task deadlines, periodic release of tasks, budget replenishment and budget depletion. Efficient timer management is therefore essential. We investigated the overheads in traditional timer management techniques and presented a mechanism called Relative Timed Event Queues (RELTEQ), which provides an expressive set of primitives at a low processor and memory overhead. We then leveraged RELTEQ to create an efficient, modular and extensible design for enhancing a real-time operating system with periodic tasks, polling, idling periodic and deferrable servers, and a two-level fixed-priority Hierarchical Scheduling Framework (HSF). The HSF design provides temporal isolation and supports independent development of components by separating the global and local scheduling, and allowing each server to define a dedicated scheduler. Furthermore, the design addresses the system overheads inherent to an HSF and prevents undesirable interference between components. It limits the interference of inactive servers on the system level by means of wakeup events and a combination of inactive server queues with a stopwatch queue. Our implementation is modular and requires only a few modifications of the underlying operating system. We then investigated scalable components operating in a memory-constrained system. We first showed how to reduce the memory requirements in a streaming multimedia application, based on a particular priority assignment of the different components along the processing chain. Then we investigated adapting the resource provisions to tasks during runtime, referred to as mode changes. We presented a novel mode change protocol called Swift Mode Changes, which relies on Fixed Priority with Deferred preemption Scheduling to reduce the mode change latency bound compared to existing protocols based on Fixed Priority Preemptive Scheduling. We then presented a new partitioned parallel-task scheduling algorithm called Parallel-SRP (PSRP), which generalizes MSRP for multiprocessors, and the corresponding schedulability analysis for the problem of multi-resource scheduling of parallel tasks with real-time constraints. We showed that the algorithm is deadlock-free, derived a maximum bound on blocking, and used this bound as a basis for a schedulability test. We then demonstrated how PSRP can exploit the inherent parallelism of a platform comprised of multiple heterogeneous resources. Finally, we presented Grasp, which is a visualization toolset aiming to provide insight into the behavior of complex real-time systems. Its flexible plugin infrastructure allows for easy extension with custom visualization and analysis techniques for automatic trace verification. Its capabilities include the visualization of hierarchical multiprocessor systems, including partitioned and global multiprocessor scheduling with migrating tasks and jobs, communication between jobs via shared memory and message passing, and hierarchical scheduling in combination with multiprocessor scheduling. For tracing distributed systems with asynchronous local clocks Grasp also supports the synchronization of traces from different processors during the visualization and analysis

    Memory-Aware Scheduling for Fixed Priority Hard Real-Time Computing Systems

    Get PDF
    As a major component of a computing system, memory has been a key performance and power consumption bottleneck in computer system design. While processor speeds have been kept rising dramatically, the overall computing performance improvement of the entire system is limited by how fast the memory can feed instructions/data to processing units (i.e. so-called memory wall problem). The increasing transistor density and surging access demands from a rapidly growing number of processing cores also significantly elevated the power consumption of the memory system. In addition, the interference of memory access from different applications and processing cores significantly degrade the computation predictability, which is essential to ensure timing specifications in real-time system design. The recent IC technologies (such as 3D-IC technology) and emerging data-intensive real-time applications (such as Virtual Reality/Augmented Reality, Artificial Intelligence, Internet of Things) further amplify these challenges. We believe that it is not simply desirable but necessary to adopt a joint CPU/Memory resource management framework to deal with these grave challenges. In this dissertation, we focus on studying how to schedule fixed-priority hard real-time tasks with memory impacts taken into considerations. We target on the fixed-priority real-time scheduling scheme since this is one of the most commonly used strategies for practical real-time applications. Specifically, we first develop an approach that takes into consideration not only the execution time variations with cache allocations but also the task period relationship, showing a significant improvement in the feasibility of the system. We further study the problem of how to guarantee timing constraints for hard real-time systems under CPU and memory thermal constraints. We first study the problem under an architecture model with a single core and its main memory individually packaged. We develop a thermal model that can capture the thermal interaction between the processor and memory, and incorporate the periodic resource sever model into our scheduling framework to guarantee both the timing and thermal constraints. We further extend our research to the multi-core architectures with processing cores and memory devices integrated into a single 3D platform. To our best knowledge, this is the first research that can guarantee hard deadline constraints for real-time tasks under temperature constraints for both processing cores and memory devices. Extensive simulation results demonstrate that our proposed scheduling can improve significantly the feasibility of hard real-time systems under thermal constraints
    corecore