
    Resource-aware scheduling for 2D/3D multi-/many-core processor-memory systems

    This dissertation addresses the complexities of 2D/3D multi-/many-core processor-memory systems, focusing on two key areas: enhancing timing predictability in real-time multi-core processors and optimizing performance within thermal constraints. The integration of an increasing number of transistors into compact chip designs, while boosting computational capacity, presents challenges in resource contention and thermal management. The first part of the thesis improves timing predictability. We enhance shared cache interference analysis for set-associative caches, advancing the calculation of Worst-Case Execution Time (WCET). This development enables accurate assessment of cache interference and of the effectiveness of partitioned schedulers in real-world scenarios. We introduce TCPS, a novel task- and cache-aware partitioned scheduler that optimizes cache partitioning based on task-specific WCET sensitivity, leading to improved schedulability and predictability. Our research explores various cache and scheduling configurations, providing insights into their performance trade-offs. The second part focuses on thermal management in 2D/3D many-core systems. Recognizing the limitations of Dynamic Voltage and Frequency Scaling (DVFS) in S-NUCA many-core processors, we propose synchronous thread migrations as a thermal management strategy. This approach culminates in the HotPotato scheduler, which balances performance and thermal safety. We also introduce 3D-TTP, a transient temperature-aware power budgeting strategy for 3D-stacked systems, reducing the need for Dynamic Thermal Management (DTM) activation. Finally, we present 3QUTM, a novel method for 3D-stacked systems that combines core DVFS and memory-bank low-power modes with a learning algorithm, optimizing response times within thermal limits. This research contributes significantly to enhancing performance and thermal management in advanced processor-memory systems.
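
    To make the cache-partitioning idea concrete, here is a minimal Python sketch of WCET-sensitivity-driven cache allocation followed by partitioned (first-fit, rate-monotonic) task placement. It is an illustration only, not the TCPS algorithm from the dissertation; the task set, the WCET-versus-ways curves, and the use of the classic rate-monotonic utilization bound are all assumptions.

```python
"""Illustrative sketch (not the TCPS algorithm from the dissertation):
cache ways are handed out greedily to the task whose WCET shrinks the most
per extra way, then tasks are packed onto cores first-fit using the classic
rate-monotonic utilization bound.  Task set and WCET-vs-ways curves are
invented."""

def allocate_ways(tasks, total_ways, min_ways=1):
    """tasks: dicts with 'name', 'period', 'wcet' mapping ways -> WCET."""
    alloc = {t['name']: min_ways for t in tasks}
    for _ in range(total_ways - min_ways * len(tasks)):
        # Hand the next way to the task whose WCET drops the most with it.
        candidates = [t for t in tasks if alloc[t['name']] + 1 in t['wcet']]
        if not candidates:
            break
        best = max(candidates,
                   key=lambda t: t['wcet'][alloc[t['name']]]
                                 - t['wcet'][alloc[t['name']] + 1])
        alloc[best['name']] += 1
    return alloc

def partition(tasks, alloc, n_cores):
    """First-fit decreasing by utilization; per-core RM bound n(2^(1/n) - 1)."""
    cores = [[] for _ in range(n_cores)]
    util = lambda t: t['wcet'][alloc[t['name']]] / t['period']
    for t in sorted(tasks, key=util, reverse=True):
        for c in cores:
            n = len(c) + 1
            if sum(util(x) for x in c) + util(t) <= n * (2 ** (1 / n) - 1):
                c.append(t)
                break
        else:
            return None  # no core can accept the task under this allocation
    return [[x['name'] for x in c] for c in cores]

if __name__ == "__main__":
    tasks = [  # hypothetical task set; WCET (ms) shrinks with more cache ways
        {'name': 'A', 'period': 10, 'wcet': {1: 6.0, 2: 4.0, 3: 3.0, 4: 2.5}},
        {'name': 'B', 'period': 20, 'wcet': {1: 9.0, 2: 8.0, 3: 7.5, 4: 7.2}},
        {'name': 'C', 'period': 40, 'wcet': {1: 12.0, 2: 10.0, 3: 9.0, 4: 8.5}},
    ]
    ways = allocate_ways(tasks, total_ways=8)
    print("ways:", ways, "mapping:", partition(tasks, ways, n_cores=2))
```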

    Memory-Aware Scheduling for Fixed Priority Hard Real-Time Computing Systems

    As a major component of a computing system, memory has been a key performance and power-consumption bottleneck in computer system design. While processor speeds have kept rising dramatically, the overall computing performance of the entire system is limited by how fast the memory can feed instructions and data to the processing units (the so-called memory wall problem). The increasing transistor density and surging access demands from a rapidly growing number of processing cores have also significantly elevated the power consumption of the memory system. In addition, interference between memory accesses from different applications and processing cores significantly degrades computation predictability, which is essential for ensuring timing specifications in real-time system design. Recent IC technologies (such as 3D-IC technology) and emerging data-intensive real-time applications (such as Virtual Reality/Augmented Reality, Artificial Intelligence, and the Internet of Things) further amplify these challenges. We believe that it is not simply desirable but necessary to adopt a joint CPU/memory resource management framework to deal with these challenges. In this dissertation, we focus on how to schedule fixed-priority hard real-time tasks with memory effects taken into consideration. We target the fixed-priority real-time scheduling scheme because it is one of the most commonly used strategies in practical real-time applications. Specifically, we first develop an approach that takes into consideration not only the variation of execution time with cache allocation but also the relationship between task periods, showing a significant improvement in system feasibility. We then study how to guarantee timing constraints for hard real-time systems under both CPU and memory thermal constraints. We first study the problem under an architecture model with a single core and its main memory packaged individually. We develop a thermal model that captures the thermal interaction between the processor and memory, and incorporate the periodic resource server model into our scheduling framework to guarantee both timing and thermal constraints. We further extend our research to multi-core architectures with processing cores and memory devices integrated into a single 3D platform. To the best of our knowledge, this is the first research that can guarantee hard deadline constraints for real-time tasks under temperature constraints for both processing cores and memory devices. Extensive simulation results demonstrate that our proposed scheduling significantly improves the feasibility of hard real-time systems under thermal constraints.
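
    The kind of feasibility test this abstract refers to can be pictured with the classical response-time analysis for fixed-priority preemptive tasks, here with each task's WCET looked up from its cache budget. This is a simplified sketch, not the dissertation's analysis (which also exploits period relationships and thermal constraints); all task parameters are invented.

```python
"""Classical response-time analysis for fixed-priority preemptive tasks, with
each task's WCET looked up from its cache budget.  A simplified illustration
of the kind of feasibility test the dissertation builds on; the actual
analysis also exploits period relationships and thermal constraints.  All
numbers are invented."""

import math

def response_time(task, higher_prio, wcet):
    """Iterate R = C_i + sum_j ceil(R / T_j) * C_j to a fixed point."""
    r = wcet(task)
    while True:
        r_next = wcet(task) + sum(math.ceil(r / hp['period']) * wcet(hp)
                                  for hp in higher_prio)
        if r_next == r:
            return r
        if r_next > task['deadline']:
            return None  # response time already exceeds the deadline
        r = r_next

def schedulable(tasks, cache_alloc):
    """tasks listed highest-priority first; WCET depends on the cache budget."""
    wcet = lambda t: t['wcet_by_ways'][cache_alloc[t['name']]]
    for i, t in enumerate(tasks):
        r = response_time(t, tasks[:i], wcet)
        if r is None or r > t['deadline']:
            return False
    return True

if __name__ == "__main__":
    tasks = [  # rate-monotonic priority order, hypothetical values (ms)
        {'name': 'ctrl', 'period': 10, 'deadline': 10,
         'wcet_by_ways': {2: 3, 4: 2}},
        {'name': 'vision', 'period': 25, 'deadline': 25,
         'wcet_by_ways': {2: 12, 4: 9}},
    ]
    print(schedulable(tasks, {'ctrl': 2, 'vision': 4}))   # -> True
```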

    A Survey of Prediction and Classification Techniques in Multicore Processor Systems

    In multicore processor systems, being able to accurately predict the future provides new optimization opportunities that otherwise could not be exploited. For example, an oracle able to predict a certain application's behavior on a smartphone could direct the power manager to switch to appropriate dynamic voltage and frequency scaling modes that guarantee minimum levels of desired performance while saving energy and thereby prolonging battery life. Using predictions enables systems to become proactive rather than continuing to operate in a reactive manner. This prediction-based proactive approach has become increasingly popular in the design and optimization of integrated circuits and of multicore processor systems. Prediction has evolved from simple forecasting to sophisticated machine-learning-based prediction and classification that learns from existing data, employs data mining, and predicts future behavior. This can be exploited by novel optimization techniques that span all layers of the computing stack. In this survey paper, we discuss the most popular prediction and classification techniques in the general context of computing systems, with an emphasis on multicore processors. The paper is far from comprehensive, but it will help readers interested in employing prediction in the optimization of multicore processor systems.
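
    As a concrete, if toy, instance of the proactive control loop described above: an exponentially weighted moving average forecasts the next interval's CPU utilization, and a simple governor maps the forecast to a DVFS level. The thresholds, frequency levels, and utilization trace below are invented for illustration; the survey covers far richer learned predictors.

```python
"""Toy example of the proactive, prediction-driven control loop discussed in
the survey: an exponentially weighted moving average (EWMA) forecasts the
next interval's CPU utilization, and a simple governor maps the forecast to
a DVFS level.  Real systems use richer learned models; the thresholds,
frequency levels, and utilization trace are invented."""

class EwmaPredictor:
    def __init__(self, alpha=0.5):
        self.alpha, self.estimate = alpha, None

    def update(self, observed):
        """Fold in the latest observation and return the next-interval forecast."""
        self.estimate = (observed if self.estimate is None else
                         self.alpha * observed + (1 - self.alpha) * self.estimate)
        return self.estimate

def pick_dvfs_level(predicted_util,
                    levels=((0.3, 1.2), (0.6, 2.0), (1.0, 3.0))):
    """levels: (utilization ceiling, frequency in GHz), lowest first."""
    for ceiling, freq in levels:
        if predicted_util <= ceiling:
            return freq
    return levels[-1][1]

if __name__ == "__main__":
    predictor = EwmaPredictor(alpha=0.6)
    for util in [0.20, 0.25, 0.70, 0.80, 0.75, 0.30]:
        forecast = predictor.update(util)
        print(f"observed={util:.2f}  forecast={forecast:.2f}  "
              f"next interval at {pick_dvfs_level(forecast)} GHz")
```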

    CoMeT: An Integrated Interval Thermal Simulation Toolchain for 2D, 2.5D, and 3D Processor-Memory Systems

    Processing cores and the accompanying main memory, working in tandem, enable modern processors. Dissipating the heat produced by computation and memory access remains a significant problem for processors, so processor thermal management continues to be an active research topic. Most thermal management research relies on simulation, given the challenges of measuring temperature in real processors. Since cores and memory are fabricated in separate packages in most existing processors, and memory has lower power density, thermal management research has primarily focused on the cores. Memory bandwidth limitations associated with 2D processors have led to high-density 2.5D and 3D packaging technologies. 2.5D packaging places cores and memory on the same package, while 3D packaging goes further by stacking layers of memory on top of the cores themselves. Such packaging significantly increases power density, making processors prone to overheating, so mitigating thermal issues in high-density processors (packaged with stacked memory) becomes an even more pressing problem. However, existing interval thermal simulation toolchains lack thermal models for memory and are therefore unsuitable for studying thermal management of high-density processors. To address this issue, we present CoMeT, the first integrated Core and Memory interval Thermal simulation toolchain. CoMeT comprehensively supports thermal simulation of high- and low-density processors in four different core-memory configurations: off-chip DDR memory, off-chip 3D memory, 2.5D, and 3D. CoMeT supports several novel features that facilitate overlying system research. Compared to an equivalent state-of-the-art core-only toolchain, CoMeT adds only a ~5% simulation-time overhead. The source code of CoMeT has been made open for public use under the MIT license (https://github.com/marg-tools/CoMe).
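
    To see why stacked memory changes the thermal picture, consider a minimal lumped-RC sketch with two thermal nodes: a core layer and a memory layer stacked above it. This is not CoMeT's model or API; the resistances, capacitances, and power numbers are invented, and a real interval simulator resolves many more nodes per layer.

```python
"""Minimal lumped-RC thermal sketch (not CoMeT's model or API) of why
3D-stacked memory heats up together with the cores beneath it: two thermal
nodes, a core layer and a memory layer stacked above it, exchange heat
through a vertical resistance and dissipate to ambient through the package.
All R/C values and power numbers are invented."""

AMBIENT = 45.0              # deg C
R_CORE_AMB = 0.8            # K/W, core layer to ambient (heat-sink path)
R_MEM_AMB = 4.0             # K/W, memory layer to ambient (weaker path)
R_VERT = 0.3                # K/W, vertical coupling core <-> stacked memory
C_CORE, C_MEM = 12.0, 8.0   # J/K, thermal capacitances

def step(t_core, t_mem, p_core, p_mem, dt):
    """Explicit-Euler update of both node temperatures over one interval."""
    d_core = (p_core
              - (t_core - AMBIENT) / R_CORE_AMB
              - (t_core - t_mem) / R_VERT) / C_CORE
    d_mem = (p_mem
             - (t_mem - AMBIENT) / R_MEM_AMB
             - (t_mem - t_core) / R_VERT) / C_MEM
    return t_core + dt * d_core, t_mem + dt * d_mem

if __name__ == "__main__":
    t_core = t_mem = AMBIENT
    for _ in range(100):                  # 100 intervals of 0.1 s
        p_core, p_mem = 25.0, 4.0         # W, from a per-interval power model
        t_core, t_mem = step(t_core, t_mem, p_core, p_mem, dt=0.1)
    print(f"core = {t_core:.1f} C, stacked memory = {t_mem:.1f} C")
```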

    Hardware/Software Co-design for Multicore Architectures

    Transferred from Doria

    DYNAMIC THERMAL MANAGEMENT FOR MICROPROCESSORS THROUGH TASK SCHEDULING

    With continuous IC (Integrated Circuit) technology scaling, more and more transistors are integrated into a tiny area of the processor. Microprocessors experience unprecedentedly high power and on-chip temperatures, which can easily violate thermal constraints. High temperature on the chip, if not controlled, can damage or even burn the chip. Emerging technologies can further exacerbate the thermal condition of modern processors. For example, 3D stacking is an IC technology that stacks several die layers together in order to shorten the communication paths between the dies and improve chip performance. This technology unfortunately increases the power density per unit volume, and the heat from each layer needs to dissipate vertically through the same heat sink. Another example is the chip multi-processor (CMP), which integrates two or more independent processors (called “cores”) onto a single integrated circuit die. As IC technology nodes continue to scale down to 45 nm and below, there is significant within-die process variation (PV) in current and near-future CMPs. Process variation makes the cores on a chip differ in their maximum operable frequency and in the amount of leakage power they consume. This can result in immense spatial variation of temperature across the cores on the same chip, meaning that some cores can run much hotter than others. One of the most commonly used methods to keep a CPU from overheating is hardware dynamic thermal management (HW DTM), owing to the high cost and inefficiency of current mechanical cooling techniques. Dynamic voltage/frequency scaling (DVFS) is a broad-spectrum dynamic thermal management technique that can be applied to all types of processors, so we adopt DVFS as the HW DTM method in this thesis to simplify the discussion. DVFS lowers CPU power consumption by reducing CPU frequency or voltage when the temperature overshoots, constraining the temperature at the price of performance loss, in terms of reduced CPU throughput or longer program execution times. This thesis mainly addresses this problem, with the goal of eliminating unnecessary hardware-level DVFS and improving chip performance. The methodology of the experiments in this thesis is based on accurate estimation of power and temperature on the processor. The CPU power usage of different benchmarks is estimated by reading the performance counters on a real P4 chip and measuring the activity of different CPU functional units. The jobs are then categorized into power-intensive (hot) ones and non-power-intensive (cool) ones. Many combinations of jobs with mixed power (thermal) characteristics are used to evaluate the effectiveness of the algorithms we propose. When the experiments are conducted on a single-core processor, a compact dynamic thermal model embedded in the Linux kernel is used to calculate the CPU temperature. When the experiments are conducted on a CMP with 3D-stacked dies, or on a CMP affected by significant process variation, a thermal simulation tool well recognized in academia is used. The contribution of this thesis is that it proposes new software-level task scheduling algorithms to avoid unnecessary hardware-level DVFS. New task scheduling algorithms are proposed not only for the single-core processor, but also for the CMP with 3D-stacked dies and for the CMP under process variation.
    Compared with state-of-the-art algorithms proposed by other researchers, the new algorithms we propose all show significant performance improvements. To improve the performance of single-core processors, which is harmed by thermal overshoots and HW DTM, we propose a heuristic algorithm named ThreshHot, which judiciously schedules hot jobs before cool jobs to lower future temperatures. Furthermore, it always keeps the temperature as close to the threshold as possible without overshooting. For CMPs with 3D-stacked dies, three heuristics are proposed and combined into one algorithm. First, the vertically stacked cores are treated as a core stack, and the power of jobs is balanced among the core stacks instead of the individual cores. Second, hot jobs are moved close to the heat sink to expedite heat dissipation. Third, when thermal emergencies happen, the most power-intensive job in a core stack is penalized in order to lower the temperature quickly. When CMPs are under significant process variation, each core on the CMP has a distinct maximum frequency and leakage power. Maximizing the overall CPU throughput across all the cores conflicts with satisfying the on-chip thermal constraints imposed on each core. A maximum bipartite matching algorithm is used to resolve this dilemma and exploit the maximum performance of the chip.
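
    The job-to-core assignment under process variation can be pictured as a weighted bipartite matching: each core contributes a thermally safe maximum frequency, each job a throughput estimate at that frequency, and the matching maximizes total throughput. The sketch below uses SciPy's assignment solver as a stand-in for the matching algorithm described in the thesis; the per-core frequencies, IPC values, and the forbidden pair are invented.

```python
"""Sketch of job-to-core assignment under process variation: each core has
its own thermally safe maximum frequency (leakage and Vmax differ per core),
each job gets a throughput estimate at that frequency, and a maximum-weight
bipartite matching picks the assignment with the best total throughput.
SciPy's assignment solver stands in for the matching algorithm used in the
thesis; frequencies, IPC values, and the forbidden pair are invented."""

import numpy as np
from scipy.optimize import linear_sum_assignment

# Per-core thermally safe frequency (GHz) after accounting for leakage / PV.
core_safe_freq = np.array([3.2, 2.6, 3.0, 2.2])

# Per-job throughput scaling factor (instructions-per-cycle style estimate).
job_ipc = np.array([1.8, 1.1, 0.9, 1.5])

# Throughput estimate for every (job, core) pair.
weight = np.outer(job_ipc, core_safe_freq)

# A pair can be ruled out (e.g. a hot job on a poorly cooled core) by giving
# it a huge negative weight so the matching never selects it.
forbidden = {(0, 3)}            # hypothetical restriction
for j, c in forbidden:
    weight[j, c] = -1e9

jobs, cores = linear_sum_assignment(weight, maximize=True)
for j, c in zip(jobs, cores):
    print(f"job {j} -> core {c} at {core_safe_freq[c]} GHz "
          f"(throughput {weight[j, c]:.2f})")
print("total throughput:", round(weight[jobs, cores].sum(), 2))
```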