5,427 research outputs found

    Response Time Analysis for Thermal-Aware Real-Time Systems Under Fixed-Priority Scheduling

    Get PDF
    International audienceThis paper investigates schedulability analysis for thermal-aware real-time systems. Thermal constraints are becoming more and more critical in new generation miniaturized embedded systems, e.g. Medicals implants. As part of this work, we adapt the PFPasap algorithm proposed in [1] for energy-harvesting systems to thermal-aware ones. We prove its optimality for non-concrete1 fixed-priority task sets and propose a response-time analysis based on worst-case response-time upper bounds. We evaluate the efficacy of the proposed bounds via extensive simulation over randomly-generated task systems

    A Control-Theoretic Design And Analysis Framework For Resilient Hard Real-Time Systems

    Get PDF
    We introduce a new design metric called system-resiliency which characterizes the maximum unpredictable external stresses that any hard-real-time performance mode can withstand. Our proposed systemresiliency framework addresses resiliency determination for real-time systems with physical and hardware limitations. Furthermore, our framework advises the system designer about the feasible trade-offs between external system resources for the system operating modes on a real-time system that operates in a multi-parametric resiliency environment. Modern multi-modal real-time systems degrade the system’s operational modes as a response to unpredictable external stimuli. During these mode transitions, real-time systems should demonstrate a reliable and graceful degradation of service. Many control-theoretic-based system design approaches exist. Although they permit real-time systems to operate under various physical constraints, none of them allows the system designer to predict the system-resiliency over multi-constrained operating environment. Our framework fills this gap; the proposed framework consists of two components: the design-phase and runtime control. With the design-phase analysis, the designer predicts the behavior of the real-time system for variable external conditions. Also, the runtime controller navigates the system to the best desired target using advanced control-theoretic techniques. Further, our framework addresses the system resiliency of both uniprocessor and multicore processor systems. As a proof of concept, we first introduce a design metric called thermal-resiliency, which characterizes the maximum external thermal stress that any hard-real-time performance mode can withstand. We verify the thermal-resiliency for the external thermal stresses on a uniprocessor system through a physical testbed. We show how to solve some of the issues and challenges of designing predictable real-time systems that guarantee hard deadlines even under transitions between modes in an unpredictable thermal environment where environmental temperature may dynamically change using our new metric. We extend the derivation of thermal-resiliency to multicore systems and determine the limitations of external thermal stress that any hard-real-time performance mode can withstand. Our control-theoretic framework allows the system designer to allocate asymmetric processing resources upon a multicore proiii cessor and still maintain thermal constraints. In addition, we develop real-time-scheduling sub-components that are necessary to fully implement our framework; toward this goal, we investigate the potential utility of parallelization for meeting real-time constraints and minimizing energy. Under malleable gang scheduling of implicit-deadline sporadic tasks upon multiprocessors, we show the non-necessity of dynamic voltage/frequency regarding optimality of our scheduling problem. We adapt the canonical schedule for DVFS multiprocessor platforms and propose a polynomial-time optimal processor/frequency-selection algorithm. Finally, we verify the correctness of our framework through multiple measurable physical and hardware constraints and complete our work on developing a generalized framework

    Thermal Implications of Energy-Saving Schedulers

    Get PDF

    Dynamic Thermal Management for Microprocessors

    Get PDF
    In deep submicron era, thermal hot spots and large temperature gradients significantly impact system reliability, performance, cost and leakage power. Dynamic thermal management techniques are designed to tackle the problems and control the chip temperature as well as power consumption. They refer to those techniques which enable the chip to autonomously modify the task execution and power dissipation characteristics so that lower-cost cooling solutions could be adopted while still guaranteeing safe temperature regulation. As long as the temperature is regulated, the system reliability can be improved, leakage power can be reduced and cooling system lifetime can be extended significantly. Multimedia applications are expected to form the largest portion of workload in general purpose PC and portable devices. The ever-increasing computation intensity of multimedia applications elevates the processor temperature and consequently impairs the reliability and performance of the system. In this thesis, we propose to perform dynamic thermal management using reinforcement learning algorithm for multimedia applications. The presented learning model does not need any prior knowledge of the workload information or the system thermal and power characteristics. It learns the temperature change and workload switching patterns by observing the temperature sensor and event counters on the processor, and finds the management policy that provides good performance-thermal tradeoff during the runtime. As the system complexity increases, it is more and more difficult to perform thermal management in a centralized manner because of state explosion and the overhead of monitoring the entire chip. In this thesis, we present a framework for distributed thermal management in many-core systems where balanced thermal profile can be achieved by proactive task migration among neighboring cores. The framework has a low cost agent residing in each core that observes the local workload and temperature and communicates with its nearest neighbor for task migration and exchange. By choosing only those migration requests that will result in balanced workload without generating thermal emergency, the presented framework maintains workload balance across the system and avoids unnecessary migration. Experimental results show that, our distributed management policy achieves almost the same performance as a global management policy when the tasks are initially randomly distributed. Compared with existing proactive task migration technique, our approach generates less hotspot, less migration overhead with negligible performance overhead. Temperature affects the leakage power and cooling power. In this thesis, we address the impact of task allocation on a processor\u27s leakage power and cooling fan power. Although the leakage power is determined by the average die temperature and the fan power is determined by the peak temperature, our analysis shows that the overall power can be minimized if a task allocation with minimum peak temperature is adopted together with an intelligent fan speed adjustment technique that finds the optimal tradeoff between fan power and leakage power. We further present a multi-agent distributed task migration technique that searches for the best task allocation during runtime. By choosing only those migration requests that will result chip maximum temperature reduction, the presented framework achieves large fan power savings as well as overall power reduction

    A rolling horizon approach for optimal management of microgrids under stochastic uncertainty

    Get PDF
    This work presents a Mixed Integer Linear Programming (MILP) approach based on a combination of a rolling horizon and stochastic programming formulation. The objective of the proposed formulation is the optimal management of the supply and demand of energy and heat in microgrids under uncertainty, in order to minimise the operational cost. Delays in the starting time of energy demands are allowed within a predefined time windows to tackle flexible demand profiles. This approach uses a scenario-based stochastic programming formulation. These scenarios consider uncertainty in the wind speed forecast, the processing time of the energy tasks and the overall heat demand, to take into account all possible scenarios related to the generation and demand of energy and heat. Nevertheless, embracing all external scenarios associated with wind speed prediction makes their consideration computationally intractable. Thus, updating input information (e.g., wind speed forecast) is required to guarantee good quality and practical solutions. Hence, the two-stage stochastic MILP formulation is introduced into a rolling horizon approach that periodically updates input information

    Holistic Virtual Machine Scheduling in Cloud Datacenters towards Minimizing Total Energy

    Get PDF
    Energy consumed by Cloud datacenters has dramatically increased, driven by rapid uptake of applications and services globally provisioned through virtualization. By applying energy-aware virtual machine scheduling, Cloud providers are able to achieve enhanced energy efficiency and reduced operation cost. Energy consumption of datacenters consists of computing energy and cooling energy. However, due to the complexity of energy and thermal modeling of realistic Cloud datacenter operation, traditional approaches are unable to provide a comprehensive in-depth solution for virtual machine scheduling which encompasses both computing and cooling energy. This paper addresses this challenge by presenting an elaborate thermal model that analyzes the temperature distribution of airflow and server CPU. We propose GRANITE – a holistic virtual machine scheduling algorithm capable of minimizing total datacenter energy consumption. The algorithm is evaluated against other existing workload scheduling algorithms MaxUtil, TASA, IQR and Random using real Cloud workload characteristics extracted from Google datacenter tracelog. Results demonstrate that GRANITE consumes 4.3% - 43.6% less total energy in comparison to the state-of-the-art, and reduces the probability of critical temperature violation by 99.2% with 0.17% SLA violation rate as the performance penalty

    Improving efficiency and resilience in large-scale computing systems through analytics and data-driven management

    Full text link
    Applications running in large-scale computing systems such as high performance computing (HPC) or cloud data centers are essential to many aspects of modern society, from weather forecasting to financial services. As the number and size of data centers increase with the growing computing demand, scalable and efficient management becomes crucial. However, data center management is a challenging task due to the complex interactions between applications, middleware, and hardware layers such as processors, network, and cooling units. This thesis claims that to improve robustness and efficiency of large-scale computing systems, significantly higher levels of automated support than what is available in today's systems are needed, and this automation should leverage the data continuously collected from various system layers. Towards this claim, we propose novel methodologies to automatically diagnose the root causes of performance and configuration problems and to improve efficiency through data-driven system management. We first propose a framework to diagnose software and hardware anomalies that cause undesired performance variations in large-scale computing systems. We show that by training machine learning models on resource usage and performance data collected from servers, our approach successfully diagnoses 98% of the injected anomalies at runtime in real-world HPC clusters with negligible computational overhead. We then introduce an analytics framework to address another major source of performance anomalies in cloud data centers: software misconfigurations. Our framework discovers and extracts configuration information from cloud instances such as containers or virtual machines. This is the first framework to provide comprehensive visibility into software configurations in multi-tenant cloud platforms, enabling systematic analysis for validating the correctness of software configurations. This thesis also contributes to the design of robust and efficient system management methods that leverage continuously monitored resource usage data. To improve performance under power constraints, we propose a workload- and cooling-aware power budgeting algorithm that distributes the available power among servers and cooling units in a data center, achieving up to 21% improvement in throughput per Watt compared to the state-of-the-art. Additionally, we design a network- and communication-aware HPC workload placement policy that reduces communication overhead by up to 30% in terms of hop-bytes compared to existing policies.2019-07-02T00:00:00
    • …
    corecore