22 research outputs found

    Developing power‐aware scheduling mechanisms for computing systems virtualized by Xen

    Cloud computing has emerged as one of the most important technologies for interconnecting people and building the so‐called Internet of People (IoP). In such a cloud‐based IoP, virtualization provides the key supporting environment for running IoP jobs such as performing data analysis and mining personal information. Energy consumption in such a system is now a critical metric for measuring the sustainability and eco‐friendliness of the system. This paper develops three power‐aware scheduling strategies for virtualized systems managed by Xen, a popular virtualization technique. The three strategies are the Least performance Loss Scheduling strategy, the No performance Loss Scheduling strategy, and the Best Frequency Match scheduling strategy. They are developed by identifying the limitation of Xen in scaling the CPU frequency, and they aim to reduce energy waste without sacrificing the performance of jobs running in computing systems virtualized by Xen. Least performance Loss Scheduling works by re‐arranging the execution order of the virtual machines (VMs). No performance Loss Scheduling works by setting a proper initial CPU frequency for running the VMs. Best Frequency Match reduces energy waste and performance loss by allowing VMs to jump the queue so that the VM that is put into execution best matches the current CPU frequency. Scheduling for both single‐core and multicore processors is considered. Evaluation experiments show that, compared with the original scheduling strategy in Xen, the developed power‐aware scheduling algorithms reduce energy consumption without reducing the performance of the jobs running in Xen.
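
The queue-jumping idea behind Best Frequency Match can be illustrated with a minimal sketch. This is not the paper's implementation and does not use any Xen API: the `VM` class, the `preferred_mhz` field and the tie-breaking rule are assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class VM:
    name: str
    # Hypothetical attribute: the CPU frequency at which this VM is assumed
    # to run without wasting energy or losing performance.
    preferred_mhz: int


def pick_best_match(run_queue, current_mhz):
    """Remove and return the VM whose preferred frequency is closest to
    current_mhz. Ties go to the VM nearest the head of the queue."""
    if not run_queue:
        return None
    best_index = min(
        range(len(run_queue)),
        key=lambda i: (abs(run_queue[i].preferred_mhz - current_mhz), i),
    )
    # The selected VM "jumps the queue": it is taken out of order.
    return run_queue.pop(best_index)
```

With a queue of VMs preferring 2400, 1600 and 1800 MHz and the CPU currently at 1700 MHz, the second VM is selected ahead of the first, and the remaining VMs keep their relative order.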

    Efficient and Scalable Computing for Resource-Constrained Cyber-Physical Systems: A Layered Approach

    With the evolution of computing and communication technology, cyber-physical systems such as self-driving cars, unmanned aerial vehicles, and mobile cognitive robots are achieving increasing levels of multifunctionality and miniaturization, enabling them to execute versatile tasks in a resource-constrained environment. Therefore, the computing systems that power these resource-constrained cyber-physical systems (RCCPSs) have to achieve high efficiency and scalability. First of all, given a fixed amount of onboard energy, these computing systems should not only be power-efficient but also exhibit sufficiently high performance to gracefully handle complex algorithms for learning-based perception and AI-driven decision-making. Meanwhile, scalability requires that the current computing system and its components can be extended both horizontally, with more resources, and vertically, with emerging advanced technology. To achieve efficient and scalable computing systems in RCCPSs, my research broadly investigates a set of techniques and solutions via a bottom-up layered approach. This layered approach leverages the characteristics of each system layer (e.g., the circuit, architecture, and operating system layers) and their interactions to discover and explore the optimal system tradeoffs among performance, efficiency, and scalability. At the circuit layer, we investigate the benefits of novel power delivery and management schemes enabled by integrated voltage regulators (IVRs). Then, between the circuit and microarchitecture/architecture layers, we present a voltage-stacked power delivery system that offers best-in-class power delivery efficiency for many-core systems. After this, using Graphics Processing Units (GPUs) as a case study, we develop a real-time resource scheduling framework at the architecture and operating system layers for heterogeneous computing platforms with guaranteed task deadlines. 
Finally, fast dynamic voltage and frequency scaling (DVFS) based power management across the circuit, architecture, and operating system layers is studied through a learning-based hierarchical power management strategy for multi-/many-core systems.

    Improving the Performance of User-level Runtime Systems for Concurrent Applications

    Concurrency is an essential part of many modern large-scale software systems. Applications must handle millions of simultaneous requests from millions of connected devices. Handling such a large number of concurrent requests requires runtime systems that efficiently manage concurrency and communication among tasks in an application across multiple cores. Existing low-level programming techniques provide scalable solutions with low overhead, but require non-linear control flow. Alternative approaches to concurrent programming, such as Erlang and Go, support linear control flow by mapping multiple user-level execution entities across multiple kernel threads (M:N threading). However, these systems provide comprehensive execution environments that make it difficult to assess the performance impact of user-level runtimes in isolation. This thesis presents a nimble M:N user-level threading runtime that closes this conceptual gap and provides a software infrastructure to precisely study the performance impact of user-level threading. Multiple design alternatives are presented and evaluated for scheduling, I/O multiplexing, and synchronization components of the runtime. The performance of the runtime is evaluated in comparison to event-driven software, system-level threading, and other user-level threading runtimes. An experimental evaluation is conducted using benchmark programs, as well as the popular Memcached application. The user-level runtime supports high levels of concurrency without sacrificing application performance. In addition, the user-level scheduling problem is studied in the context of an existing actor runtime that maps multiple actors to multiple kernel-level threads. In particular, two locality-aware work-stealing schedulers are proposed and evaluated. It is shown that locality-aware scheduling can significantly improve the performance of a class of applications with a high level of concurrency.
In general, the performance and resource utilization of large-scale concurrent applications depend on the level of concurrency that can be expressed by the programming model. This fundamental effect is studied by refining and customizing existing concurrency models.

    Design, testing and performance analisys of efficient lock-free solutions for multi-core Linux scheduler

    Multiprocessor systems are nowadays the de facto standard for both personal computers and server workstations. The benefits of multi-core technology have recently been exploited in embedded devices and cellular phones as well. Linux was not originally designed to be a Real-Time Operating System (RTOS) but, recently, a new scheduling class named SCHED_DEADLINE was added to it. SCHED_DEADLINE is an implementation of the well-known Earliest Deadline First algorithm. In this thesis we first present PRACTISE, a tool for developing, debugging, testing and analysing real-time scheduling data structures in user space. Unlike other similar tools, PRACTISE executes code in parallel, making it possible to test and analyse the performance of the code in a realistic multiprocessor scenario. We also show an implementation of a skiplist, realized with the help of this tool. This implementation is intended to be used for process migration among the CPUs in SCHED_DEADLINE. To effectively manage concurrent accesses to the data structure, we used a revised version of the flat combining framework.
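
The Earliest Deadline First rule mentioned above can be sketched in a few lines of user-space code. This sketch deliberately swaps in a single-threaded binary heap from the Python standard library; it is neither the thesis's skiplist nor its flat-combining scheme, and the `EDFQueue` class name and its methods are hypothetical.

```python
import heapq


class EDFQueue:
    """Single-threaded ready queue ordered by absolute deadline."""

    def __init__(self):
        self._heap = []
        self._seq = 0  # insertion counter, used only to break deadline ties

    def add(self, task_name, absolute_deadline):
        heapq.heappush(self._heap, (absolute_deadline, self._seq, task_name))
        self._seq += 1

    def pop_earliest(self):
        """Return (task_name, deadline) for the task with the earliest
        deadline, or None if no task is ready."""
        if not self._heap:
            return None
        deadline, _, task_name = heapq.heappop(self._heap)
        return task_name, deadline
```

A concurrent structure such as the one the abstract describes would additionally have to coordinate accesses from several CPUs, which this sketch does not attempt.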

    Performance-aware task scheduling in multi-core computers

    Multi-core systems are becoming more and more popular as they can satisfy the increasing computation capacity requirements of complex applications. The task scheduling strategy plays a key role in this vision and ensures that task processing both satisfies Quality-of-Service (QoS, which in this thesis refers to deadlines) and is energy-efficient. In this thesis, we develop task scheduling strategies for multi-core computing systems. We start by looking at two objectives of a multi-core computing system. The first objective is to ensure that all tasks satisfy their time constraints (i.e. deadlines), while the second is to minimize the overall energy consumption of the platform. We develop three power-aware scheduling strategies in virtualized systems managed by Xen. Compared with the original scheduling strategy in Xen, these scheduling algorithms are able to reduce energy consumption without reducing the performance of the jobs. Then, we find that accurately modelling the makespan of a task (before execution) is very important for making scheduling decisions. Our studies show that the discrepancy between the (commonly used) assumption of sequential execution and the reality of time-sharing execution may lead to inaccurate calculation of the task makespan. Thus, we investigate the impact of time-sharing execution on the task makespan, and propose a method to model and determine the makespan under time-sharing execution. Thereafter, we extend our work to a more complex scenario: scheduling DAG applications for time-sharing systems. Based on our time-sharing makespan model, we further develop scheduling strategies for DAG jobs in time-sharing execution, which achieve more effective task execution.
Finally, as resource interference also makes a big difference to the performance of co-running tasks in multi-core computers (which may further influence scheduling decisions), we investigate the factors that affect performance when tasks are co-running on a multicore computer and propose machine learning-based prediction frameworks to predict the performance of the co-running tasks. The experimental results show that the techniques proposed in this thesis are effective.

    Qos-aware fine-grained power management in networked computing systems

    Power is a major design concern of today's networked computing systems, from low-power battery-powered mobile and embedded systems to high-power enterprise servers. Embedded systems are required to be power efficient because most of them are powered by batteries with limited capacity. A similar concern about power expenditure arises in enterprise server environments due to cooling requirements, power delivery limits, electricity costs and environmental pollution. The power consumption in networked computing systems includes that on the circuit board and that for communication. In the context of networked real-time systems, the power dissipation on wireless communication is more significant than that on the circuit board. We focus on packet scheduling for wireless real-time systems with renewable energy resources. In such a scenario, it is required to transmit data with a higher level of importance periodically. We formulate this packet scheduling problem as an NP-hard reward maximization problem with time and energy constraints. An optimal solution with pseudo-polynomial time complexity is presented. In addition, we propose a sub-optimal solution with polynomial time complexity. Circuit board power consumption, especially that of the processor, is still the major source of system power consumption. We provide a general-purpose, practical and comprehensive power management middleware for networked computing systems to manage circuit board power consumption and thus to affect system-level power consumption. It has the functionalities of power and performance monitoring, power management (PM) policy selection and PM control, as well as energy efficiency analysis. This middleware includes an extensible PM policy library. We implemented a prototype of this middleware on Base Band Units (BBUs) with three PM policies enclosed. These policies have been validated on different platforms, such as enterprise servers, virtual environments and BBUs.
In enterprise environments, the power dissipation on the circuit board dominates. Regulation of on-board computing resources has a significant impact on power consumption. Dynamic Voltage and Frequency Scaling (DVFS) is an effective technique to conserve energy. We investigate system-level power management in order to avoid system failures due to power capacity overload or overheating. This management needs to control the power consumption in an accurate and responsive manner, which cannot be achieved by the existing black-box feedback control. Thus we present a model-predictive feedback controller to regulate processor frequency so that the power budget can be satisfied without significant loss of performance. In addition to providing a power guarantee alone, performance with respect to service-level agreements (SLAs) is required to be guaranteed as well. The proliferation of virtualization technology imposes new challenges on power management due to resource sharing. It is hard to optimize both power and performance on shared infrastructures due to system dynamics. We propose vPnP, a feedback control based coordination approach providing guarantees on application-level performance and underlying physical host power consumption in virtualized environments. This system can adapt gracefully to workload changes. The preliminary results show its flexibility to achieve different levels of tradeoff between power and performance as well as its robustness over a variety of workloads. It is desirable to improve the energy efficiency of systems, such as BBUs, hosting soft real-time applications. We propose a power management strategy for controlling delay and minimizing power consumption using DVFS. We use the Robbins-Monro (RM) stochastic approximation method to estimate the delay quantile. We couple a fuzzy controller with the RM algorithm to scale the CPU frequency so as to maintain performance within the specified QoS.
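
The Robbins-Monro quantile estimate mentioned above can be sketched as follows. This is the textbook recursion with a step size of c/n, not the thesis's controller; the function names, the starting value `q0`, the constant `c` and the uniform test data are assumptions for illustration. The fuzzy controller and the DVFS actuation are not modelled.

```python
import random


def rm_quantile(samples, p, q0=0.0, c=1.0):
    """Estimate the p-quantile of a stream with the Robbins-Monro recursion
    q <- q + (c / n) * (p - 1[x <= q]). The constant c usually needs tuning
    to the scale of the data; c = 1.0 is only a placeholder."""
    q = q0
    for n, x in enumerate(samples, start=1):
        indicator = 1.0 if x <= q else 0.0
        q += (c / n) * (p - indicator)
    return q


def demo_estimate(seed=0, n=20000, p=0.9):
    """Hypothetical demo: 'delays' drawn uniformly from [0, 1), so the true
    p-quantile is simply p."""
    rng = random.Random(seed)
    return rm_quantile((rng.random() for _ in range(n)), p)
```

Each observation nudges the estimate up when it lies above the current value and down otherwise, so no history of past delays has to be stored. That is the property that makes this kind of estimator attractive for online control.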

    Trustworthiness in Mobile Cyber Physical Systems

    Computing and communication capabilities are increasingly embedded in diverse objects and structures in the physical environment. They will link the ‘cyberworld’ of computing and communications with the physical world. These applications are called cyber physical systems (CPS). Obviously, the increased involvement of real-world entities leads to a greater demand for trustworthy systems. Hence, we use "system trustworthiness" here, which can guarantee continuous service in the presence of internal errors or external attacks. Mobile CPS (MCPS) is a prominent subcategory of CPS in which the physical component has no permanent location. Mobile Internet devices already provide ubiquitous platforms for building novel MCPS applications. The objective of this Special Issue is to contribute to research in modern/future trustworthy MCPS, including design, modeling, simulation, dependability, and so on. It is imperative to address the issues which are critical to their mobility, report significant advances in the underlying science, and discuss the challenges of development and implementation in various applications of MCPS

    Zuverlässige und Energieeffiziente gemischt-kritische Echtzeit On-Chip Systeme (Reliable and Energy-Efficient Mixed-Criticality Real-Time On-Chip Systems)

    Multi- and many-core embedded systems are increasingly becoming the target for many applications that require high performance under varying conditions. A resulting challenge is the control and reliable operation of such complex multiprocessing architectures under changing conditions, e.g., high temperature and degradation. In mixed-criticality systems, where many applications with varying criticalities are consolidated on the same execution platform, fundamental isolation requirements that guarantee non-interference of critical functions are crucially important. While Networks-on-Chip (NoCs) are the prevalent solution for providing scalable and efficient interconnects for multiprocessing architectures, their associated energy consumption has immensely increased. Specifically, hard real-time NoCs must exhibit limited energy consumption, as thermal runaway in such a core shared resource jeopardizes the guarantees of the whole system. Thus, dynamic energy management of NoCs, as opposed to the static solutions of related work, is highly necessary to save energy and decrease temperature while preserving essential temporal requirements. In this thesis, we introduce a centralized management scheme to provide energy-aware NoCs for hard real-time systems. The design relies on an energy control network, developed on top of an existing switch arbitration network to allow isolation between energy optimization and data transmission. The energy control layer includes local units called Power-Aware NoC controllers that dynamically optimize NoC energy depending on the global state and the applications' temporal requirements. Furthermore, to adapt to abnormal situations that might occur in the system due to degradation, we extend the concept of NoC energy control to include the entire system scope. That is, online resource management employing hierarchical control layers to treat system degradation (imminent core failures) is supported.
The mechanism applies system reconfiguration that involves workload migration. For mixed-criticality systems, it allows flexible boundaries between safety-critical and non-critical subsystems to safely apply the reconfiguration, preserving fundamental safety requirements and temporal predictability. Simulation and formal analysis-based experiments on various realistic use cases and benchmarks are conducted, showing significant improvements in NoC energy savings and in the treatment of system degradation for mixed-criticality systems, improving dependability over the status quo.
Embedded many- and multi-core systems are increasingly the target for applications that have demanding requirements under varying conditions. For such highly complex multiprocessor systems, ensuring reliable operation is a major challenge, especially when environmental influences change. In mixed-criticality systems, in which many applications with different criticalities must be served on the same execution platform, fundamental isolation requirements that guarantee non-interference of critical functions are of crucial importance. While Networks-on-Chip (NoCs) are frequently used as a scalable interconnect for multiprocessor architectures, their associated energy consumption has increased immensely. Dynamic platform management, as opposed to static management, is therefore essential for adapting a system to the changes mentioned above while guaranteeing timing. In this thesis, we develop energy-efficient NoCs for hard real-time systems. The design is based on an energy control network, built on top of an existing switch arbitration network, that allows isolation between energy optimization and data transmission. The energy control layer comprises local units, called Power-Aware NoC controllers, that optimize NoC energy depending on the global state and the temporal requirements of the applications. In addition, the concept of NoC energy control is extended to the entire system scope in order to adapt to anomalies that may arise from degradation. Online resource management that employs hierarchical control layers to handle degradation (imminent core failures) is provided. For mixed-criticality systems, it allows flexible boundaries between safety-critical and non-critical subsystems so that reconfiguration can be applied safely while fundamental safety requirements and timing predictability are preserved. Experiments based on simulation and formal analysis are conducted on various realistic use cases and benchmarks, showing significant improvements in NoC energy savings and in the handling of degradation for mixed-criticality systems, improving system stability over the status quo.

    Real-time Control and Optimization of Water Supply and Distribution infrastructure

    Across North America, water supply and distribution systems (WSDs) are controlled manually by operational staff - who place a heavy reliance on their experience and judgement when rendering operational decisions. These decisions include scheduling the operation of pumps, valves and chemical dosing in the system. However, due to the uncertainty of demand, stringent water quality regulatory constraints, external forcing (cold/drought climates, fires, bursts) from the environment, and the non-stationarity of climate change, operators tend to control their systems conservatively and reactively. WSDs that are operated in such a fashion are said to be 'reactive' because: (i) the operators manually react to changes in the system behaviour, as measured by Supervisory Control and Data Acquisition (SCADA) systems; and (ii) they are not always aware of anomalies in the system until these are reported by consumers and authorities. The net result is that the overall operations of WSDs are suboptimal with respect to energy consumption, water losses, infrastructure damage and water quality. In this research, an intelligent platform, namely the Real-time Dynamically Dimensioned Scheduler (RT-DDS), is developed and quantitatively assessed for the proactive control and optimization of WSD operations. The RT-DDS platform was configured to solve a dynamic control problem at every timestep (hour) of the day. The control problem involved the minimization of energy costs (over the 24-hour period) by recommending 'near-optimal' pump schedules, while satisfying hydraulic reliability constraints. These constraints were predefined by operational staff and regulatory limits and define a tolerance band for pressure and storage levels across the WSD system. The RT-DDS platform includes three essential modules. The first module produces high-resolution forecasts of water demand via ensemble machine learning techniques.
A water demand profile for the next 24 hours is predicted based on historical demand, ambient conditions (i.e. temperature, precipitation) and current calendar information. The predicted profile is then fed into the second module, which involves a simulation model of the WSD. The model is used to determine the hydraulic impacts of particular control settings. The results of the simulation model are used to guide the search strategy of the final module - a stochastic single solution optimization algorithm. The optimizer is parallelized for computational efficiency, such that the reporting frequency of the platform is within 15 minutes of execution time. The fidelity of the prediction engine of the RT-DDS platform was evaluated with an Advanced Metering Infrastructure (AMI) driven case study, whereby the short-term water consumption of the residential units in the city was predicted. A Multi-Layer Perceptron (MLP) model alongside ensemble-driven learning techniques (Random forests, Bagging trees and Boosted trees) were built, trained and validated as part of this research. A three-stage validation process was adopted to assess the replicative, predictive and structural validity of the models. Further, the models were assessed in their predictive capacity at two different spatial resolutions: at a single meter and at the city level. While the models proved to have strong generalization capability, via good performance in the cross-validation testing, the models displayed slight biases when aiming to predict extreme peak events in the single meter dataset. It was concluded that the models performed far better at a lower spatial resolution (at the city or district level) whereby peak events are far more normalized. In general, the models demonstrated the capacity of using machine learning techniques in the context of short-term water demand forecasting - particularly for real-time control and optimization.
In determining the optimal representation of pump schedules for real-time optimization, multiple control variable formulations were assessed. These included binary control statuses and time-controlled triggers, whereby the pump schedule was represented as a sequence of on/off binary variables and active/idle discrete time periods, respectively. While the time-controlled trigger representation systematically outperformed the binary representation in terms of computational efficiency, it was found that both formulations led to conditions whereby the system would violate the predefined maximum number of pump switches per calendar day. This occurred because at each timestep the control variable formulation was unaware of the pump switches that had already elapsed in the preceding hours. Violations of the maximum pump switch limit lead to transient instabilities and thus create hydraulically undesirable conditions. As such, a novel feedback architecture was proposed, such that at every timestep, the number of switches that had elapsed in the previous hours was explicitly encoded into the formulation. In this manner, the maximum number of switches per calendar day was never violated since the optimizer was aware of the current trajectory of the system. Using this novel formulation, daily energy cost savings of up to 25% were achievable on an average day, leading to cost savings of over 2.3 million dollars over a ten-year period. Moreover, stable hydraulic conditions were produced in the system, thereby changing very little when compared to baseline operations in terms of quality of service and overall condition of assets.
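
The effect of feeding the elapsed switch count back into the formulation can be illustrated with a small check on the binary (on/off per hour) representation. This is only a sketch of the idea, not the RT-DDS formulation: the function names and the way the limit is enforced (a feasibility test on the whole day) are assumptions.

```python
def count_switches(statuses):
    """Number of on/off transitions in a sequence of hourly pump statuses."""
    return sum(1 for prev, cur in zip(statuses, statuses[1:]) if prev != cur)


def is_feasible(elapsed_statuses, candidate_statuses, max_switches):
    """Check a candidate schedule for the remaining hours against the daily
    switch limit, taking into account the statuses already applied today
    (including the transition at the boundary between the two)."""
    full_day = list(elapsed_statuses) + list(candidate_statuses)
    return count_switches(full_day) <= max_switches
```

For example, if the statuses already applied today are `[0, 1, 0, 1]` (three switches) and the candidate for the remaining hours is `[1, 1, 0]`, the candidate contains only one switch when judged on its own, but the day as a whole would contain four. With a limit of three per day the candidate is rejected once the elapsed history is visible to the check.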

    Spectral analysis of executions of computer programs and its applications on performance analysis

    This work is motivated by the growing intricacy of high performance computing infrastructures. For example, the MareNostrum supercomputer (installed in 2005 at BSC) has 10240 processors, and there are currently machines with more than 100,000 processors. The complexity of these systems increases the complexity of the manual performance analysis of parallel applications. For this reason, it is mandatory to use automatic tools and methodologies. The performance analysis group of BSC and UPC has extensive experience in analyzing parallel applications. The approach of this group consists mainly of the analysis of tracefiles (obtained from executions of parallel applications) using performance analysis and visualization tools, such as Paraver. Taking into account the general characteristics of current systems, this method can sometimes be inefficient and very expensive in terms of time. To overcome these problems, this thesis makes several contributions. The first one is an automatic system able to detect the internal structure of executions of high performance computing applications. This automatic system is able to rule out nonsignificant regions of executions, to detect redundancies and, finally, to select small but significant execution regions. This automatic detection process is based on spectral analysis (wavelet transform, Fourier transform, etc.) and works by detecting the most important frequencies of the application's execution. These main frequencies are strongly related to the internal loops of the application's source code. Finally, it is important to state that the automatic detection of small but significant execution regions remarkably reduces the complexity of the performance analysis process. The second contribution is an automatic methodology able to show general but nontrivial performance trends. These can be very useful for the analyst when carrying out a performance analysis of the application. The automatic methodology is based on an analytical model.
This model consists of several performance factors. These factors modify the value of the linear speedup in order to fit the real speedup. That is, if the real speedup is far from the linear one, we can immediately detect which of the performance factors is undermining the scalability of the application. The second main characteristic of the analytical model is that it can be used to predict the performance of high performance computing applications. From several executions on a few processors, we extract the model's performance factors and extrapolate these values to executions on a higher number of processors. Finally, we obtain a speedup prediction using the analytical model. The third contribution is the automatic detection of the optimal sampling frequency of applications. We show that it is possible to extract this frequency using spectral analysis. In the case of sequential applications, we show that using this frequency improves existing results of recognized techniques focused on the reduction of a serial application's instruction execution stream (SimPoint, Smarts, etc.). In the case of parallel benchmarks, we show that the optimal frequency is very useful for extracting significant performance information very efficiently and accurately. In summary, this thesis proposes a set of techniques based on signal processing. The main focus of these techniques is to perform an automatic analysis of the applications, reporting an initial diagnosis of their performance and showing their internal iterative structure. Finally, these methods also provide a reduced tracefile from which it is easy to start manual fine-grained performance analysis. The contributions of the thesis are not limited to proposals and publications. The research carried out in recent years has provided a tool for analyzing the structure of applications.
Moreover, the methodology is general and can be adapted to many performance analysis methods, remarkably improving their efficiency, flexibility and generality.
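
The link between dominant frequencies and the iterative structure of an execution can be illustrated with a minimal sketch. It uses a naive discrete Fourier transform written with the Python standard library rather than the thesis's tooling, and the function name and the synthetic "metric sampled over time" signal are assumptions for illustration.

```python
import cmath
import math


def dominant_period(signal):
    """Return the period (in samples) of the strongest non-constant
    frequency component of the signal, or None if there is none."""
    n = len(signal)
    if n < 2:
        return None
    mean = sum(signal) / n
    centered = [s - mean for s in signal]  # drop the constant component
    best_k, best_power = 0, 0.0
    for k in range(1, n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(centered)
        )
        power = abs(coeff) ** 2
        if power > best_power:
            best_k, best_power = k, power
    return n / best_k if best_k else None
```

For a synthetic trace that repeats the pattern `[1, 1, 0, 0]`, the strongest component corresponds to a period of four samples, which is the length of one iteration of the hypothetical loop that produced it. The naive transform costs O(n²) operations, so a real trace would call for an FFT implementation instead.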