Search CORE

361 research outputs found

A Survey of Techniques For Improving Energy Efficiency in Embedded Computing Systems

Author: Mittal Sparsh
Publication venue
Publication date: 01/01/2014
Field of study

Recent technological advances have greatly improved the performance and features of embedded systems. With the number of just mobile devices now reaching nearly equal to the population of earth, embedded systems have truly become ubiquitous. These trends, however, have also made the task of managing their power consumption extremely challenging. In recent years, several techniques have been proposed to address this issue. In this paper, we survey the techniques for managing power consumption of embedded systems. We discuss the need of power management and provide a classification of the techniques on several important parameters to highlight their similarities and differences. This paper is intended to help the researchers and application-developers in gaining insights into the working of power management techniques and designing even more efficient high-performance embedded systems of tomorrow

arXiv.org e-Print Archive

Crossref

Scheduling in Mapreduce Clusters

Author: He Chen
Publication venue: DigitalCommons@University of Nebraska - Lincoln
Publication date: 01/01/2018
Field of study

MapReduce is a framework proposed by Google for processing huge amounts of data in a distributed environment. The simplicity of the programming model and the fault-tolerance feature of the framework make it very popular in Big Data processing. As MapReduce clusters get popular, their scheduling becomes increasingly important. On one hand, many MapReduce applications have high performance requirements, for example, on response time and/or throughput. On the other hand, with the increasing size of MapReduce clusters, the energy-efficient scheduling of MapReduce clusters becomes inevitable. These scheduling challenges, however, have not been systematically studied. The objective of this dissertation is to provide MapReduce applications with low cost and energy consumption through the development of scheduling theory and algorithms, energy models, and energy-aware resource management. In particular, we will investigate energy-efficient scheduling in hybrid CPU-GPU MapReduce clusters. This research work is expected to have a breakthrough in Big Data processing, particularly in providing green computing to Big Data applications such as social network analysis, medical care data mining, and financial fraud detection. The tools we propose to develop are expected to increase utilization and reduce energy consumption for MapReduce clusters. In this PhD dissertation, we propose to address the aforementioned challenges by investigating and developing 1) a match-making scheduling algorithm for improving the data locality of Map- Reduce applications, 2) a real-time scheduling algorithm for heterogeneous Map- Reduce clusters, and 3) an energy-efficient scheduler for hybrid CPU-GPU Map- Reduce cluster. Advisers: Ying Lu and David Swanso

DigitalCommons@University of Nebraska

Techniques to Improve Energy Efficiency on Heterogeneous Multiprocessors under Timing and Quality Constraints

Author: Azhar Muhammad Waqar
Publication venue
Publication date: 01/01/2022
Field of study

Traditionally, applications are executed without the notion of a computational deadline and often use all available system resources, which leads to higher\ua0energy consumption. User specification of Quality of Service (QoS) constraints,\ua0in terms of completion time and solution quality, opens up for allocation of\ua0just enough resources to an application to finish just in time and thereby save\ua0energy. Modern heterogeneous multiprocessor (HMP) platforms provide a\ua0set of configurable resources, including a frequency range of dynamic voltage\ua0frequency scaling (DVFS), one among a set processor types, and one or a\ua0plurality of processors of each type. They can be configured at run-time to\ua0open up new opportunities for resource management.This thesis presents techniques to reduce energy consumption under QoS\ua0constraints by allocating resources at run-time on heterogeneous multiprocessor platforms targeting sequential and parallel iterative and task-parallel\ua0applications. The proposed techniques rely on a progress-tracking framework\ua0that monitors and predicts how much time is left until the application finishes.\ua0Furthermore, the proposed framework enables the prediction of computation\ua0demand and performance requirements for future iterations or tasks.\ua0The first contribution of this thesis is a resource management technique,\ua0called SLOOP, targeting single-threaded applications. SLOOP allocates resources, i.e., processor type and DVFS, for each iteration to meet deadlines\ua0while using the prediction of computational demand and execution time.The second contribution of this thesis is a resource-management scheme, called SaC, for multi-threaded applications executing on HMPs, where resources\ua0also include the number of processors besides DVFS and processor type. SaC\ua0first chooses the most energy-efficient configuration that meets the deadline.\ua0The proposed technique collects execution-time slack over subsequent iterations\ua0to select a configuration that can save energy.The third contribution of this thesis is a resource manager, called Task-RM, for task-parallel applications executing on HMPs under QoS constraints. Task-RM exploits the variance in task execution times and imbalance between\ua0sibling tasks to allocate just enough resources in terms of DVFS and processor type. It uses an innovative off-line analysis to avoid redoing scheduling analysis\ua0at run-time.Finally, the fourth contribution is a scheme, called Approx-RM, that can exploit accuracy-energy trade-offs in approximate iterative applications. Approx-RM allocates an appropriate amount of resources while guaranteeing timing\ua0and solution quality specifications. Approx-RM first predicts the iteration count required to meet the quality target and then allocates enough resources\ua0on an HMP in terms of DVFS, processor type, and processor count to save\ua0energy while meeting a performance target

Chalmers Research

Heterogeneous Multi-core Architectures for High Performance Computing

Author: Chiesi Matteo <1984>
Publication venue: Alma Mater Studiorum - Università di Bologna
Publication date: 28/04/2014
Field of study

This thesis deals with heterogeneous architectures in standard workstations. Heterogeneous architectures represent an appealing alternative to traditional supercomputers because they are based on commodity components fabricated in large quantities. Hence their price-performance ratio is unparalleled in the world of high performance computing (HPC). In particular, different aspects related to the performance and consumption of heterogeneous architectures have been explored. The thesis initially focuses on an efficient implementation of a parallel application, where the execution time is dominated by an high number of floating point instructions. Then the thesis touches the central problem of efficient management of power peaks in heterogeneous computing systems. Finally it discusses a memory-bounded problem, where the execution time is dominated by the memory latency. Specifically, the following main contributions have been carried out: A novel framework for the design and analysis of solar field for Central Receiver Systems (CRS) has been developed. The implementation based on desktop workstation equipped with multiple Graphics Processing Units (GPUs) is motivated by the need to have an accurate and fast simulation environment for studying mirror imperfection and non-planar geometries. Secondly, a power-aware scheduling algorithm on heterogeneous CPU-GPU architectures, based on an efficient distribution of the computing workload to the resources, has been realized. The scheduler manages the resources of several computing nodes with a view to reducing the peak power. The two main contributions of this work follow: the approach reduces the supply cost due to high peak power whilst having negligible impact on the parallelism of computational nodes. from another point of view the developed model allows designer to increase the number of cores without increasing the capacity of the power supply unit. Finally, an implementation for efficient graph exploration on reconfigurable architectures is presented. The purpose is to accelerate graph exploration, reducing the number of random memory accesses

AMS Tesi di Dottorato

Efficient and Scalable Computing for Resource-Constrained Cyber-Physical Systems: A Layered Approach

Author: Zou An
Publication venue: Washington University Open Scholarship
Publication date: 15/05/2021
Field of study

With the evolution of computing and communication technology, cyber-physical systems such as self-driving cars, unmanned aerial vehicles, and mobile cognitive robots are achieving increasing levels of multifunctionality and miniaturization, enabling them to execute versatile tasks in a resource-constrained environment. Therefore, the computing systems that power these resource-constrained cyber-physical systems (RCCPSs) have to achieve high efficiency and scalability. First of all, given a fixed amount of onboard energy, these computing systems should not only be power-efficient but also exhibit sufficiently high performance to gracefully handle complex algorithms for learning-based perception and AI-driven decision-making. Meanwhile, scalability requires that the current computing system and its components can be extended both horizontally, with more resources, and vertically, with emerging advanced technology. To achieve efficient and scalable computing systems in RCCPSs, my research broadly investigates a set of techniques and solutions via a bottom-up layered approach. This layered approach leverages the characteristics of each system layer (e.g., the circuit, architecture, and operating system layers) and their interactions to discover and explore the optimal system tradeoffs among performance, efficiency, and scalability. At the circuit layer, we investigate the benefits of novel power delivery and management schemes enabled by integrated voltage regulators (IVRs). Then, between the circuit and microarchitecture/architecture layers, we present a voltage-stacked power delivery system that offers best-in-class power delivery efficiency for many-core systems. After this, using Graphics Processing Units (GPUs) as a case study, we develop a real-time resource scheduling framework at the architecture and operating system layers for heterogeneous computing platforms with guaranteed task deadlines. Finally, fast dynamic voltage and frequency scaling (DVFS) based power management across the circuit, architecture, and operating system layers is studied through a learning-based hierarchical power management strategy for multi-/many-core systems

Washington University St. Louis: Open Scholarship

A Path Relinking Method for the Joint Online Scheduling and Capacity Allocation of DL Training Workloads in GPU as a Service Systems

Author: Arezoo Jahani
Danilo Ardagna
Edoardo Amaldi
Federica Filippini
Marco Lattuada
Michele Ciavotta
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date
Field of study

The Deep Learning (DL) paradigm gained remarkable popularity in recent years. DL models are used to tackle increasingly complex problems, making the training process require considerable computational power. The parallel computing capabilities offered by modern GPUs partially fulfill this need, but the high costs related to GPU as a Service solutions in the cloud call for efficient capacity planning and job scheduling algorithms to reduce operational costs via resource sharing. In this work, we jointly address the online capacity planning and job scheduling problems from the perspective of cloud end-users. We present a Mixed Integer Linear Programming (MILP) formulation, and a path relinking-based method aiming at optimizing operational costs by (i) rightsizing Virtual Machine (VM) capacity at each node, (ii) partitioning the set of GPUs among multiple concurrent jobs on the same VM, and (iii) determining a due-date-aware job schedule. An extensive experimental campaign attests the effectiveness of the proposed approach in practical scenarios: costs savings up to 97% are attained compared with first-principle methods based on, e.g., Earliest Deadline First, cost reductions up to 20% are obtained with respect to a previously proposed Hierarchical Method and up to 95% against a dynamic programming-based method from the literature. Scalability analyses show that systems with up to 100 nodes and 450 concurrent jobs can be managed in less than 7 seconds. The validation in a prototype cloud environment shows a deviation below 5% between real and predicted costs

Archivio istituzionale della ricerca - Politecnico di Milano