Search CORE

2,166 research outputs found

Next Generation Cloud Computing: New Trends and Research Directions

Author: Buyya Rajkumar
Varghese Blesson
Publication venue
Publication date: 08/09/2017
Field of study

The landscape of cloud computing has significantly changed over the last decade. Not only have more providers and service offerings crowded the space, but also cloud infrastructure that was traditionally limited to single provider data centers is now evolving. In this paper, we firstly discuss the changing cloud infrastructure and consider the use of infrastructure from multiple providers and the benefit of decentralising computing away from data centers. These trends have resulted in the need for a variety of new computing architectures that will be offered by future cloud infrastructure. These architectures are anticipated to impact areas, such as connecting people and devices, data-intensive computing, the service space and self-learning systems. Finally, we lay out a roadmap of challenges that will need to be addressed for realising the potential of next generation cloud systems.Comment: Accepted to Future Generation Computer Systems, 07 September 201

arXiv.org e-Print Archive

Queen's University Belfast Research Portal

Crossref

Techniques to Improve Energy Efficiency on Heterogeneous Multiprocessors under Timing and Quality Constraints

Author: Azhar Muhammad Waqar
Publication venue
Publication date: 01/01/2022
Field of study

Traditionally, applications are executed without the notion of a computational deadline and often use all available system resources, which leads to higher\ua0energy consumption. User specification of Quality of Service (QoS) constraints,\ua0in terms of completion time and solution quality, opens up for allocation of\ua0just enough resources to an application to finish just in time and thereby save\ua0energy. Modern heterogeneous multiprocessor (HMP) platforms provide a\ua0set of configurable resources, including a frequency range of dynamic voltage\ua0frequency scaling (DVFS), one among a set processor types, and one or a\ua0plurality of processors of each type. They can be configured at run-time to\ua0open up new opportunities for resource management.This thesis presents techniques to reduce energy consumption under QoS\ua0constraints by allocating resources at run-time on heterogeneous multiprocessor platforms targeting sequential and parallel iterative and task-parallel\ua0applications. The proposed techniques rely on a progress-tracking framework\ua0that monitors and predicts how much time is left until the application finishes.\ua0Furthermore, the proposed framework enables the prediction of computation\ua0demand and performance requirements for future iterations or tasks.\ua0The first contribution of this thesis is a resource management technique,\ua0called SLOOP, targeting single-threaded applications. SLOOP allocates resources, i.e., processor type and DVFS, for each iteration to meet deadlines\ua0while using the prediction of computational demand and execution time.The second contribution of this thesis is a resource-management scheme, called SaC, for multi-threaded applications executing on HMPs, where resources\ua0also include the number of processors besides DVFS and processor type. SaC\ua0first chooses the most energy-efficient configuration that meets the deadline.\ua0The proposed technique collects execution-time slack over subsequent iterations\ua0to select a configuration that can save energy.The third contribution of this thesis is a resource manager, called Task-RM, for task-parallel applications executing on HMPs under QoS constraints. Task-RM exploits the variance in task execution times and imbalance between\ua0sibling tasks to allocate just enough resources in terms of DVFS and processor type. It uses an innovative off-line analysis to avoid redoing scheduling analysis\ua0at run-time.Finally, the fourth contribution is a scheme, called Approx-RM, that can exploit accuracy-energy trade-offs in approximate iterative applications. Approx-RM allocates an appropriate amount of resources while guaranteeing timing\ua0and solution quality specifications. Approx-RM first predicts the iteration count required to meet the quality target and then allocates enough resources\ua0on an HMP in terms of DVFS, processor type, and processor count to save\ua0energy while meeting a performance target

Chalmers Research

Xar-Trek: Run-Time Execution Migration among FPGAs and Heterogeneous-ISA CPUs

Author: Barbalace Antonio
Chuang Ho-Ren
Horta Edson
Olivier Pierre
Philippidis Cesar
Ravindran Binoy
VSathish Naarayanan Rao
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 27/10/2021
Field of study

Datacenter servers are increasingly heterogeneous: from x86 host CPUs, to ARM or RISC-V CPUs in NICs/SSDs, to FPGAs. Previous works have demonstrated that migrating application execution at run-time across heterogeneous-ISA CPUs can yield significant performance and energy gains, with relatively little programmer effort. However, FPGAs have often been overlooked in that context: hardware acceleration using FPGAs involves statically implementing select application functions, which prohibits dynamic and transparent migration. We present Xar-Trek, a new compiler and run-time software framework that overcomes this limitation. Xar-Trek compiles an application for several CPU ISAs and select application functions for acceleration on an FPGA, allowing execution migration between heterogeneous-ISA CPUs and FPGAs at run-time. Xar-Trek's run-time monitors server workloads and migrates application functions to an FPGA or to heterogeneous-ISA CPUs based on a scheduling policy. We develop a heuristic policy that uses application workload profiles to make scheduling decisions. Our evaluations conducted on a system with x86-64 server CPUs, ARM64 server CPUs, and an Alveo accelerator card reveal 88%-1% performance gains over no-migration baselines

arXiv.org e-Print Archive

Edinburgh Research Explorer

Exceeding Conservative Limits: A Consolidated Analysis on Modern Hardware Margins

Author: Chatzidimitriou Athanasios
Gizopoulos Dimitris
Kestelman Adrian Cristal
Leng Jingwen
Papadimitriou George
Reddi Vijay Janapa
Salami Behzad
Unsal Osman S.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/06/2020
Field of study

Modern large-scale computing systems (data centers, supercomputers, cloud and edge setups and high-end cyber-physical systems) employ heterogeneous architectures that consist of multicore CPUs, general-purpose many-core GPUs, and programmable FPGAs. The effective utilization of these architectures poses several challenges, among which a primary one is power consumption. Voltage reduction is one of the most efficient methods to reduce power consumption of a chip. With the galloping adoption of hardware accelerators (i.e., GPUs and FPGAs) in large datacenters and other large-scale computing infrastructures, a comprehensive evaluation of the safe voltage reduction levels for each different chip can be employed for efficient reduction of the total power. We present a survey of recent studies in voltage margins reduction at the system level for modern CPUs, GPUs and FPGAs. The pessimistic voltage guardbands inserted by the silicon vendors can be exploited in all devices for significant power savings. On average, voltage reduction can reach 12% in multicore CPUs, 20% in manycore GPUs and 39% in FPGAs.Comment: Accepted for publication in IEEE Transactions on Device and Materials Reliabilit

arXiv.org e-Print Archive

UPCommons. Portal del coneixement obert de la UPC

Proactive Aging Mitigation in CGRAs through Utilization-Aware Allocation

Author: Beck Antonio Carlos Schneider
Brandalero Marcelo
Hübner Michael
Lignati Bernardo Neuhaus
Shafique Muhammad
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 05/11/2020
Field of study

Resource balancing has been effectively used to mitigate the long-term aging effects of Negative Bias Temperature Instability (NBTI) in multi-core and Graphics Processing Unit (GPU) architectures. In this work, we investigate this strategy in Coarse-Grained Reconfigurable Arrays (CGRAs) with a novel application-to-CGRA allocation approach. By introducing important extensions to the reconfiguration logic and the datapath, we enable the dynamic movement of configurations throughout the fabric and allow overutilized Functional Units (FUs) to recover from stress-induced NBTI aging. Implementing the approach in a resource-constrained state-of-the-art CGRA reveals

2.2\times

lifetime improvement with negligible performance overheads and less than

10\%

increase in area.Comment: Please cite this as: M. Brandalero, B. N. Lignati, A. Carlos Schneider Beck, M. Shafique and M. H\"ubner, "Proactive Aging Mitigation in CGRAs through Utilization-Aware Allocation," 2020 57th ACM/IEEE Design Automation Conference (DAC), San Francisco, CA, USA, 2020, pp. 1-6, doi: 10.1109/DAC18072.2020.921858

arXiv.org e-Print Archive

Crossref

Recommended from our members

Achieving Accurate Predictions of Future Events Under Hardware Heterogeneity

Author: Prodromou Andreas
Publication venue: eScholarship, University of California
Publication date: 01/01/2019
Field of study

Heterogeneous hardware is becoming increasingly available in modern hardware, while research breakthroughs enforce the expectation that heterogeneity will keep increasing in the future. Significant gains can be achieved via appropriate utilization of heterogeneity, in terms of performance and power consumption, however, poor utilization can have a detrimental effect. Intelligent scheduling and resource management is a crucial challenge we need to overcome in order to harvest the full potential of heterogeneous hardware. As systems become larger and include greater levels of hardware diversity, the importance of intelligent scheduling and resource management is further accentuated.This dissertation presents techniques that aid the process of scheduling and resource management in the presence of heterogeneous hardware, via accurately predicting upcoming runtime events. With a proactive and accurate view of the near future, schedulers can utilize the underlying hardware more efficiently, and fully take advantage of the available benefits.By adapting a majority element heuristic, this dissertation significantly improves the accuracy of predicting memory addresses about to be accessed, while reducing prediction-related costs by a factor of ten thousand compared to previously proposed predictive approaches. Coupled with novel microarchitectural modifications, accurate address predictions are shown to improve the performance of heterogeneous memory architectures.Machine learning-based performance predictors are further presented, capable of predicting a program's performance when executed on a given general-purpose core. Trained to model the subtleties of the interaction between hardware and software, these predictors are capable of generating highly accurate predictions even for cores with varied Instruction Set Architectures. Utilizing these performance predictions for job scheduling, is shown to improve overall system performance. The trained predictors are further examined and interpreted in order to visualize the correlations between features picked up and amplified during training.Finally, this dissertation demonstrates that scheduling algorithms cannot guarantee deriving an optimal schedule during realistic execution scenarios due to the underlying hardware heterogeneity, the wide range of runtime requirements of software, as well as prediction error from performance predictors. In response, deep neural networks are trained to select one scheduling approach from a list of options with varied overheads and correctness guarantees. The scheduling approach chosen, is the one which will most likely return the highest-performance schedule with the lowest overhead, given a particular instance of the job-to-core assignment problem

eScholarship - University of California