5 research outputs found

    DeepPlace: Learning to Place Applications in Multi-Tenant Clusters

    Large multi-tenant production clusters must handle a wide variety of jobs and applications with complex resource usage characteristics. Manually creating placement rules that decide which applications should be co-located is non-trivial and sub-optimal. In this paper, we present DeepPlace, a scheduler that learns to exploit the temporal resource usage patterns of applications using Deep Reinforcement Learning (Deep RL) to reduce resource competition across jobs running on the same machine while at the same time optimizing overall cluster utilization. Comment: APSys 201
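    The core idea of contention-aware placement can be illustrated without the RL machinery: score each candidate machine by how badly the incoming job's temporal usage pattern overlaps with what is already running there. The sketch below is a hypothetical greedy baseline, not DeepPlace's actual policy; all names and the capacity model are assumptions for illustration.

```python
import numpy as np

def place_job(job_usage, machine_usages, capacity=1.0):
    """Pick the machine whose temporal usage pattern least overlaps with
    the incoming job's, i.e. minimise the peak combined utilisation.
    Usage vectors hold per-time-slot CPU-usage fractions."""
    best, best_peak = None, float("inf")
    for m, usage in enumerate(machine_usages):
        peak = float(np.max(usage + job_usage))
        if peak <= capacity and peak < best_peak:
            best, best_peak = m, peak
    return best  # None if no machine can host the job

# A job that is busy late fits best on a machine that is busy early.
job = np.array([0.1, 0.1, 0.6, 0.6])          # usage over 4 time slots
machines = [np.array([0.7, 0.7, 0.2, 0.1]),   # busy early
            np.array([0.2, 0.3, 0.7, 0.7])]   # busy late
print(place_job(job, machines))  # → 0
```

    A learned policy replaces this hand-written score with one trained against a reward that penalises contention and rewards utilisation.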

    Local memory-aware kernel perforation

    Many applications provide inherent resilience to some amount of error and can potentially trade accuracy for performance by using approximate computing. Applications running on GPUs often use local memory to minimize the number of global memory accesses and to speed up execution. Local memory can also be very useful for improving the way approximate computation is performed, e.g., by improving the quality of approximation with data reconstruction techniques. This paper introduces local memory-aware perforation techniques specifically designed for the acceleration and approximation of GPU kernels. We propose a local memory-aware kernel perforation technique that first skips the loading of parts of the input data from global memory, and later uses reconstruction techniques on local memory to reach higher accuracy while achieving performance similar to state-of-the-art techniques. Experiments show that our approach is able to accelerate the execution of a variety of applications from 1.6× to 3× while introducing an average error of 6%, which is much smaller than that of other approaches. Results further show how strongly the error depends on the input data and application scenario, as well as the impact of local memory tuning and different parameter configurations.
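    The load-skip-then-reconstruct idea can be sketched on the CPU: load only every k-th input element (standing in for the skipped global-memory reads) and reconstruct the gaps before computing. This is a minimal illustration with nearest-neighbour replication as the reconstruction step; the paper's GPU kernels and local-memory reconstruction schemes are more elaborate, and all names here are assumptions.

```python
import numpy as np

def perforated_sum(data, skip=2):
    """Sum an array while 'loading' only every `skip`-th element,
    reconstructing the skipped values by nearest-neighbour replication."""
    loaded = data[::skip]                          # elements actually loaded
    reconstructed = np.repeat(loaded, skip)[:len(data)]
    return reconstructed.sum()

rng = np.random.default_rng(0)
x = rng.random(1_000_000)
exact = x.sum()
approx = perforated_sum(x)
print(abs(approx - exact) / exact)  # relative error, typically well under 1%
```

    Better reconstruction (e.g. linear interpolation between loaded neighbours) trades a little extra local-memory work for lower error, which is exactly the knob the paper tunes.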

    On Performance Optimization and Quality Control for Approximate-Communication-Enabled Networks-on-Chip

    For many applications that can tolerate some error, approximate computing is a design paradigm that trades application output accuracy for reduced computation/communication effort, yielding performance and energy benefits. Since networks-on-chip (NoCs) are one of the major contributors to system performance and power consumption, the underlying communication can be approximated to achieve time/energy improvements. However, performing approximation blindly causes unacceptable quality loss. In this article, first, an optimization problem to maximize NoC performance is formulated under the constraint of the application's quality requirement, and the application quality loss is studied. Second, a congestion-aware quality control method is proposed to improve system performance by aggressively dropping network data, based on flow prediction and a lightweight heuristic. In the experiments, two recent approximation methods for NoCs are augmented with our proposed control method and compared with their original versions. Experimental results show that our proposed method can speed up execution by as much as 29.42% over the two state-of-the-art works.
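    The congestion-aware quality control described above boils down to a gated drop decision: drop an approximable packet only when the link is predicted to be congested and the accumulated quality loss still fits the application's budget. The sketch below is a hypothetical illustration of that decision rule, not the paper's heuristic; all names, thresholds, and the loss model are assumptions.

```python
def should_drop(predicted_load, quality_loss, *, load_threshold=0.8,
                quality_budget=0.05, accumulated_loss=0.0):
    """Drop an approximable packet only when the link is predicted to be
    congested AND the accumulated quality loss stays within budget."""
    congested = predicted_load > load_threshold
    within_budget = accumulated_loss + quality_loss <= quality_budget
    return congested and within_budget

# Congested link, cheap packet: drop it to relieve the NoC.
print(should_drop(0.9, 0.01))                          # → True
# Uncongested link: keep the data even though dropping would be "cheap".
print(should_drop(0.5, 0.01))                          # → False
# Quality budget exhausted: keep the data to protect output quality.
print(should_drop(0.9, 0.01, accumulated_loss=0.05))   # → False
```

    Gating on predicted rather than observed congestion is what lets the controller act before buffers fill, which is where the speedup comes from.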