EnergAt: Fine-Grained Energy Attribution for Multi-Tenancy
In the post-Moore's Law era, relying solely on hardware advancements for
automatic performance gains is no longer feasible without increased energy
consumption, due to the end of Dennard scaling. Consequently, computing
accounts for an increasing amount of global energy usage, contradicting the
objective of sustainable computing. The lack of hardware support and the
absence of a standardized, software-centric method for precisely tracing
energy provenance exacerbate the issue. Aiming to overcome this challenge, we
argue that fine-grained software energy attribution is attainable, even with
limited hardware support. To support our position, we present a thread-level,
NUMA-aware energy attribution method for CPU and DRAM in multi-tenant
environments. The evaluation of our prototype implementation, EnergAt,
demonstrates the validity, effectiveness, and robustness of our theoretical
model, even in the presence of the noisy-neighbor effect. We envisage a
sustainable cloud environment and emphasize the importance of collective
efforts to improve software energy efficiency.
Comment: 8 pages, 4 figures; Published in HotCarbon 2023; Artifact available at https://github.com/HongyuHe/energa
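To make the attribution idea concrete, here is a minimal illustrative sketch (not the authors' EnergAt implementation): sample per-package RAPL energy counters and split each NUMA node's energy across threads in proportion to the CPU time each thread accrued on that node during the sampling window. The RAPL sysfs path, the one-package-per-node simplification, and the caller-supplied thread-to-node mapping are assumptions made purely for illustration.

import os
import time
from collections import defaultdict

def read_rapl_energy_joules(node: int) -> float:
    # Cumulative package energy counter from the Linux powercap driver
    # (assumes one RAPL package per NUMA node; DRAM domains omitted for brevity).
    with open(f"/sys/class/powercap/intel-rapl:{node}/energy_uj") as f:
        return int(f.read()) / 1e6

def thread_cpu_seconds(pid: int, tid: int) -> float:
    # utime + stime for one thread, converted from clock ticks to seconds.
    with open(f"/proc/{pid}/task/{tid}/stat") as f:
        fields = f.read().rsplit(")", 1)[1].split()
    return (int(fields[11]) + int(fields[12])) / os.sysconf("SC_CLK_TCK")

def attribute_energy(thread_nodes, nodes, interval=1.0):
    # thread_nodes: {(pid, tid): numa_node}; assumes each thread runs on
    # (or is pinned to) a single node during the sampling window.
    e0 = {n: read_rapl_energy_joules(n) for n in nodes}
    t0 = {k: thread_cpu_seconds(*k) for k in thread_nodes}
    time.sleep(interval)
    energy = defaultdict(float)
    for n in nodes:
        node_energy = read_rapl_energy_joules(n) - e0[n]
        share = {k: thread_cpu_seconds(*k) - t0[k]
                 for k, node in thread_nodes.items() if node == n}
        total = sum(share.values())
        for k, dt in share.items():
            if total > 0:
                energy[k] += node_energy * dt / total
    return dict(energy)

In this simplified model, a thread's attributed energy is the node's measured energy scaled by the thread's share of CPU time on that node; the paper's model additionally handles the noisy-neighbor effect in multi-tenant settings.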
FaST-GShare: Enabling Efficient Spatio-Temporal GPU Sharing in Serverless Computing for Deep Learning Inference
Serverless computing (FaaS) has been extensively utilized for deep learning
(DL) inference due to the ease of deployment and pay-per-use benefits. However,
existing FaaS platforms use GPUs for DL inference in a coarse-grained manner,
without accounting for spatio-temporal resource multiplexing and
isolation, which results in severe GPU under-utilization, high usage expenses,
and SLO (Service Level Objective) violations. There is an imperative need to
enable an efficient and SLO-aware GPU-sharing mechanism in serverless computing
to facilitate cost-effective DL inferences. In this paper, we propose
FaST-GShare, an efficient FaaS-oriented Spatio-Temporal GPU Sharing architecture
for deep learning inferences. In the architecture, we introduce the
FaST-Manager to limit and isolate spatio-temporal resources for GPU
multiplexing. To characterize function performance, we propose the automatic and
flexible FaST-Profiler, which profiles function throughput under various
resource allocations. Based on the profiling data and the isolation mechanism,
we introduce the FaST-Scheduler with heuristic auto-scaling and efficient
resource allocation to guarantee function SLOs. Meanwhile, FaST-Scheduler
schedules functions with efficient GPU node selection to maximize GPU usage.
Furthermore, model sharing is exploited to mitigate memory contention. Our
prototype implementation on the OpenFaaS platform and experiments on
an MLPerf-based benchmark show that FaST-GShare can ensure resource isolation and
function SLOs. Compared to the time-sharing mechanism, FaST-GShare improves
throughput by 3.15x, GPU utilization by 1.34x, and SM (Streaming
Multiprocessor) occupancy by 3.13x on average.
Comment: The paper has been accepted by ACM ICPP 202
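A hedged sketch of the scheduling idea the abstract describes (not the FaST-GShare code): given a profiled throughput table for a function under different spatio-temporal GPU allocations, pick the cheapest allocation that sustains the request rate implied by the SLO, then best-fit it onto a GPU to keep utilization high. The Allocation/GPU data model, the best-fit policy, and the numbers in the example are assumptions for illustration.

from dataclasses import dataclass, field

@dataclass
class Allocation:
    sm_fraction: float   # spatial share of streaming multiprocessors, (0, 1]
    time_share: float    # temporal share of the GPU, (0, 1]
    throughput: float    # profiled requests/second under this allocation

@dataclass
class GPU:
    name: str
    free_sm: float = 1.0
    free_time: float = 1.0
    placements: list = field(default_factory=list)

def pick_allocation(profile, required_rps):
    # Cheapest allocation whose profiled throughput meets the demanded rate.
    feasible = [a for a in profile if a.throughput >= required_rps]
    if not feasible:
        return None  # would trigger scaling out to more instances
    return min(feasible, key=lambda a: a.sm_fraction * a.time_share)

def place(gpus, func_name, alloc):
    # Best-fit node selection: choose the GPU with the least leftover
    # capacity after placement, keeping overall GPU utilization high.
    candidates = [g for g in gpus
                  if g.free_sm >= alloc.sm_fraction and g.free_time >= alloc.time_share]
    if not candidates:
        return None
    gpu = min(candidates, key=lambda g: (g.free_sm - alloc.sm_fraction)
                                        + (g.free_time - alloc.time_share))
    gpu.free_sm -= alloc.sm_fraction
    gpu.free_time -= alloc.time_share
    gpu.placements.append((func_name, alloc))
    return gpu

# Example: a profiled function that must sustain 120 req/s (illustrative data).
profile = [Allocation(0.25, 0.5, 80.0), Allocation(0.5, 0.5, 150.0), Allocation(0.5, 1.0, 280.0)]
gpus = [GPU("gpu-0"), GPU("gpu-1")]
alloc = pick_allocation(profile, required_rps=120.0)
if alloc:
    place(gpus, "resnet50-infer", alloc)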