3,118 research outputs found
Resource-aware scheduling for 2D/3D multi-/many-core processor-memory systems
This dissertation addresses the complexities of 2D/3D multi-/many-core processor-memory systems, focusing on two key areas: enhancing timing predictability in real-time multi-core processors and optimizing performance within thermal constraints. The integration of an increasing number of transistors into compact chip designs, while boosting computational capacity, presents challenges in resource contention and thermal management. The first part of the thesis improves timing predictability. We enhance shared cache interference analysis for set-associative caches, advancing the calculation of Worst-Case Execution Time (WCET). This development enables accurate assessment of cache interference and of the effectiveness of partitioned schedulers in real-world scenarios. We introduce TCPS, a novel task- and cache-aware partitioned scheduler that optimizes cache partitioning based on task-specific WCET sensitivity, leading to improved schedulability and predictability. Our research explores various cache and scheduling configurations, providing insights into their performance trade-offs. The second part focuses on thermal management in 2D/3D many-core systems. Recognizing the limitations of Dynamic Voltage and Frequency Scaling (DVFS) in S-NUCA many-core processors, we propose synchronous thread migrations as a thermal management strategy. This approach culminates in the HotPotato scheduler, which balances performance and thermal safety. We also introduce 3D-TTP, a transient temperature-aware power budgeting strategy for 3D-stacked systems, reducing the need for Dynamic Thermal Management (DTM) activation. Finally, we present 3QUTM, a novel method for 3D-stacked systems that combines core DVFS and memory bank Low Power Modes with a learning algorithm, optimizing response times within thermal limits. This research contributes significantly to enhancing performance and thermal management in advanced processor-memory systems.
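The abstract's core TCPS idea, partitioning a shared cache according to each task's WCET sensitivity, can be sketched with a simple greedy allocator. Everything below is an illustrative assumption: the toy WCET model, the task parameters, and the greedy heuristic are not the thesis's actual analysis or scheduler.

```python
# Hypothetical sketch of WCET-sensitivity-driven cache partitioning,
# in the spirit of the TCPS idea described above. The wcet() model and
# task numbers are illustrative assumptions, not the thesis's model.

def wcet(base, sensitivity, ways):
    # Toy WCET model: execution time shrinks as cache ways are added,
    # scaled by a per-task sensitivity factor (diminishing returns).
    return base / (1 + sensitivity * ways)

def partition_cache(tasks, total_ways):
    """Greedily assign cache ways where they reduce WCET the most."""
    alloc = {name: 0 for name, _, _ in tasks}
    for _ in range(total_ways):
        # Pick the task with the largest marginal WCET reduction.
        best = max(
            tasks,
            key=lambda t: wcet(t[1], t[2], alloc[t[0]])
                          - wcet(t[1], t[2], alloc[t[0]] + 1),
        )
        alloc[best[0]] += 1
    return alloc

# (name, baseline WCET, cache sensitivity) -- hypothetical tasks.
tasks = [("t1", 100.0, 0.5), ("t2", 80.0, 0.1), ("t3", 60.0, 0.9)]
print(partition_cache(tasks, 8))
```

The greedy step mirrors the intuition in the abstract: cache-sensitive tasks receive larger partitions because each additional way buys them a bigger WCET reduction, which in turn improves schedulability.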
UMSL Bulletin 2022-2023
The 2022-2023 Bulletin and Course Catalog for the University of Missouri–St. Louis.
FaST-GShare: Enabling Efficient Spatio-Temporal GPU Sharing in Serverless Computing for Deep Learning Inference
Serverless computing (FaaS) has been extensively utilized for deep learning (DL) inference due to its ease of deployment and pay-per-use benefits. However, existing FaaS platforms use GPUs in a coarse-grained manner for DL inference, without spatio-temporal resource multiplexing and isolation, which results in severe GPU under-utilization, high usage expenses, and violations of Service Level Objectives (SLOs). There is an imperative need for an efficient, SLO-aware GPU-sharing mechanism in serverless computing to enable cost-effective DL inference. In this paper, we propose FaST-GShare, an efficient FaaS-oriented Spatio-Temporal GPU Sharing architecture for deep learning inference. Within the architecture, we introduce the FaST-Manager to limit and isolate spatio-temporal resources for GPU multiplexing. To characterize function performance, the automatic and flexible FaST-Profiler profiles function throughput under various resource allocations. Based on the profiling data and the isolation mechanism, we introduce the FaST-Scheduler, which applies heuristic auto-scaling and efficient resource allocation to guarantee function SLOs, and schedules functions with efficient GPU node selection to maximize GPU usage. Furthermore, model sharing is exploited to mitigate memory contention. Our prototype implementation on the OpenFaaS platform and experiments on an MLPerf-based benchmark show that FaST-GShare ensures resource isolation and function SLOs. Compared to the time-sharing mechanism, FaST-GShare improves throughput by 3.15x, GPU utilization by 1.34x, and SM (Streaming Multiprocessor) occupancy by 3.13x on average.
Comment: The paper has been accepted by ACM ICPP 2023.
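The profiler-then-scheduler pairing in the abstract can be illustrated with a minimal selection heuristic: given profiled throughput for a function under several (SM fraction, time-slice) allocations, choose the smallest allocation that still meets the SLO-derived request rate. This is a sketch under stated assumptions; the profile numbers, the allocation encoding, and `pick_allocation` are hypothetical, not the paper's actual FaST-Scheduler algorithm.

```python
# Hypothetical sketch of profiling-guided GPU allocation, in the spirit
# of the FaST-Profiler / FaST-Scheduler pairing described above. The
# profile data and the selection rule are illustrative assumptions.

def pick_allocation(profile, required_rps):
    """Return the smallest (sm_fraction, time_slice) allocation whose
    profiled throughput meets the SLO-derived request rate, or None."""
    feasible = [(cfg, rps) for cfg, rps in profile.items()
                if rps >= required_rps]
    if not feasible:
        return None  # no single-GPU allocation can hold this SLO
    # Prefer the allocation using the least spatial share (and then the
    # shortest time slice) to leave room for co-located functions.
    return min(feasible, key=lambda item: item[0])[0]

# Profiled throughput (requests/s) per (SM fraction, time-slice fraction),
# as a profiler like the one described above might produce.
profile = {
    (0.25, 0.5): 40.0,
    (0.25, 1.0): 75.0,
    (0.50, 0.5): 85.0,
    (0.50, 1.0): 160.0,
}
print(pick_allocation(profile, 80.0))
```

Choosing the smallest feasible allocation is what makes spatio-temporal sharing pay off: the spare SM share and time slices remain available for other functions co-located on the same GPU.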
Heterogeneous Acceleration for 5G New Radio Channel Modelling Using FPGAs and GPUs
The abstract is in the attachment.