Analysis of Dynamic Memory Bandwidth Regulation in Multi-core Real-Time Systems
One of the primary sources of unpredictability in modern multi-core embedded
systems is contention over shared memory resources, such as caches,
interconnects, and DRAM. Despite significant achievements in the design and
analysis of multi-core systems, there is still a need for a theoretical
framework that can be used to reason about the worst-case behavior of real-time
workloads when both processors and memory resources are subject to scheduling decisions.
In this paper, we focus our attention on dynamic allocation of main memory
bandwidth. In particular, we study how to determine the worst-case response
time of tasks spanning a sequence of time intervals, each with a different
bandwidth-to-core assignment. We show that the response-time computation can be
reduced to a maximization problem over the assignment of memory requests to the
different time intervals, and we provide an efficient way to solve this
problem. As a case study, we then demonstrate how our proposed analysis can be
used to improve the schedulability of Integrated Modular Avionics systems in
the presence of memory-intensive workloads.

Comment: Accepted for publication in the IEEE Real-Time Systems Symposium (RTSS) 2018 conference.
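To make the reduction concrete, the sketch below shows one way such a maximization could be set up, assuming a deliberately simplified model: each regulation interval has a fixed length, a fixed per-request service delay (the inverse of its assigned bandwidth), and a capacity bound. The Interval type, the greedy strategy, and all numbers are illustrative assumptions, not the paper's actual analysis.

```python
# Hypothetical sketch of the maximization step described in the abstract:
# given a sequence of intervals, each with its own memory bandwidth
# assignment, distribute a task's memory requests across the intervals so
# that total stall time is maximized (worst case). The interval model
# below (fixed per-request delay, per-interval capacity, bandwidth > 0)
# is an illustrative assumption, not the paper's formulation.

from dataclasses import dataclass

@dataclass
class Interval:
    length: float      # duration of the interval (time units)
    bandwidth: float   # memory requests served per time unit (> 0)

def worst_case_stall(intervals: list[Interval], n_requests: int) -> float:
    """Upper-bound total memory stall by greedily placing requests
    into the intervals where each request is served most slowly."""
    # Per-request delay is 1/bandwidth, so slowest intervals come first.
    ordered = sorted(intervals, key=lambda iv: iv.bandwidth)
    stall, remaining = 0.0, n_requests
    for iv in ordered:
        if remaining == 0:
            break
        # An interval can absorb at most length * bandwidth requests.
        fits = min(remaining, int(iv.length * iv.bandwidth))
        stall += fits / iv.bandwidth
        remaining -= fits
    return stall

# Example: three regulation intervals with different bandwidth-to-core
# assignments; bound the stall of a task issuing 40 memory requests.
ivs = [Interval(10.0, 2.0), Interval(5.0, 8.0), Interval(10.0, 4.0)]
print(worst_case_stall(ivs, 40))
```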
Topology-aware GPU scheduling for learning workloads in cloud environments
Recent advances in hardware, such as systems with multiple GPUs and their availability in the cloud, are enabling deep learning in various domains, including health care, autonomous vehicles, and the Internet of Things. Multi-GPU systems exhibit complex connectivity among GPUs and between GPUs and CPUs. Workload schedulers must therefore consider hardware topology and workload communication requirements in order to allocate CPU and GPU resources for optimal execution time and improved utilization in shared cloud environments.
This paper presents a new topology-aware workload placement strategy for scheduling deep learning jobs on multi-GPU systems. The placement strategy is evaluated with a prototype on a Power8 machine with Tesla P100 cards, showing speedups of up to ≈1.30x compared to state-of-the-art strategies; the proposed algorithm achieves this result by allocating GPUs that satisfy workload requirements while preventing interference. Additionally, a large-scale simulation shows that the proposed strategy provides higher resource utilization and performance in cloud systems.
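As a rough illustration of what topology-aware placement can mean, the sketch below scores candidate GPU sets by summed pairwise link cost and picks the cheapest set that fits the job. The link-cost matrix, the brute-force search, and the name place_job are hypothetical assumptions for illustration; they are not taken from the paper's strategy.

```python
# Hypothetical sketch of topology-aware GPU placement: given a matrix of
# pairwise GPU link costs (e.g. NVLink cheaper than PCIe, which is
# cheaper than crossing sockets), pick the k-GPU set with the lowest
# total communication cost. The cost values and the exhaustive search
# are illustrative assumptions, not the paper's actual algorithm.

from itertools import combinations

# Assumed link costs for a 4-GPU node: 1 = NVLink, 2 = same-socket PCIe,
# 4 = cross-socket. The diagonal is 0 (a GPU to itself).
LINK_COST = [
    [0, 1, 2, 4],
    [1, 0, 4, 2],
    [2, 4, 0, 1],
    [4, 2, 1, 0],
]

def place_job(free_gpus: set[int], n_gpus: int) -> tuple[int, ...] | None:
    """Return the set of free GPUs minimizing total pairwise link cost,
    or None if the job does not fit on this node."""
    best, best_cost = None, float("inf")
    for combo in combinations(sorted(free_gpus), n_gpus):
        cost = sum(LINK_COST[a][b] for a, b in combinations(combo, 2))
        if cost < best_cost:
            best, best_cost = combo, cost
    return best

# Example: GPUs 1-3 are free; a 2-GPU job lands on the NVLink-connected
# pair (2, 3) rather than a cross-socket pairing.
print(place_job({1, 2, 3}, 2))
```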
This project is supported by the IBM/BSC Technology Center for Supercomputing collaboration agreement. It has also received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No 639595). It is also partially supported by the Ministry of Economy of Spain under contract TIN2015-65316-P and the Generalitat de Catalunya under contract 2014SGR1051, by the ICREA Academia program, and by the BSC-CNS Severo Ochoa program (SEV-2015-0493). We thank our IBM Research colleagues Alaa Youssef and Asser Tantawi for the valuable discussions. We also thank SC17 committee member Blair Bethwaite of Monash University for his constructive feedback on earlier drafts of this paper.
- …