Search CORE

19 research outputs found

Asymptotically Optimal Approximation Algorithms for Coflow Scheduling

Author: Ahuja R. K.
Al-Fares M.
Al-Fares Mohammad
Peis B.
Zaharia M.
Zhao Y.
Publication venue
Publication date: 08/03/2018
Field of study

Many modern datacenter applications involve large-scale computations composed of multiple data flows that need to be completed over a shared set of distributed resources. Such a computation completes when all of its flows complete. A useful abstraction for modeling such scenarios is a {\em coflow}, which is a collection of flows (e.g., tasks, packets, data transmissions) that all share the same performance goal. In this paper, we present the first approximation algorithms for scheduling coflows over general network topologies with the objective of minimizing total weighted completion time. We consider two different models for coflows based on the nature of individual flows: circuits, and packets. We design constant-factor polynomial-time approximation algorithms for scheduling packet-based coflows with or without given flow paths, and circuit-based coflows with given flow paths. Furthermore, we give an

O(\log n/\log \log n)

-approximation polynomial time algorithm for scheduling circuit-based coflows where flow paths are not given (here

n

is the number of network edges). We obtain our results by developing a general framework for coflow schedules, based on interval-indexed linear programs, which may extend to other coflow models and objective functions and may also yield improved approximation bounds for specific network scenarios. We also present an experimental evaluation of our approach for circuit-based coflows that show a performance improvement of at least 22% on average over competing heuristics.Comment: Fixed minor typo

arXiv.org e-Print Archive

Crossref

Integrality Gap of Time-Indexed Linear Programming Relaxation for Coflow Scheduling

Author: Fukunaga Takuro
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2022)
Publication date: 01/01/2022
Field of study

Coflow is a set of related parallel data flows in a network. The goal of the coflow scheduling is to process all the demands of the given coflows while minimizing the weighted completion time. It is known that the coflow scheduling problem admits several polynomial-time 5-approximation algorithms that compute solutions by rounding linear programming (LP) relaxations of the problem. In this paper, we investigate the time-indexed LP relaxation for coflow scheduling. We show that the integrality gap of the time-indexed LP relaxation is at most 4. We also show that yet another polynomial-time 5-approximation algorithm can be obtained by rounding the solutions to the time-indexed LP relaxation

Dagstuhl Research Online Publication Server

Matroid Coflow Scheduling

Author: Im Sungjin
Moseley Benjamin
Pruhs Kirk
Purohit Manish
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 46th International Colloquium on Automata, Languages, and Programming (ICALP 2019)
Publication date: 01/01/2019
Field of study

We consider the matroid coflow scheduling problem, where each job is comprised of a set of flows and the family of sets that can be scheduled at any time form a matroid. Our main result is a polynomial-time algorithm that yields a 2-approximation for the objective of minimizing the weighted completion time. This result is tight assuming P != NP. As a by-product we also obtain the first (2+epsilon)-approximation algorithm for the preemptive concurrent open shop scheduling problem

Dagstuhl Research Online Publication Server

Recommended from our members

Resource Allocation In Large-Scale Distributed Systems

Author: Shafiee Mehrnoosh
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2021
Field of study

The focus of this dissertation is design and analysis of scheduling algorithms for distributed computer systems, i.e., data centers. Today’s data centers can contain thousands of servers and typically use a multi-tier switch network to provide connectivity among the servers. Data centers are the host for execution of various data-parallel applications. As an abstraction, a job in a data center can be thought of as a group of interdependent tasks, each with various requirements which need to be scheduled for execution on the servers and the data flows between the tasks that need to be scheduled in the switch network. In this thesis, we study both flow and task scheduling problems under the features of modern parallel computing frameworks.For the flow scheduling problem, we study three models. The first model considers a general network topology where flows among the various source-destination pairs of servers are generated dynamically over time. The goal is to assign the end-to-end data flows among the available paths in order to efficiently balance the load in the network. We propose a myopic algorithm that is computationally efficient and prove that it asymptotically minimizes the total network cost using a convex optimization model, fluid limit and Lyapunov analysis. We further propose randomized versions of our myopic algorithm. The second model consider the case that there is dependence among flows. Specifically, a coflow is defined as a collection of parallel flows whose completion time is determined by the completion time of the last flow in the collection. Our main result is a 5-approximation deterministic algorithm that schedule coflows in polynomial time so as to minimize the total weighted completion times. The key ingredient of our approach is an improved linear program formulation for sorting the coflows followed by a simple list scheduling policy. Lastly, we study scheduling coflows of multi-stage jobs to minimize the jobs’ total weighted completion times. Each job is represented by a DAG (Directed Acyclic Graph) among its coflows that captures the dependencies among the coflows. We define g(m) = log(m)/log(log(m)) and h(m, μ) = log(mμ)/(log(log(mμ)), where m is number of servers, μ is the maximum number of coflows in a job. We develop two algorithms with approximation ratios O(√μg(m)) and O(√μg(m)h(m, μ)) for jobs with general DAGs and rooted trees, respectively. The algorithms rely on random delaying and merging optimal schedules of the coflows in the jobs’ DAG, followed by enforcing dependency among coflows and the links’ capacity constraints. For the task scheduling problem, we study two models. We consider a setting where each job consists of a set of parallel tasks that need to be processed on different servers, and the job is completed once all its tasks finish processing. In the first model, each job is associated with a utility which is a decreasing function of its completion time. The objective is to schedule tasks in a way that achieves max-min fairness for jobs’ utilities. We first show a strong result regarding NP-hardness of this problem. We then proceed to define two notions of approximation solutions and develop scheduling algorithms that provide guarantees under these approximation notions, using dynamic programming and random perturbation of tasks’ processing times. In the second model, we further assume that processing times of tasks can be server dependent and a server can process (pack) multiple tasks at the same time subject to its capacity. We then propose three algorithms with approximation ratios of 4, (6 + ε), and 24 for different cases where preemption and migration of tasks among the servers are or are not allowed. Our algorithms use a combination of linear program relaxation and greedy packing techniques. To demonstrate the gains in practice, we evaluate all the proposed algorithms and compare their performances with the prior approaches through extensive simulations using real and synthesized traffic traces. We hope this work inspires improvements to existing job management and scheduling in distributed computer systems

Columbia University Academic Commons

Scalable Verification of GNN-based Job Schedulers

Author: Barrett Clark
Narodytska Nina
Sharif Mahmood
Singh Gagandeep
Wu Haoze
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 15/09/2022
Field of study

Recently, Graph Neural Networks (GNNs) have been applied for scheduling jobs over clusters, achieving better performance than hand-crafted heuristics. Despite their impressive performance, concerns remain over whether these GNN-based job schedulers meet users' expectations about other important properties, such as strategy-proofness, sharing incentive, and stability. In this work, we consider formal verification of GNN-based job schedulers. We address several domain-specific challenges such as networks that are deeper and specifications that are richer than those encountered when verifying image and NLP classifiers. We develop vegas, the first general framework for verifying both single-step and multi-step properties of these schedulers based on carefully designed algorithms that combine abstractions, refinements, solvers, and proof transfer. Our experimental results show that vegas achieves significant speed-up when verifying important properties of a state-of-the-art GNN-based scheduler compared to previous methods.Comment: Condensed version published at OOPSLA'2

arXiv.org e-Print Archive