
Extending the Nested Parallel Model to the Nested Dataflow Model with Provably Efficient Schedulers

Author: Dinh David
Simhadri Harsha Vardhan
Tang Yuan
Publication date: 14/02/2016

The nested parallel (a.k.a. fork-join) model is widely used for writing parallel programs. However, the two composition constructs, i.e., "$\parallel$" (parallel) and "$;$" (serial), are insufficient for expressing "partial dependencies" or "partial parallelism" in a program. We propose a new dataflow composition construct "$\leadsto$" to express partial dependencies in algorithms in a processor- and cache-oblivious way, thus extending the Nested Parallel (NP) model to the Nested Dataflow (ND) model. We redesign several divide-and-conquer algorithms ranging from dense linear algebra to dynamic programming in the ND model and prove that they all have optimal span while retaining optimal cache complexity. We propose the design of runtime schedulers that map ND programs to multicore processors with multiple levels of possibly shared caches (i.e., Parallel Memory Hierarchies) and provide theoretical guarantees on their ability to preserve locality and load balance. For this, we adapt space-bounded (SB) schedulers for the ND model. We show that our algorithms have increased "parallelizability" in the ND model, and that SB schedulers can use the extra parallelizability to achieve asymptotically optimal bounds on cache misses and running time on a greater number of processors than in the NP model. The running time for the algorithms in this paper is $O\left(\frac{\sum_{i=0}^{h-1} Q^{*}(\mathsf{t};\sigma\cdot M_i)\cdot C_i}{p}\right)$, where $Q^{*}$ is the cache complexity of task $\mathsf{t}$, $C_i$ is the cost of a cache miss at the level-$i$ cache, which has size $M_i$, $\sigma\in(0,1)$ is a constant, and $p$ is the number of processors in an $h$-level cache hierarchy.
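To make the running-time expression concrete, the sketch below simply evaluates the quantity inside the $O(\cdot)$ for given per-level cache sizes $M_i$, miss costs $C_i$, constant $\sigma$ and processor count $p$. It is an illustration under stated assumptions, not the paper's analysis: the function names are hypothetical, constants hidden by the $O(\cdot)$ are ignored, and the example plugs in the textbook cache complexity $Q^{*}(n; M) \approx n^3/(B\sqrt{M})$ of recursive $n \times n$ matrix multiplication with an assumed line size $B$ and an invented three-level hierarchy.

```python
import math
from typing import Callable, Sequence


def sb_running_time_bound(
    q_star: Callable[[float], float],  # Q*(t; M) for a fixed task t, as a function of cache size M
    cache_sizes: Sequence[float],      # M_0, ..., M_{h-1}
    miss_costs: Sequence[float],       # C_0, ..., C_{h-1}
    sigma: float,                      # constant in (0, 1)
    p: int,                            # number of processors
) -> float:
    """Evaluate sum_{i=0}^{h-1} Q*(t; sigma * M_i) * C_i / p (constants of the O(.) dropped)."""
    if not 0.0 < sigma < 1.0:
        raise ValueError("sigma must lie in (0, 1)")
    if len(cache_sizes) != len(miss_costs):
        raise ValueError("need exactly one miss cost per cache level")
    total = sum(q_star(sigma * m) * c for m, c in zip(cache_sizes, miss_costs))
    return total / p


def matmul_q_star(n: int, line_size: int) -> Callable[[float], float]:
    """Hypothetical Q*(t; M) ~ n^3 / (B * sqrt(M)) for recursive n x n matrix multiply."""
    return lambda m: n**3 / (line_size * math.sqrt(m))


# Hypothetical 3-level hierarchy: sizes in words, miss costs in arbitrary time units.
bound = sb_running_time_bound(
    matmul_q_star(1024, 8),
    cache_sizes=[2**12, 2**18, 2**24],
    miss_costs=[1, 10, 100],
    sigma=0.25,
    p=16,
)
print(bound)  # 999424.0
```

Each level contributes its own miss count times its own miss cost, so larger but slower caches can dominate the sum even though they incur fewer misses.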

arXiv.org e-Print Archive

Crossref
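A toy span calculation may help show why the serial construct "$;$" can be too coarse when the real dependency is only partial. It is a hypothetical two-phase example, not one of the paper's algorithms and not the formal semantics of "$\leadsto$": phase A has tasks $A_i$, phase B has tasks $B_i$, and we assume $B_i$ truly depends only on $A_i$. With a full barrier every $B_i$ waits for the slowest $A_j$; with the partial dependency the span is the longest individual chain $A_i \to B_i$. The function names and durations are invented for illustration.

```python
from typing import Sequence


def span_serial_composition(a: Sequence[float], b: Sequence[float]) -> float:
    """Span of (A_0 || ... || A_{k-1}) ; (B_0 || ... || B_{k-1}).

    The ';' acts as a barrier, so every B_i waits for every A_j.
    """
    return max(a) + max(b)


def span_partial_dependency(a: Sequence[float], b: Sequence[float]) -> float:
    """Span when each B_i depends only on A_i (hypothetical partial-dependency pattern)."""
    if len(a) != len(b):
        raise ValueError("need exactly one B_i per A_i")
    return max(x + y for x, y in zip(a, b))


a = [1.0, 5.0]  # durations of A_0, A_1
b = [5.0, 1.0]  # durations of B_0, B_1
print(span_serial_composition(a, b))  # 10.0
print(span_partial_dependency(a, b))  # 6.0
```

In this two-task toy the chains could also be written as $(A_0 ; B_0) \parallel (A_1 ; B_1)$; the point is only to show how much span a full barrier can cost when the underlying dependency is partial.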

A New Lower Bound for Deterministic Truthful Scheduling

Author: A Filos-Ratsikas
A Mu’alem
C Ventre
C Yu
E Koutsoupias
G Christodoulou
I Ashlagi
JK Lenstra
N Nisan
O Kuryatnikova
P Dhangwatnotai
P Lu
P Penna
R Lavi
V Auletta
VV Vazirani
X Chen
Y Giannakopoulos
Publication date: 07/07/2020

We study the problem of truthfully scheduling $m$ tasks to $n$ selfish unrelated machines under the objective of makespan minimization, as introduced in the seminal work of Nisan and Ronen [STOC'99]. Closing the current gap of $[2.618, n]$ on the approximation ratio of deterministic truthful mechanisms is a notorious open problem in the field of algorithmic mechanism design. We provide the first such improvement in more than a decade, since the lower bounds of $2.414$ (for $n=3$) and $2.618$ (for $n\to\infty$) by Christodoulou et al. [SODA'07] and Koutsoupias and Vidali [MFCS'07], respectively. More specifically, we show that the current best lower bound of $2.618$ can be achieved even for just $n=4$ machines; for $n=5$ we already get the first improvement, namely $2.711$; and allowing the number of machines to grow arbitrarily large we can get a lower bound of $2.755$.

Comment: 15 pages

arXiv.org e-Print Archive

Crossref

Enlighten
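The upper end $n$ of the gap $[2.618, n]$ comes from the classic mechanism of Nisan and Ronen: each task is allocated independently to the machine reporting the smallest processing time, and that machine is paid the second-smallest reported time (a VCG, second-price payment), which makes truthful reporting a dominant strategy. The sketch below implements that textbook mechanism plus a brute-force optimum for tiny instances; it is not the lower-bound construction of this paper, and the function names and the instance are hypothetical.

```python
import itertools
import math
from typing import List, Sequence, Tuple


def vcg_schedule(t: Sequence[Sequence[float]]) -> Tuple[List[int], List[float], float]:
    """t[i][j] = processing time reported by machine i for task j.

    Returns (assignment, payments, makespan); assignment[j] is the machine given task j.
    """
    n, m = len(t), len(t[0])
    if n < 2:
        raise ValueError("second-price payments need at least two machines")
    assignment: List[int] = []
    payments = [0.0] * n
    for j in range(m):
        # Rank machines by reported time for task j; ties broken by machine index.
        order = sorted(range(n), key=lambda i: (t[i][j], i))
        winner, runner_up = order[0], order[1]
        assignment.append(winner)
        payments[winner] += t[runner_up][j]  # paid the second-lowest report
    loads = [0.0] * n
    for j, i in enumerate(assignment):
        loads[i] += t[i][j]
    return assignment, payments, max(loads)


def optimal_makespan(t: Sequence[Sequence[float]]) -> float:
    """Exhaustive search over all n^m assignments (only for tiny instances)."""
    n, m = len(t), len(t[0])
    best = math.inf
    for assign in itertools.product(range(n), repeat=m):
        loads = [0.0] * n
        for j, i in enumerate(assign):
            loads[i] += t[i][j]
        best = min(best, max(loads))
    return best


t = [[1.0, 1.0], [1.1, 1.1]]  # machine 1 is slightly slower on both tasks
assignment, payments, makespan = vcg_schedule(t)
opt = optimal_makespan(t)
print(assignment, makespan, opt)  # [0, 0] 2.0 1.1
```

Because every task goes to the marginally faster machine, the makespan is $2$ while the optimum is $1.1$; as the slowdown shrinks toward zero the ratio approaches $2 = n$, which is why this mechanism alone only guarantees an $n$-approximation.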

Throughput Optimal On-Line Algorithms for Advanced Resource Reservation in Ultra High-Speed Networks

Author: Cohen Reuven
Fazlollahi Niloofar
Starobinski David
Publication date: 02/11/2007

Advanced channel reservation is emerging as an important feature of ultra high-speed networks requiring the transfer of large files. Applications include scientific data transfers and database backup. In this paper, we present two new on-line algorithms for advanced reservation, called BatchAll and BatchLim, that are guaranteed to achieve optimal throughput performance, based on multi-commodity flow arguments. Both algorithms are shown to have polynomial-time complexity and provable bounds on the maximum delay for $(1+\epsilon)$-bandwidth-augmented networks. The BatchLim algorithm returns the completion time of a connection immediately when a request is placed, at the expense of a slightly looser competitive ratio than that of BatchAll. We also present a simple approach that limits the number of parallel paths used by the algorithms while provably bounding the maximum reduction factor in the transmission throughput. We show that, although the number of different paths can be exponentially large, the actual number of paths needed to approximate the flow is quite small and proportional to the number of edges in the network. Simulations for a number of topologies show that, in practice, 3 to 5 parallel paths are sufficient to achieve close to optimal performance. The performance of the competitive algorithms is also compared to that of a greedy benchmark, both through analysis and simulation.

Comment: 9 pages, 8 figures

arXiv.org e-Print Archive

CiteSeerX
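The observation that few paths suffice is closely related to the standard flow-decomposition theorem: an acyclic $s$-$t$ flow on a graph with $|E|$ edges splits into at most $|E|$ weighted paths, because each extracted path is assigned its bottleneck value and therefore drives at least one edge to zero. The sketch below shows that textbook decomposition plus a hypothetical truncation that keeps only the $k$ highest-flow paths. It is not BatchAll, BatchLim, or the paper's path-limiting procedure, and it assumes the input flow is acyclic and satisfies conservation; names and the example graph are invented.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str]
Path = Tuple[List[str], float]


def decompose_flow(flow: Dict[Edge, float], s: str, t: str, eps: float = 1e-12) -> List[Path]:
    """Split an acyclic, conserved s-t flow into weighted s-t paths.

    Each iteration subtracts the bottleneck value along one path, which zeroes
    at least one edge, so at most len(flow) paths are produced.
    """
    if s == t:
        raise ValueError("source and sink must differ")
    residual = dict(flow)
    paths: List[Path] = []
    while True:
        path, node = [s], s
        while node != t:
            # Follow any outgoing edge that still carries flow.
            nxt = next((v for (u, v), f in residual.items() if u == node and f > eps), None)
            if nxt is None:
                break
            if nxt in path:
                raise ValueError("flow contains a cycle; cancel cycles first")
            path.append(nxt)
            node = nxt
        if node != t:
            break  # no s-t flow left (or the input violated conservation)
        edges = list(zip(path, path[1:]))
        bottleneck = min(residual[e] for e in edges)
        for e in edges:
            residual[e] -= bottleneck
        paths.append((path, bottleneck))
    return paths


def keep_largest_paths(paths: List[Path], k: int) -> Tuple[List[Path], float]:
    """Hypothetical truncation: keep the k highest-flow paths.

    Returns the kept paths and the fraction of the total flow they carry; with P
    paths this fraction is at least min(k, P) / P, since the top paths are at or
    above the average.
    """
    kept = sorted(paths, key=lambda p: p[1], reverse=True)[:k]
    total = sum(f for _, f in paths)
    return kept, (sum(f for _, f in kept) / total if total > 0 else 1.0)


flow = {("s", "a"): 3, ("s", "b"): 2, ("a", "t"): 2, ("a", "b"): 1, ("b", "t"): 3}
paths = decompose_flow(flow, "s", "t")
print(paths)  # [(['s', 'a', 't'], 2), (['s', 'a', 'b', 't'], 1), (['s', 'b', 't'], 2)]
kept, retained = keep_largest_paths(paths, k=2)
print(retained)  # 0.8
```

Here a flow of value 5 on five edges decomposes into three paths, and keeping only the two largest still carries 80% of it.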