Extending the Nested Parallel Model to the Nested Dataflow Model with Provably Efficient Schedulers
The nested parallel (a.k.a. fork-join) model is widely used for writing
parallel programs. However, the two composition constructs, i.e. "$\parallel$"
(parallel) and "$;$" (serial), are insufficient for expressing "partial
dependencies" or "partial parallelism" in a program. We propose a new dataflow
composition construct "$\leadsto$" to express partial dependencies in
algorithms in a processor- and cache-oblivious way, thus extending the Nested
Parallel (NP) model to the \emph{Nested Dataflow} (ND) model. We redesign
several divide-and-conquer algorithms ranging from dense linear algebra to
dynamic programming in the ND model and prove that they all have optimal span
while retaining optimal cache complexity. We propose the design of runtime
schedulers that map ND programs to multicore processors with multiple levels of
possibly shared caches (i.e., Parallel Memory Hierarchies) and provide
theoretical guarantees on their ability to preserve locality and load balance.
For this, we adapt space-bounded (SB) schedulers for the ND model. We show that
our algorithms have increased "parallelizability" in the ND model, and that SB
schedulers can use the extra parallelizability to achieve asymptotically
optimal bounds on cache misses and running time on a greater number of
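processors than in the NP model. The running time for the algorithms in this
paper is $O\left(\frac{\sum_{i=0}^{h-1} Q^{*}(\mathsf{t};\sigma\cdot M_i)\cdot C_i}{p}\right)$,
where $Q^{*}$ is the cache complexity of task $\mathsf{t}$, $C_i$ is the cost of
a cache miss at the level-$i$ cache, which is of size $M_i$, $\sigma\in(0,1)$ is
a constant, and $p$ is the number of processors in an $h$-level cache hierarchy.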
A New Lower Bound for Deterministic Truthful Scheduling
We study the problem of truthfully scheduling tasks to selfish
unrelated machines, under the objective of makespan minimization, as was
introduced in the seminal work of Nisan and Ronen [STOC'99]. Closing the
current gap of $[2.618, n]$ on the approximation ratio of deterministic truthful
mechanisms is a notorious open problem in the field of algorithmic mechanism
design. We provide the first such improvement in more than a decade, since the
lower bounds of $2.414$ (for $n=3$) and $2.618$ (for $n\to\infty$) by
Christodoulou et al. [SODA'07] and Koutsoupias and Vidali [MFCS'07],
respectively. More specifically, we show that the currently best lower bound of
$2.618$ can be achieved even for just $n=4$ machines; for $n=5$ we already get
the first improvement, namely $2.711$; and allowing the number of machines to
grow arbitrarily large we can get a lower bound of $2.755$.
Comment: 15 pages
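The makespan objective on unrelated machines, which the abstract above takes as given, can be illustrated with a minimal sketch. This is not the paper's mechanism and does not model truthfulness or payments. It only shows what "makespan" means when each machine has its own processing time for each task. The names `makespan`, `optimal_makespan`, and the sample matrix `times` are hypothetical, chosen for illustration.

```python
from itertools import product

def makespan(times, assignment):
    """Return the maximum machine load for a given assignment.

    times[i][j]   -- time machine i needs to run task j (unrelated machines)
    assignment[j] -- index of the machine that task j is assigned to
    """
    loads = [0] * len(times)
    for task, machine in enumerate(assignment):
        loads[machine] += times[machine][task]
    return max(loads)

def optimal_makespan(times):
    """Brute-force the minimum makespan over all assignments.

    Exponential in the number of tasks; only meant for tiny examples.
    """
    n_machines, n_tasks = len(times), len(times[0])
    return min(
        makespan(times, assignment)
        for assignment in product(range(n_machines), repeat=n_tasks)
    )

# Hypothetical instance: 2 machines (rows), 3 tasks (columns).
times = [
    [1, 4, 3],
    [2, 1, 5],
]
```

For this instance, `makespan(times, (0, 1, 0))` gives the larger of the two machine loads under that assignment, and `optimal_makespan(times)` searches all $2^3$ assignments. An approximation ratio compares the makespan a mechanism produces against this optimum.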
Throughput Optimal On-Line Algorithms for Advanced Resource Reservation in Ultra High-Speed Networks
Advanced channel reservation is emerging as an important feature of ultra
high-speed networks requiring the transfer of large files. Applications include
scientific data transfers and database backup. In this paper, we present two
new, on-line algorithms for advanced reservation, called BatchAll and BatchLim,
that are guaranteed to achieve optimal throughput performance, based on
multi-commodity flow arguments. Both algorithms are shown to have
polynomial-time complexity and provable bounds on the maximum delay for
$(1+\epsilon)$ bandwidth-augmented networks. The BatchLim algorithm returns the
completion time of a connection immediately as a request is placed, but at the
expense of a slightly looser competitive ratio than that of BatchAll. We also
present a simple approach that limits the number of parallel paths used by the
algorithms while provably bounding the maximum reduction factor in the
transmission throughput. We show that, although the number of different paths
can be exponentially large, the actual number of paths needed to approximate
the flow is quite small and proportional to the number of edges in the network.
Simulations for a number of topologies show that, in practice, 3 to 5 parallel
paths are sufficient to achieve close to optimal performance. The performance
of the competitive algorithms is also compared to a greedy benchmark, both
through analysis and simulation.
Comment: 9 pages, 8 figures