
2,516 research outputs found

Design and analysis of a class-aware recursive loop scheduler for class-based scheduling

Author: ROM Raphael
SIDI Moshe
TAN Hwee-Pink
Publication venue: Elsevier BV
Publication date: 01/01/2005

In this paper, we consider the problem of devising a loop scheduler that allocates slots to users according to their relative weights as smoothly as possible. Instead of the existing notion of smoothness based on balancedness, we propose a variance-based metric which is more intuitive and easier to compute. We propose a recursive loop scheduler for a class-based scheduling scenario based on an optimal weighted round-robin scheduler. We show that it achieves very good allocation smoothness with almost no degradation in intra-class fairness. In addition, we also demonstrate the equivalence between our proposed metric and the balancedness-based metric.
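To make the idea of a variance-based smoothness measure concrete, here is a minimal sketch. The abstract does not give the paper's exact metric, so the definition used here (population variance of the cyclic gaps between one user's consecutive slots) and the function name `gap_variance` are assumptions chosen for illustration only.

```python
from statistics import pvariance


def gap_variance(schedule, user):
    """Population variance of the cyclic gaps between consecutive slots of `user`.

    Assumed illustrative metric, not the paper's definition: 0 means the
    user's slots are perfectly evenly spaced around the loop.
    """
    n = len(schedule)
    slots = [i for i, u in enumerate(schedule) if u == user]
    if not slots:
        raise ValueError(f"user {user!r} has no slot in the schedule")
    # Gap from each slot to the next one, wrapping around the end of the loop.
    # A user with a single slot has one gap equal to the full loop length.
    gaps = [(slots[(k + 1) % len(slots)] - slots[k]) % n or n
            for k in range(len(slots))]
    return pvariance(gaps)


# Weights a:2, b:1, c:1 realised by two different loops of length 4.
blocky = ["a", "a", "b", "c"]       # both slots of "a" are adjacent
interleaved = ["a", "b", "a", "c"]  # slots of "a" are evenly spread

print(gap_variance(blocky, "a"), gap_variance(interleaved, "a"))
```

Both loops give user `a` the same share of slots; only the spacing differs, which is what a smoothness metric of this kind is meant to capture.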

Repository TU/e

Pure OAI Repository

Institutional Knowledge at Singapore Management University

Polly's Polyhedral Scheduling in the Presence of Reductions

Author: Benaissa Zino
Doerfert Johannes
Hack Sebastian
Streit Kevin
Publication date: 01/01/2015

The polyhedral model provides a powerful mathematical abstraction to enable effective optimization of loop nests with respect to a given optimization goal, e.g., exploiting parallelism. Unexploited reduction properties are a frequent reason for polyhedral optimizers to assume parallelism prohibiting dependences. To our knowledge, no polyhedral loop optimizer available in any production compiler provides support for reductions. In this paper, we show that leveraging the parallelism of reductions can lead to a significant performance increase. We give a precise, dependence based, definition of reductions and discuss ways to extend polyhedral optimization to exploit the associativity and commutativity of reduction computations. We have implemented a reduction-enabled scheduling approach in the Polly polyhedral optimizer and evaluate it on the standard Polybench 3.2 benchmark suite. We were able to detect and model all 52 arithmetic reductions and achieve speedups up to $2.21\times$ on a quad core machine by exploiting the multidimensional reduction in the BiCG benchmark. Comment: Presented at the IMPACT15 workshop
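The role of associativity and commutativity can be illustrated with a small sketch. This is not Polly's implementation; it is a hand-written Python analogue, with the hypothetical names `serial_sum` and `privatized_sum`, of the general idea that a reduction's loop-carried dependence on the accumulator can be broken by giving each worker a private partial result and combining the partials afterwards.

```python
from concurrent.futures import ThreadPoolExecutor


def serial_sum(values):
    """Plain reduction: every iteration reads and writes `acc`,
    which is a loop-carried dependence if treated as an ordinary variable."""
    acc = 0
    for v in values:
        acc += v
    return acc


def privatized_sum(values, workers=4):
    """Same reduction, restructured so that chunks are independent.

    Because integer addition is associative and commutative, the chunks
    can be reduced in any order and the partial results combined last.
    """
    chunk = max(1, -(-len(values) // workers))  # ceiling division
    chunks = [values[i:i + chunk] for i in range(0, len(values), chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(serial_sum, chunks))  # one private accumulator per chunk
    return serial_sum(partials)  # final combine step


data = list(range(1000))
print(serial_sum(data), privatized_sum(data))
```

Python threads will not show a speedup here because of the interpreter lock; the sketch only shows the dependence structure. Integers are used deliberately, since reordering floating-point additions can change the result.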

arXiv.org e-Print Archive

CISPA – Helmholtz-Zentrum für Informationssicherheit

Design of State-based Schedulers for a Network of Control Loops

Author: Johansson Karl H.
Ramesh Chithrupa
Sandberg Henrik
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 29/02/2012

For a closed-loop system, which has a contention-based multiple access network on its sensor link, the Medium Access Controller (MAC) may discard some packets when the traffic on the link is high. We use a local state-based scheduler to select a few critical data packets to send to the MAC. In this paper, we analyze the impact of such a scheduler on the closed-loop system in the presence of traffic, and show that there is a dual effect with state-based scheduling. In general, this makes the optimal scheduler and controller hard to find. However, by removing past controls from the scheduling criterion, we find that certainty equivalence holds. This condition is related to the classical result of Bar-Shalom and Tse, and it leads to the design of a scheduler with a certainty equivalent controller. This design, however, does not result in an equivalent system to the original problem, in the sense of Witsenhausen. Computing the estimate is difficult, but can be simplified by introducing a symmetry constraint on the scheduler. Based on these findings, we propose a dual predictor architecture for the closed-loop system, which ensures separation between scheduler, observer and controller. We present an example of this architecture, which illustrates a network-aware event-triggering mechanism. Comment: 17 pages, technical report

arXiv.org e-Print Archive

Crossref

Locality-Aware Dynamic Task Graph Scheduling

Author: Agrawal Kunal
Krishnamoorthy Sriram
Maglalang Jordyn
Publication venue: Washington University Open Scholarship
Publication date: 01/01/2016

Dynamic task graph schedulers automatically balance work across processor cores by scheduling tasks among available threads while preserving dependences. In this paper, we design NabbitC, a provably efficient dynamic task graph scheduler that accounts for data locality on NUMA systems. NabbitC allows users to assign a color to each task representing the location (e.g., a processor core) that has the most efficient access to data needed during that node’s execution. NabbitC then automatically adjusts the scheduling so as to preferentially execute each node at the location that matches its color, leading to better locality because the node is likely to make local rather than remote accesses. At the same time, NabbitC tries to optimize load balance and not add too much overhead compared to the vanilla Nabbit scheduler that does not consider locality. We provide a theoretical analysis that shows that NabbitC does not asymptotically impact the scalability of Nabbit. We evaluated the performance of NabbitC on a suite of memory-intensive benchmarks. Our experiments indicate that adding locality awareness has a considerable performance advantage compared to the vanilla Nabbit scheduler. In addition, we also compared NabbitC to OpenMP programs for both regular and irregular applications. For regular applications, OpenMP achieves perfect locality and perfect load balance statically. For these benchmarks, NabbitC has a small performance penalty compared to OpenMP due to its dynamic scheduling strategy. For irregular applications, where OpenMP cannot achieve locality and load balance simultaneously, we find that NabbitC performs better. Therefore, NabbitC combines the benefits of locality-aware scheduling for regular applications (the forte of static schedulers such as those in OpenMP) and dynamically adapting to load imbalance (the forte of dynamic schedulers such as Cilk Plus, TBB, and Nabbit).

Crossref

Washington University St. Louis: Open Scholarship

Extending the Nested Parallel Model to the Nested Dataflow Model with Provably Efficient Schedulers

Author: Dinh David
Simhadri Harsha Vardhan
Tang Yuan
Publication date: 14/02/2016

The nested parallel (a.k.a. fork-join) model is widely used for writing parallel programs. However, the two composition constructs, i.e. "$\parallel$" (parallel) and "$;$" (serial), are insufficient in expressing "partial dependencies" or "partial parallelism" in a program. We propose a new dataflow composition construct "$\leadsto$" to express partial dependencies in algorithms in a processor- and cache-oblivious way, thus extending the Nested Parallel (NP) model to the Nested Dataflow (ND) model. We redesign several divide-and-conquer algorithms ranging from dense linear algebra to dynamic-programming in the ND model and prove that they all have optimal span while retaining optimal cache complexity. We propose the design of runtime schedulers that map ND programs to multicore processors with multiple levels of possibly shared caches (i.e., Parallel Memory Hierarchies) and provide theoretical guarantees on their ability to preserve locality and load balance. For this, we adapt space-bounded (SB) schedulers for the ND model. We show that our algorithms have increased "parallelizability" in the ND model, and that SB schedulers can use the extra parallelizability to achieve asymptotically optimal bounds on cache misses and running time on a greater number of processors than in the NP model. The running time for the algorithms in this paper is $O\left(\frac{\sum_{i=0}^{h-1} Q^{*}({\mathsf t};\sigma\cdot M_i)\cdot C_i}{p}\right)$, where $Q^{*}$ is the cache complexity of task ${\mathsf t}$, $C_i$ is the cost of cache miss at level-$i$ cache which is of size $M_i$, $\sigma\in(0,1)$ is a constant, and $p$ is the number of processors in an $h$-level cache hierarchy
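
As a reading aid for the running-time bound quoted in this abstract, the sum can be written out for a small hierarchy. The choice $h = 2$ is an assumption made purely for illustration; it uses only the symbols defined in the abstract.

$$
O\left(\frac{Q^{*}({\mathsf t};\sigma\cdot M_0)\cdot C_0 + Q^{*}({\mathsf t};\sigma\cdot M_1)\cdot C_1}{p}\right)
$$

Each term is the cache complexity of the task at one cache level, measured against a $\sigma$ fraction of that level's size and weighted by that level's miss cost; the total is divided by the number of processors $p$.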

arXiv.org e-Print Archive

Crossref

Life of occam-Pi

Author: Welch Peter H.
Publication venue: Open Channel Publishing
Publication date: 01/01/2013

This paper considers some questions prompted by a brief review of the history of computing. Why is programming so hard? Why is concurrency considered an “advanced” subject? What’s the matter with Objects? Where did all the Maths go? In searching for answers, the paper looks at some concerns over fundamental ideas within object orientation (as represented by modern programming languages), before focussing on the concurrency model of communicating processes and its particular expression in the occam family of languages. In that focus, it looks at the history of occam, its underlying philosophy (Ockham’s Razor), its semantic foundation on Hoare’s CSP, its principles of process oriented design and its development over almost three decades into occam-π (which blends in the concurrency dynamics of Milner’s π-calculus). Also presented will be an urgent need for rationalisation: occam-π is an experiment that has demonstrated significant results, but now needs time to be spent on careful review and implementing the conclusions of that review. Finally, the future is considered. In particular, is there a future

Kent Academic Repository