Design and analysis of a class-aware recursive loop scheduler for class-based scheduling
In this paper, we consider the problem of devising a loop scheduler that allocates slots to users according to their relative weights as smoothly as possible. Instead of the existing notion of smoothness based on balancedness, we propose a variance-based metric that is more intuitive and easier to compute. Building on an optimal weighted round-robin scheduler, we propose a recursive loop scheduler for a class-based scheduling scenario. We show that it achieves very good allocation smoothness with almost no degradation in intra-class fairness. We also demonstrate the equivalence between our proposed metric and the balancedness-based metric.
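The abstract does not spell out the variance-based metric, so the sketch below assumes one plausible reading: for each user, take the population variance of the distances between consecutive slots that user receives (0 means perfectly even spacing). The scheduler used to produce a smooth order is a swapped-in technique, the "smooth weighted round-robin" interleaving popularised by nginx; it is not the paper's recursive class-aware scheduler. All function names are hypothetical.

```python
from statistics import pvariance


def smooth_wrr(weights, n_slots):
    """Smooth weighted round-robin (the interleaving popularised by nginx).

    Stand-in scheduler only: it is NOT the paper's recursive loop scheduler.
    `weights` maps user -> integer weight; returns the user served in each slot.
    """
    credit = {u: 0 for u in weights}
    total = sum(weights.values())
    schedule = []
    for _ in range(n_slots):
        for u, w in weights.items():
            credit[u] += w                      # every user earns its weight
        chosen = max(credit, key=credit.get)    # richest user wins (first on ties)
        credit[chosen] -= total                 # and pays for one slot
        schedule.append(chosen)
    return schedule


def block_wrr(weights, n_slots):
    """Naive weighted round-robin: each user's slots are served back to back."""
    frame = [u for u, w in weights.items() for _ in range(w)]
    return [frame[i % len(frame)] for i in range(n_slots)]


def gap_variance(schedule, user):
    """Hypothetical smoothness metric: population variance of the distances
    between consecutive slots given to `user` (0 = perfectly even spacing)."""
    slots = [i for i, u in enumerate(schedule) if u == user]
    gaps = [b - a for a, b in zip(slots, slots[1:])]
    return pvariance(gaps) if len(gaps) > 1 else 0.0
```

With weights `{"a": 5, "b": 1, "c": 1}`, `smooth_wrr` serves each round in the order a a b a c a a. Over 14 slots, user a's gap variance is 20/81 (about 0.25) under `smooth_wrr` versus 288/729 (about 0.40) under `block_wrr`, even though both give every user the same share of slots.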
Polly's Polyhedral Scheduling in the Presence of Reductions
The polyhedral model provides a powerful mathematical abstraction to enable
effective optimization of loop nests with respect to a given optimization goal,
e.g., exploiting parallelism. Unexploited reduction properties are a frequent
reason for polyhedral optimizers to assume parallelism-prohibiting dependences.
To our knowledge, no polyhedral loop optimizer available in any production
compiler provides support for reductions. In this paper, we show that
leveraging the parallelism of reductions can lead to a significant performance
increase. We give a precise, dependence-based definition of reductions and
discuss ways to extend polyhedral optimization to exploit the associativity and
commutativity of reduction computations. We have implemented a
reduction-enabled scheduling approach in the Polly polyhedral optimizer and
evaluated it on the standard Polybench 3.2 benchmark suite. We were able to
detect and model all 52 arithmetic reductions and achieve speedups of up to
2.21× on a quad-core machine by exploiting the multidimensional
reduction in the BiCG benchmark.
Comment: Presented at the IMPACT15 workshop
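The privatize-and-combine idea behind exploiting a reduction can be sketched by hand in Python. This is an illustration only, not Polly's transformation or its output: the loop nest is our transcription of the Polybench BiCG kernel (arrays `A`, `r`, `p`, `s`, `q`), while the cyclic chunking and the function names are hypothetical. In BiCG, `s[j]` is accumulated across the outer `i` loop, so that loop looks sequential until the updates are recognised as a reduction; giving each chunk a private copy of `s` and summing the copies afterwards removes the conflict. Integer data keeps the check exact; with floating point, re-associating the sum can change rounding, which is why the transformation relies on treating the operation as associative and commutative.

```python
from concurrent.futures import ThreadPoolExecutor


def bicg_sequential(A, r, p):
    """Reference loop nest (transcribed from the Polybench BiCG kernel)."""
    nx, ny = len(A), len(A[0])
    s = [0] * ny
    q = [0] * nx
    for i in range(nx):
        for j in range(ny):
            s[j] = s[j] + r[i] * A[i][j]   # reduction over i: carried by the outer loop
            q[i] = q[i] + A[i][j] * p[j]   # reduction over j
    return s, q


def bicg_chunk(A, r, p, rows):
    """Process a subset of rows using a private copy of s (no shared writes)."""
    ny = len(A[0])
    s_private = [0] * ny
    q_part = {}
    for i in rows:
        acc = 0
        for j in range(ny):
            s_private[j] += r[i] * A[i][j]
            acc += A[i][j] * p[j]
        q_part[i] = acc
    return s_private, q_part


def bicg_parallel(A, r, p, n_chunks=4):
    """Run row chunks concurrently, then combine the privatized partial sums."""
    nx, ny = len(A), len(A[0])
    chunks = [range(k, nx, n_chunks) for k in range(n_chunks)]  # cyclic split of i
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        results = list(pool.map(lambda rows: bicg_chunk(A, r, p, rows), chunks))
    s = [0] * ny
    q = [0] * nx
    for s_private, q_part in results:
        for j in range(ny):
            s[j] += s_private[j]   # combining step: needs associativity + commutativity
        for i, v in q_part.items():
            q[i] = v
    return s, q
```

Because each chunk writes only its own `s_private`, the chunks can run in any order; only the final combining loop touches the shared `s`.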
Design of State-based Schedulers for a Network of Control Loops
For a closed-loop system with a contention-based multiple-access
network on its sensor link, the Medium Access Controller (MAC) may discard some
packets when the traffic on the link is high. We use a local state-based
scheduler to select a few critical data packets to send to the MAC. In this
paper, we analyze the impact of such a scheduler on the closed-loop system in
the presence of traffic, and show that there is a dual effect with state-based
scheduling. In general, this makes the optimal scheduler and controller hard to
find. However, by removing past controls from the scheduling criterion, we find
that certainty equivalence holds. This condition is related to the classical
result of Bar-Shalom and Tse, and it leads to the design of a scheduler with a
certainty equivalent controller. This design, however, does not result in an
equivalent system to the original problem, in the sense of Witsenhausen.
Computing the estimate is difficult, but can be simplified by introducing a
symmetry constraint on the scheduler. Based on these findings, we propose a
dual predictor architecture for the closed-loop system, which ensures
separation between scheduler, observer and controller. We present an example of
this architecture, which illustrates a network-aware event-triggering
mechanism.
Comment: 17 pages, technical report
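A toy Python illustration of a state-based (event-triggered) scheduler paired with a shared predictor, under assumptions that are not in the abstract: a scalar plant x[k+1] = a*x[k] + u[k] + w[k], a fixed feedback gain, a transmission threshold `delta`, and no MAC contention or packet loss. Because the sensor and the controller run identical predictors, the prediction error evolves as e[k+1] = a*e[k] + w[k] between transmissions, independent of the control input; this loosely mirrors the idea of keeping past controls out of the scheduling criterion. It is not the paper's design, and all names are hypothetical.

```python
def simulate(a, gain, delta, disturbances, x0=0.0):
    """Toy event-triggered loop on a scalar plant x[k+1] = a*x[k] + u[k] + w[k].

    Sensor and controller run the same predictor x_hat (a "dual predictor"),
    so the sensor always knows what the controller currently believes.
    Returns the final state and the time steps at which a packet was sent.
    """
    x, x_hat = x0, x0
    sent = []
    for k, w in enumerate(disturbances):
        # State-based scheduler: transmit only if the prediction error is large.
        if abs(x - x_hat) > delta:
            x_hat = x
            sent.append(k)
        u = -gain * x_hat        # certainty-equivalent control on the estimate
        x = a * x + u + w        # true plant
        x_hat = a * x_hat + u    # predictor: same u, no access to w,
                                 # so x - x_hat evolves independently of u
    return x, sent
```

For example, with `a=1.2`, `gain=0.9`, `delta=0.5` and a single unit disturbance at step 0, the scheduler transmits once (at step 1) and the state then decays by a factor of 0.3 per step without further packets.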
Locality-Aware Dynamic Task Graph Scheduling
Dynamic task graph schedulers automatically balance work across processor cores by scheduling tasks among available threads while preserving dependences. In this paper, we design NabbitC, a provably efficient dynamic task graph scheduler that accounts for data locality on NUMA systems. NabbitC allows users to assign a color to each task representing the location (e.g., a processor core) that has the most efficient access to the data needed during that task's execution. NabbitC then automatically adjusts the scheduling so as to preferentially execute each task at the location that matches its color, which leads to better locality because the task is likely to make local rather than remote accesses. At the same time, NabbitC tries to optimize load balance without adding much overhead compared to the vanilla Nabbit scheduler, which does not consider locality. We provide a theoretical analysis showing that NabbitC does not asymptotically impact the scalability of Nabbit. We evaluated the performance of NabbitC on a suite of memory-intensive benchmarks. Our experiments indicate that adding locality awareness yields a considerable performance advantage over the vanilla Nabbit scheduler. We also compared NabbitC with OpenMP programs for both regular and irregular applications. For regular applications, OpenMP achieves perfect locality and perfect load balance statically; on these benchmarks, NabbitC incurs a small performance penalty relative to OpenMP due to its dynamic scheduling strategy. For irregular applications, where OpenMP cannot achieve locality and load balance simultaneously, NabbitC performs better. NabbitC therefore combines the benefits of locality-aware scheduling for regular applications (the forte of static schedulers such as those in OpenMP) with dynamic adaptation to load imbalance (the forte of dynamic schedulers such as Cilk Plus, TBB, and Nabbit).
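The color-matching policy can be illustrated with a toy unit-time list scheduler in Python: in each step, every worker first takes a ready task of its own color, and only workers left idle take any remaining ready task. This is a simplification for illustration, assuming unit-cost tasks and one worker per color; it is not NabbitC's work-stealing algorithm, and `list_schedule` and its parameters are hypothetical.

```python
def list_schedule(deps, colors, n_workers, locality_aware=True):
    """Unit-time greedy scheduling of a task DAG on workers 0..n_workers-1.

    deps[t]   -- tasks that must finish before t may start
    colors[t] -- worker whose memory holds t's data (its "color")
    Returns (finish_step, local_count): the step at which each task ran and
    how many tasks ran on the worker matching their color.
    """
    waiting = {t: set(d) for t, d in deps.items()}   # unfinished predecessors
    finish, local, step = {}, 0, 0
    while waiting:
        ready = sorted(t for t, d in waiting.items() if not d)
        if not ready:
            raise ValueError("dependence cycle")
        chosen, idle = {}, []
        # Pass 1: each worker prefers a ready task of its own color.
        for w in range(n_workers):
            mine = [t for t in ready if colors[t] == w] if locality_aware else []
            if mine:
                chosen[mine[0]] = w
                ready.remove(mine[0])
            else:
                idle.append(w)
        # Pass 2: idle workers take any leftover ready task (load balance).
        for w in idle:
            if ready:
                chosen[ready.pop(0)] = w
        for t, w in chosen.items():
            finish[t] = step
            if colors[t] == w:
                local += 1
            del waiting[t]
        for d in waiting.values():
            d.difference_update(chosen)   # release successors of finished tasks
        step += 1
    return finish, local
```

On a small DAG of five tasks (four independent ones plus one that waits on two of them) with two workers, both policies finish in three steps, but the color-aware pass runs all five tasks on their matching worker while the color-blind order runs only one locally.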
Extending the Nested Parallel Model to the Nested Dataflow Model with Provably Efficient Schedulers
The nested parallel (a.k.a. fork-join) model is widely used for writing
parallel programs. However, the two composition constructs, i.e., "$\parallel$"
(parallel) and "$;$" (serial), are insufficient for expressing "partial
dependencies" or "partial parallelism" in a program. We propose a new dataflow
composition construct "$\leadsto$" to express partial dependencies in
algorithms in a processor- and cache-oblivious way, thus extending the Nested
Parallel (NP) model to the \emph{Nested Dataflow} (ND) model. We redesign
several divide-and-conquer algorithms ranging from dense linear algebra to
dynamic programming in the ND model and prove that they all have optimal span
while retaining optimal cache complexity. We propose the design of runtime
schedulers that map ND programs to multicore processors with multiple levels of
possibly shared caches (i.e., Parallel Memory Hierarchies) and provide
theoretical guarantees on their ability to preserve locality and load balance.
For this, we adapt space-bounded (SB) schedulers for the ND model. We show that
our algorithms have increased "parallelizability" in the ND model, and that SB
schedulers can use the extra parallelizability to achieve asymptotically
optimal bounds on cache misses and running time on a greater number of
processors than in the NP model. The running time for the algorithms in this
paper is $O\left(\sum_{i=0}^{h-1} Q^{*}(t;\sigma\cdot M_i)\cdot C_i/p\right)$,
where $Q^{*}$ is the cache complexity of task $t$, $C_i$ is the cost of a cache
miss at a level-$i$ cache of size $M_i$, $\sigma\in(0,1)$ is a constant, and
$p$ is the number of processors in an $h$-level cache hierarchy.
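Why partial dependencies shorten the span can be seen on a deliberately tiny case: two phases A and B, each split into two subtasks, where B_i really needs only A_i. The Python sketch below computes the critical-path length (span) of both dependency patterns. It assumes fixed subtask costs and is not the ND model's actual construct, which expresses such dependencies recursively between divide-and-conquer subtasks; `span` and `two_phase` are hypothetical helpers.

```python
def span(costs, deps):
    """Critical-path length (span) of a DAG.

    costs[t] -- work of task t; deps[t] -- tasks that must finish before t.
    """
    memo = {}

    def finish(t):
        # Earliest finish time of t with unlimited processors.
        if t not in memo:
            memo[t] = costs[t] + max((finish(d) for d in deps[t]), default=0)
        return memo[t]

    return max(finish(t) for t in costs)


def two_phase(a_costs, b_costs, dataflow):
    """Hypothetical two-phase computation A then B, each split into subtasks.

    dataflow=False: serial composition, every B_i waits for all of A.
    dataflow=True:  partial dependency, B_i waits only for A_i.
    """
    costs, deps = {}, {}
    for i, c in enumerate(a_costs):
        costs[("A", i)] = c
        deps[("A", i)] = []
    for i, c in enumerate(b_costs):
        costs[("B", i)] = c
        deps[("B", i)] = ([("A", i)] if dataflow
                          else [("A", j) for j in range(len(a_costs))])
    return costs, deps
```

With subtask costs A = [5, 1] and B = [1, 5], the total work is 12 either way, but the span is 10 under serial composition and 6 with the partial dependencies, because the slow A_0 no longer delays the slow B_1.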
Life of occam-Pi
This paper considers some questions prompted by a brief review of the history of computing. Why is programming so hard? Why is concurrency considered an “advanced” subject? What’s the matter with Objects? Where did all the Maths go? In searching for answers, the paper looks at some concerns over fundamental ideas within object orientation (as represented by modern programming languages), before focussing on the concurrency model of communicating processes and its particular expression in the occam family of languages. In that focus, it looks at the history of occam, its underlying philosophy (Ockham’s Razor), its semantic foundation on Hoare’s CSP, its principles of process-oriented design and its development over almost three decades into occam-π (which blends in the concurrency dynamics of Milner’s π-calculus). The paper also presents an urgent need for rationalisation: occam-π is an experiment that has demonstrated significant results, but now needs time to be spent on careful review and on implementing the conclusions of that review. Finally, the future is considered. In particular, is there a future