2,516 research outputs found

    Design and analysis of a class-aware recursive loop scheduler for class-based scheduling

    Get PDF
    In this paper, we consider the problem of devising a loop scheduler that allocates slots to users according to their relative weights as smoothly as possible. Instead of the existing notion of smoothness based on balancedness, we propose a variance-based metric which is more intuitive and easier to compute. We propose a recursive loop scheduler for a class-based scheduling scenario based on an optimal weighted round-robin scheduler. We show that it achieves very good allocation smoothness with almost no degradation in intra-class fairness. In addition, we also demonstrate the equivalence between our proposed metric and the balancedness-based metric

    Design and analysis of a class-aware recursive loop scheduler for class-based scheduling

    Get PDF
    In this paper, we consider the problem of devising a loop scheduler that allocates slots to users according to their relative weights as smoothly as possible. Instead of the existing notion of smoothness based on balancedness, we propose a variance-based metric which is more intuitive and easier to compute. We propose a recursive loop scheduler for a class-based scheduling scenario based on an optimal weighted round-robin scheduler. We show that it achieves very good allocation smoothness with almost no degradation in intra-class fairness. In addition, we also demonstrate the equivalence between our proposed metric and the balancedness-based metric

    Polly's Polyhedral Scheduling in the Presence of Reductions

    Full text link
    The polyhedral model provides a powerful mathematical abstraction to enable effective optimization of loop nests with respect to a given optimization goal, e.g., exploiting parallelism. Unexploited reduction properties are a frequent reason for polyhedral optimizers to assume parallelism prohibiting dependences. To our knowledge, no polyhedral loop optimizer available in any production compiler provides support for reductions. In this paper, we show that leveraging the parallelism of reductions can lead to a significant performance increase. We give a precise, dependence based, definition of reductions and discuss ways to extend polyhedral optimization to exploit the associativity and commutativity of reduction computations. We have implemented a reduction-enabled scheduling approach in the Polly polyhedral optimizer and evaluate it on the standard Polybench 3.2 benchmark suite. We were able to detect and model all 52 arithmetic reductions and achieve speedups up to 2.21×\times on a quad core machine by exploiting the multidimensional reduction in the BiCG benchmark.Comment: Presented at the IMPACT15 worksho

    Design of State-based Schedulers for a Network of Control Loops

    Full text link
    For a closed-loop system, which has a contention-based multiple access network on its sensor link, the Medium Access Controller (MAC) may discard some packets when the traffic on the link is high. We use a local state-based scheduler to select a few critical data packets to send to the MAC. In this paper, we analyze the impact of such a scheduler on the closed-loop system in the presence of traffic, and show that there is a dual effect with state-based scheduling. In general, this makes the optimal scheduler and controller hard to find. However, by removing past controls from the scheduling criterion, we find that certainty equivalence holds. This condition is related to the classical result of Bar-Shalom and Tse, and it leads to the design of a scheduler with a certainty equivalent controller. This design, however, does not result in an equivalent system to the original problem, in the sense of Witsenhausen. Computing the estimate is difficult, but can be simplified by introducing a symmetry constraint on the scheduler. Based on these findings, we propose a dual predictor architecture for the closed-loop system, which ensures separation between scheduler, observer and controller. We present an example of this architecture, which illustrates a network-aware event-triggering mechanism.Comment: 17 pages, technical repor

    Locality-Aware Dynamic Task Graph Scheduling

    Get PDF
    Dynamic task graph schedulers automatically balance work across processor cores by scheduling tasks among available threads while preserving dependences. In this paper, we design NabbitC, a provably efficient dynamic task graph scheduler that accounts for data locality on NUMA systems. NabbitC allows users to assign a color to each task representing the location (e.g., a processor core) that has the most efficient access to data needed during that node’s execution. NabbitC then automatically adjusts the scheduling so as to preferentially execute each node at the location that matches its color—leading to better locality because the node is likely to make local rather than remote accesses. At the same time, NabbitC tries to optimize load balance and not add too much overhead compared to the vanilla Nabbit scheduler that does not consider locality. We provide a theoretical analysis that shows that NabbitC does not asymptotically impact the scalability of Nabbit . We evaluated the performance of NabbitC on a suite of memory intensive benchmarks. Our experiments indicates that adding locality awareness has a considerable performance advantage compared to the vanilla Nabbit scheduler. In addition, we also compared NabbitC to OpenMP programs for both regular and irregular applications. For regular applications, OpenMP achieves perfect locality and perfect load balance statically. For these benchmarks, NabbitC has a small performance penalty compared to OpenMP due to its dynamic scheduling strategy. For irregular applications, where OpenMP can not achieve locality and load balance simultaneously, we find that NabbitC performs better. Therefore, NabbitC combines the benefits of locality- aware scheduling for regular applications (the forte of static schedulers such as those in OpenMP) and dynamically adapting to load imbalance (the forte of dynamic schedulers such as Cilk Plus, TBB, and Nabbit)

    Extending the Nested Parallel Model to the Nested Dataflow Model with Provably Efficient Schedulers

    Full text link
    The nested parallel (a.k.a. fork-join) model is widely used for writing parallel programs. However, the two composition constructs, i.e. "\parallel" (parallel) and ";;" (serial), are insufficient in expressing "partial dependencies" or "partial parallelism" in a program. We propose a new dataflow composition construct "\leadsto" to express partial dependencies in algorithms in a processor- and cache-oblivious way, thus extending the Nested Parallel (NP) model to the \emph{Nested Dataflow} (ND) model. We redesign several divide-and-conquer algorithms ranging from dense linear algebra to dynamic-programming in the ND model and prove that they all have optimal span while retaining optimal cache complexity. We propose the design of runtime schedulers that map ND programs to multicore processors with multiple levels of possibly shared caches (i.e, Parallel Memory Hierarchies) and provide theoretical guarantees on their ability to preserve locality and load balance. For this, we adapt space-bounded (SB) schedulers for the ND model. We show that our algorithms have increased "parallelizability" in the ND model, and that SB schedulers can use the extra parallelizability to achieve asymptotically optimal bounds on cache misses and running time on a greater number of processors than in the NP model. The running time for the algorithms in this paper is O(i=0h1Q(t;σMi)Cip)O\left(\frac{\sum_{i=0}^{h-1} Q^{*}({\mathsf t};\sigma\cdot M_i)\cdot C_i}{p}\right), where QQ^{*} is the cache complexity of task t{\mathsf t}, CiC_i is the cost of cache miss at level-ii cache which is of size MiM_i, σ(0,1)\sigma\in(0,1) is a constant, and pp is the number of processors in an hh-level cache hierarchy

    Life of occam-Pi

    Get PDF
    This paper considers some questions prompted by a brief review of the history of computing. Why is programming so hard? Why is concurrency considered an “advanced” subject? What’s the matter with Objects? Where did all the Maths go? In searching for answers, the paper looks at some concerns over fundamental ideas within object orientation (as represented by modern programming languages), before focussing on the concurrency model of communicating processes and its particular expression in the occam family of languages. In that focus, it looks at the history of occam, its underlying philosophy (Ockham’s Razor), its semantic foundation on Hoare’s CSP, its principles of process oriented design and its development over almost three decades into occam-? (which blends in the concurrency dynamics of Milner’s ?-calculus). Also presented will be an urgent need for rationalisation – occam-? is an experiment that has demonstrated significant results, but now needs time to be spent on careful review and implementing the conclusions of that review. Finally, the future is considered. In particular, is there a future
    corecore