3,120 research outputs found

    Work Stealing Simulator

    Get PDF
    We present in this paper a Work Stealing lightweight PYTHON simulator. Our simulator is used to execute an application (list of tasks with or without dependencies), on a multiple processors platform linked by specific topology. We first give an overview of the different variants of the work stealing algorithm, then we present the architecture of our light Work Stealing simulator. Its architecture facilitates the development of other types of applications and other topologies for interconnecting the processors. We present the use cases of the simulator and the different types of results

    Remote-scope Promotion: Clarified, Rectified, and Verified

    Get PDF
    Modern accelerator programming frameworks, such as OpenCL, organise threads into work-groups. Remote-scope promotion (RSP) is a language extension recently proposed by AMD researchers that is designed to enable applications, for the first time, both to optimise for the common case of intra-work-group communication (using memory scopes to provide consistency only within a work-group) and to allow occasional inter-work-group communication (as required, for instance, to support the popular load-balancing idiom of work stealing). We present the first formal, axiomatic memory model of OpenCL extended with RSP. We have extended the Herd memory model simulator with support for OpenCL kernels that exploit RSP, and used it to discover bugs in several litmus tests and a work-stealing queue, that have been used previously in the study of RSP. We have also formalised the proposed GPU implementation of RSP. The formalisation process allowed us to identify bugs in the description of RSP that could result in well-synchronised programs experiencing memory inconsistencies. We present and prove sound a new implementation of RSP that incorporates bug fixes and requires less non-standard hardware than the original implementation. This work, a collaboration between academia and industry, clearly demonstrates how, when designing hardware support for a new concurrent language feature, the early application of formal tools and techniques can help to prevent errors, such as those we have found, from making it into silicon

    Data Structures for Task-based Priority Scheduling

    Full text link
    Many task-parallel applications can benefit from attempting to execute tasks in a specific order, as for instance indicated by priorities associated with the tasks. We present three lock-free data structures for priority scheduling with different trade-offs on scalability and ordering guarantees. First we propose a basic extension to work-stealing that provides good scalability, but cannot provide any guarantees for task-ordering in-between threads. Next, we present a centralized priority data structure based on kk-fifo queues, which provides strong (but still relaxed with regard to a sequential specification) guarantees. The parameter kk allows to dynamically configure the trade-off between scalability and the required ordering guarantee. Third, and finally, we combine both data structures into a hybrid, kk-priority data structure, which provides scalability similar to the work-stealing based approach for larger kk, while giving strong ordering guarantees for smaller kk. We argue for using the hybrid data structure as the best compromise for generic, priority-based task-scheduling. We analyze the behavior and trade-offs of our data structures in the context of a simple parallelization of Dijkstra's single-source shortest path algorithm. Our theoretical analysis and simulations show that both the centralized and the hybrid kk-priority based data structures can give strong guarantees on the useful work performed by the parallel Dijkstra algorithm. We support our results with experimental evidence on an 80-core Intel Xeon system

    Mesmerizer: A Effective Tool for a Complete Peer-to-Peer Software Development Life-cycle

    Get PDF
    In this paper we present what are, in our experience, the best practices in Peer-To-Peer(P2P) application development and how we combined them in a middleware platform called Mesmerizer. We explain how simulation is an integral part of the development process and not just an assessment tool. We then present our component-based event-driven framework for P2P application development, which can be used to execute multiple instances of the same application in a strictly controlled manner over an emulated network layer for simulation/testing, or a single application in a concurrent environment for deployment purpose. We highlight modeling aspects that are of critical importance for designing and testing P2P applications, e.g. the emulation of Network Address Translation and bandwidth dynamics. We show how our simulator scales when emulating low-level bandwidth characteristics of thousands of concurrent peers while preserving a good degree of accuracy compared to a packet-level simulator

    Statistic Rate Monotonic Scheduling

    Full text link
    In this paper we present Statistical Rate Monotonic Scheduling (SRMS), a generalization of the classical RMS results of Liu and Layland that allows scheduling periodic tasks with highly variable execution times and statistical QoS requirements. Similar to RMS, SRMS has two components: a feasibility test and a scheduling algorithm. The feasibility test for SRMS ensures that using SRMS' scheduling algorithms, it is possible for a given periodic task set to share a given resource (e.g. a processor, communication medium, switching device, etc.) in such a way that such sharing does not result in the violation of any of the periodic tasks QoS constraints. The SRMS scheduling algorithm incorporates a number of unique features. First, it allows for fixed priority scheduling that keeps the tasks' value (or importance) independent of their periods. Second, it allows for job admission control, which allows the rejection of jobs that are not guaranteed to finish by their deadlines as soon as they are released, thus enabling the system to take necessary compensating actions. Also, admission control allows the preservation of resources since no time is spent on jobs that will miss their deadlines anyway. Third, SRMS integrates reservation-based and best-effort resource scheduling seamlessly. Reservation-based scheduling ensures the delivery of the minimal requested QoS; best-effort scheduling ensures that unused, reserved bandwidth is not wasted, but rather used to improve QoS further. Fourth, SRMS allows a system to deal gracefully with overload conditions by ensuring a fair deterioration in QoS across all tasks---as opposed to penalizing tasks with longer periods, for example. Finally, SRMS has the added advantage that its schedulability test is simple and its scheduling algorithm has a constant overhead in the sense that the complexity of the scheduler is not dependent on the number of the tasks in the system. We have evaluated SRMS against a number of alternative scheduling algorithms suggested in the literature (e.g. RMS and slack stealing), as well as refinements thereof, which we describe in this paper. Consistently throughout our experiments, SRMS provided the best performance. In addition, to evaluate the optimality of SRMS, we have compared it to an inefficient, yet optimal scheduler for task sets with harmonic periods.National Science Foundation (CCR-970668
    • …
    corecore