Work Stealing Simulator
In this paper we present a lightweight Work Stealing simulator written in Python. The simulator executes an application (a list of tasks with or without dependencies) on a multiprocessor platform whose processors are linked by a specific topology. We first give an overview of the variants of the work stealing algorithm, then present the architecture of the simulator, which makes it easy to develop other types of applications and other topologies for interconnecting the processors. Finally, we present the use cases of the simulator and the different types of results it can produce.
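To make the idea concrete, here is a minimal sketch of a work stealing simulation. It is not the simulator described in the abstract: the function name `simulate_work_stealing`, the unit-time independent tasks, the fully connected topology, and the rule that an idle processor spends one step stealing a single task from a random victim are all assumptions chosen for illustration.

```python
import random
from collections import deque


def simulate_work_stealing(num_tasks, num_procs, seed=0):
    """Simulate unit-time, independent tasks (illustrative model only).

    Returns (makespan_in_steps, tasks_executed_per_processor).
    """
    rng = random.Random(seed)
    # Assumption: all tasks start on processor 0, so the others must steal.
    queues = [deque() for _ in range(num_procs)]
    queues[0].extend(range(num_tasks))
    executed = [0] * num_procs
    remaining = num_tasks
    steps = 0
    while remaining > 0:
        steps += 1
        # Decide at the start of the step who works and who steals.
        busy = [bool(q) for q in queues]
        for p in range(num_procs):
            if busy[p]:
                # The owner takes a task from its own end of the deque.
                queues[p].pop()
                executed[p] += 1
                remaining -= 1
            else:
                # An idle processor picks a random victim and steals one task
                # from the opposite end; a victim always keeps its last task.
                victim = rng.randrange(num_procs)
                if victim != p and len(queues[victim]) > 1:
                    queues[p].append(queues[victim].popleft())
    return steps, executed
```

Calling `simulate_work_stealing(100, 4)` returns the number of simulated steps and the per-processor task counts. Task dependencies and non-trivial topologies, which the abstract mentions, are deliberately left out of this sketch.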
Remote-scope Promotion: Clarified, Rectified, and Verified
Modern accelerator programming frameworks, such as OpenCL, organise threads into work-groups. Remote-scope promotion (RSP) is a language extension recently proposed by AMD researchers that is designed to enable applications, for the first time, both to optimise for the common case of intra-work-group communication (using memory scopes to provide consistency only within a work-group) and to allow occasional inter-work-group communication (as required, for instance, to support the popular load-balancing idiom of work stealing). We present the first formal, axiomatic memory model of OpenCL extended with RSP. We have extended the Herd memory model simulator with support for OpenCL kernels that exploit RSP, and used it to discover bugs in several litmus tests and a work-stealing queue that have been used previously in the study of RSP. We have also formalised the proposed GPU implementation of RSP. The formalisation process allowed us to identify bugs in the description of RSP that could result in well-synchronised programs experiencing memory inconsistencies. We present and prove sound a new implementation of RSP that incorporates bug fixes and requires less non-standard hardware than the original implementation. This work, a collaboration between academia and industry, clearly demonstrates how, when designing hardware support for a new concurrent language feature, the early application of formal tools and techniques can help to prevent errors, such as those we have found, from making it into silicon.
Data Structures for Task-based Priority Scheduling
Many task-parallel applications can benefit from attempting to execute tasks
in a specific order, as for instance indicated by priorities associated with
the tasks. We present three lock-free data structures for priority scheduling
with different trade-offs on scalability and ordering guarantees. First we
propose a basic extension to work-stealing that provides good scalability, but
cannot provide any guarantees for task ordering between threads. Next, we
present a centralized priority data structure based on k-fifo queues, which
provides strong (but still relaxed with regard to a sequential specification)
guarantees. The parameter k allows the trade-off between scalability and the
required ordering guarantee to be configured dynamically. Third, and finally, we
combine both data structures into a hybrid k-priority data structure, which
provides scalability similar to the work-stealing based approach for larger
k, while giving strong ordering guarantees for smaller k. We argue for
using the hybrid data structure as the best compromise for generic,
priority-based task-scheduling.
We analyze the behavior and trade-offs of our data structures in the context
of a simple parallelization of Dijkstra's single-source shortest path
algorithm. Our theoretical analysis and simulations show that both the
centralized and the hybrid k-priority based data structures can give strong
guarantees on the useful work performed by the parallel Dijkstra algorithm. We
support our results with experimental evidence on an 80-core Intel Xeon system.
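The link between relaxed ordering and useful work can be illustrated with a small sequential sketch. It does not reproduce the lock-free data structures from the abstract: the function `relaxed_dijkstra` and the relaxation rule used here (each pop may return any of the `k` smallest pending entries) are assumptions made only for this example, since the abstract does not spell out the exact guarantee.

```python
import heapq
import random


def relaxed_dijkstra(graph, source, k, seed=0):
    """Shortest paths with a relaxed pop order (illustrative sketch).

    graph maps a node to a list of (neighbor, non-negative weight) pairs.
    Returns (distances, expansions, stale_pops).
    """
    rng = random.Random(seed)
    dist = {source: 0}
    pending = [(0, source)]  # (tentative distance, node)
    expansions = 0
    stale_pops = 0
    while pending:
        # Assumed relaxation: take any one of the k smallest pending entries.
        entry = rng.choice(heapq.nsmallest(k, pending))
        pending.remove(entry)
        d, u = entry
        if d > dist[u]:
            stale_pops += 1  # a shorter path to u was found in the meantime
            continue
        expansions += 1
        for v, w in graph.get(u, []):
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                pending.append((d + w, v))  # re-queue the improved node
    return dist, expansions, stale_pops
```

With `k = 1` this behaves like ordinary Dijkstra and expands each reachable node exactly once. With larger `k` the final distances are still correct, because improved nodes are always re-queued, but some nodes may be expanded more than once, which is the kind of extra work the ordering guarantees are meant to bound.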
Mesmerizer: A Effective Tool for a Complete Peer-to-Peer Software Development Life-cycle
In this paper we present what are, in our experience, the best practices in Peer-to-Peer (P2P) application development and how we combined them in a middleware platform called Mesmerizer. We explain how simulation is an integral part of the development process and not just an assessment tool. We then present our component-based, event-driven framework for P2P application development, which can be used to execute multiple instances of the same application in a strictly controlled manner over an emulated network layer for simulation and testing, or a single application in a concurrent environment for deployment. We highlight modeling aspects that are of critical importance for designing and testing P2P applications, e.g. the emulation of Network Address Translation and bandwidth dynamics. We show how our simulator scales when emulating low-level bandwidth characteristics of thousands of concurrent peers while preserving a good degree of accuracy compared to a packet-level simulator.
Statistical Rate Monotonic Scheduling
In this paper we present Statistical Rate Monotonic Scheduling (SRMS), a generalization of the classical RMS results of Liu and Layland that allows scheduling periodic tasks with highly variable execution times and statistical QoS requirements. Similar to RMS, SRMS has two components: a feasibility test and a scheduling algorithm. The feasibility test for SRMS ensures that, using the SRMS scheduling algorithm, a given periodic task set can share a given resource (e.g. a processor, communication medium, or switching device) without violating any of the periodic tasks' QoS constraints.
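For context, the classical RMS feasibility test of Liu and Layland that SRMS generalizes can be sketched in a few lines. This is the well-known sufficient utilization bound for classical RMS, not the SRMS feasibility test, which the abstract does not specify; the function names and the `(execution_time, period)` task representation are assumptions for illustration.

```python
def rms_utilization_bound(n):
    """Liu and Layland bound n * (2^(1/n) - 1) for n periodic tasks."""
    return n * (2 ** (1 / n) - 1)


def passes_liu_layland_test(tasks):
    """Sufficient (not necessary) schedulability test for classical RMS.

    tasks is a list of (worst_case_execution_time, period) pairs.
    """
    if not tasks:
        return True
    utilization = sum(c / t for c, t in tasks)
    return utilization <= rms_utilization_bound(len(tasks))
```

A task set that fails this test is not necessarily unschedulable under RMS; the bound is only sufficient. It also assumes fixed worst-case execution times, which is the restriction that the highly variable execution times targeted by SRMS relax.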
The SRMS scheduling algorithm incorporates a number of unique features. First, it allows for fixed priority scheduling that keeps the tasks' value (or importance) independent of their periods. Second, it allows for job admission control, so that jobs that are not guaranteed to finish by their deadlines can be rejected as soon as they are released, thus enabling the system to take necessary compensating actions. Admission control also preserves resources, since no time is spent on jobs that will miss their deadlines anyway. Third, SRMS integrates reservation-based and best-effort resource scheduling seamlessly. Reservation-based scheduling ensures the delivery of the minimal requested QoS; best-effort scheduling ensures that unused, reserved bandwidth is not wasted, but rather used to improve QoS further. Fourth, SRMS allows a system to deal gracefully with overload conditions by ensuring a fair deterioration in QoS across all tasks, as opposed to penalizing tasks with longer periods, for example. Finally, SRMS has the added advantage that its schedulability test is simple and its scheduling algorithm has a constant overhead, in the sense that the complexity of the scheduler does not depend on the number of tasks in the system.
We have evaluated SRMS against a number of alternative scheduling algorithms suggested in the literature (e.g. RMS and slack stealing), as well as refinements thereof, which we describe in this paper. Consistently throughout our experiments, SRMS provided the best performance. In addition, to evaluate the optimality of SRMS, we have compared it to an inefficient, yet optimal scheduler for task sets with harmonic periods.
National Science Foundation (CCR-970668