86,073 research outputs found
Data Structures for Task-based Priority Scheduling
Many task-parallel applications can benefit from attempting to execute tasks
in a specific order, as for instance indicated by priorities associated with
the tasks. We present three lock-free data structures for priority scheduling
with different trade-offs on scalability and ordering guarantees. First we
propose a basic extension to work-stealing that provides good scalability, but
cannot provide any guarantees for task-ordering in-between threads. Next, we
present a centralized priority data structure based on -fifo queues, which
provides strong (but still relaxed with regard to a sequential specification)
guarantees. The parameter allows to dynamically configure the trade-off
between scalability and the required ordering guarantee. Third, and finally, we
combine both data structures into a hybrid, -priority data structure, which
provides scalability similar to the work-stealing based approach for larger
, while giving strong ordering guarantees for smaller . We argue for
using the hybrid data structure as the best compromise for generic,
priority-based task-scheduling.
We analyze the behavior and trade-offs of our data structures in the context
of a simple parallelization of Dijkstra's single-source shortest path
algorithm. Our theoretical analysis and simulations show that both the
centralized and the hybrid -priority based data structures can give strong
guarantees on the useful work performed by the parallel Dijkstra algorithm. We
support our results with experimental evidence on an 80-core Intel Xeon system
Schedulability analysis of timed CSP models using the PAT model checker
Timed CSP can be used to model and analyse real-time and concurrent behaviour of embedded control systems. Practical CSP implementations combine the CSP model of a real-time control system with prioritized scheduling to achieve efficient and orderly use of limited resources. Schedulability analysis of a timed CSP model of a system with respect to a scheduling scheme and a particular execution platform is important to ensure that the system design satisfies its timing requirements. In this paper, we propose a framework to analyse schedulability of CSP-based designs for non-preemptive fixed-priority multiprocessor scheduling. The framework is based on the PAT model checker and the analysis is done with dense-time model checking on timed CSP models. We also provide a schedulability analysis workflow to construct and analyse, using the proposed framework, a timed CSP model with scheduling from an initial untimed CSP model without scheduling. We demonstrate our schedulability analysis workflow on a case study of control software design for a mobile robot. The proposed approach provides non-pessimistic schedulability results
Extending snBench to Support Hierarchical and Configurable Scheduling
It is useful in systems that must support multiple applications with various temporal requirements to allow application-specific policies to manage resources accordingly. However, there is a tension between this goal and the desire to control and police possibly malicious programs. The Java-based Sensor Execution Environment (SXE) in snBench presents a situation where such considerations add value to the system. Multiple applications can be run by multiple users with varied temporal requirements, some Real-Time and others best effort. This paper outlines and documents an implementation of a hierarchical and configurable scheduling system with which different applications can be executed using application-specific scheduling policies. Concurrently the system administrator can define fairness policies between applications that are imposed upon the system. Additionally, to ensure forward progress of system execution in the face of malicious or malformed user programs, an infrastructure for execution using multiple threads is described
Architectural support for task dependence management with flexible software scheduling
The growing complexity of multi-core architectures has motivated a wide range of software mechanisms to improve the orchestration of parallel executions. Task parallelism has become a very attractive approach thanks to its programmability, portability and potential for optimizations. However, with the expected increase in core counts, finer-grained tasking will be required to exploit the available parallelism, which will increase the overheads introduced by the runtime system. This work presents Task Dependence Manager (TDM), a hardware/software co-designed mechanism to mitigate runtime system overheads. TDM introduces a hardware unit, denoted Dependence Management Unit (DMU), and minimal ISA extensions that allow the runtime system to offload costly dependence tracking operations to the DMU and to still perform task scheduling in software. With lower hardware cost, TDM outperforms hardware-based solutions and enhances the flexibility, adaptability and composability of the system. Results show that TDM improves performance by 12.3% and reduces EDP by 20.4% on average with respect to a software runtime system. Compared to a runtime system fully implemented in hardware, TDM achieves an average speedup of 4.2% with 7.3x less area requirements and significant EDP reductions. In addition, five different software schedulers are evaluated with TDM, illustrating its flexibility and performance gains.This work has been supported by the RoMoL ERC Advanced Grant (GA 321253), by the European HiPEAC Network of Excellence, by the Spanish Ministry of Science and
Innovation (contracts TIN2015-65316-P, TIN2016-76635-C2-2-R and TIN2016-81840-REDT), by the Generalitat de Catalunya (contracts 2014-SGR-1051 and 2014-SGR-1272), and by the European Union’s Horizon 2020 research and innovation programme under grant agreement No 671697 and No. 671610. M. Moretó has been partially supported by the Ministry of Economy and Competitiveness under Juan de la Cierva postdoctoral fellowship number JCI-2012-15047.Peer ReviewedPostprint (author's final draft
Configurable Strategies for Work-stealing
Work-stealing systems are typically oblivious to the nature of the tasks they
are scheduling. For instance, they do not know or take into account how long a
task will take to execute or how many subtasks it will spawn. Moreover, the
actual task execution order is typically determined by the underlying task
storage data structure, and cannot be changed. There are thus possibilities for
optimizing task parallel executions by providing information on specific tasks
and their preferred execution order to the scheduling system.
We introduce scheduling strategies to enable applications to dynamically
provide hints to the task-scheduling system on the nature of specific tasks.
Scheduling strategies can be used to independently control both local task
execution order as well as steal order. In contrast to conventional scheduling
policies that are normally global in scope, strategies allow the scheduler to
apply optimizations on individual tasks. This flexibility greatly improves
composability as it allows the scheduler to apply different, specific
scheduling choices for different parts of applications simultaneously. We
present a number of benchmarks that highlight diverse, beneficial effects that
can be achieved with scheduling strategies. Some benchmarks (branch-and-bound,
single-source shortest path) show that prioritization of tasks can reduce the
total amount of work compared to standard work-stealing execution order. For
other benchmarks (triangle strip generation) qualitatively better results can
be achieved in shorter time. Other optimizations, such as dynamic merging of
tasks or stealing of half the work, instead of half the tasks, are also shown
to improve performance. Composability is demonstrated by examples that combine
different strategies, both within the same kernel (prefix sum) as well as when
scheduling multiple kernels (prefix sum and unbalanced tree search)
uRT51: An Embedded Real-Time processor implemented on FPGA devices
In this paper we describe and evaluate the main features of the uRT51 processor. The uRT51 processor was designed for embedded realtime control applications. It is a processor architecture that incorporates the specific functions of a real-time system in hardware. It was described using synthesizable VHDL and it was implemented on FPGA devices. We describe how the uRT51 processor supports time, events, task and priorities. The performance of the uRT51 processor is evaluated using a control application as a case study. The experiments show that the uRT51 processor scheduling features outperform the ones obtained using a traditional RTOS-based real-time system.Fil: Cayssials, Ricardo Luis. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Bahía Blanca; Argentina. Universidad Nacional del Sur. Departamento de Ingeniería Eléctrica y de Computadoras; ArgentinaFil: Duval, M,. Provincia de Buenos Aires. Gobernación. Comisión de Investigaciones Científicas; Argentina. Universidad Nacional del Sur. Departamento de Ingeniería Eléctrica y de Computadoras; ArgentinaFil: Ferro, Edgardo Carlos. Universidad Nacional del Sur. Departamento de Ingeniería Eléctrica y de Computadoras; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Bahía Blanca; ArgentinaFil: Alimenti, O.. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Bahía Blanca; Argentina. Universidad Nacional del Sur. Departamento de Ingeniería Eléctrica y de Computadoras; Argentin
- …