102 research outputs found

    Static allocation of computation to processors in multicomputers

    Get PDF

    Multiprocessor scheduling with practical constraints

    Get PDF
    The problem of scheduling tasks onto multiprocessor systems has increasing practical importance as more applications are being addressed with multiprocessor systems. Actual applications and multiprocessor systems have many characteristics which become constraints to the general scheduling problem of minimizing the schedule length. These practical constraints include precedence relations and communication delays between tasks, yet few researchers have considered both these constraints when developing schedulers. This work examines a more general multiprocessor scheduling problem, which includes these practical scheduling constraints, and develops a new scheduling heuristic using a list scheduler with dynamically computed priorities. The dynamic priority heuristic is compared against an optimal scheduler and against other researchers’ approaches for thousands of randomly generated scheduling problems. The dynamic priority heuristic produces schedules with lengths which are 10% to 20% over optimal on the average. The dynamic priority heuristic performs better than other researchers’ approaches for scheduling problems with the practical constraints. We conclude that it is important to consider practical constraints in the design of a scheduler and that a simple heuristic can still achieve good performance in this area

    Scheduling with Communication Delays

    Get PDF
    International audiencehanbook on ordonnancemen

    Improved Mixed-Integer Programming Models for Multiprocessor Scheduling with Communication Delays

    Get PDF
    We revise existing and introduce new mixed-integer programming models for the Multiprocessor Scheduling Problem with Communication Delays. At first, we show how to provably reduce the number of product variables necessary to explicitly linearize the so-called packing formulation that contains bilinear terms. Then, we reveal that the feasible region of almost all existing formulations contains redundant solutions and formulate new constraints in order to exclude these. At the same time, by exploiting further structural properties, the models are improved concerning their size, strength, and modeling complexity. The discussion of these improvements leads to new much more compact formulations which are then experimentally compared with each other and with other formulations from the literature. We set up a realistic scenario with a preprocessing of the task graphs, delivering the gained information equally to all the tested models and evaluate not only running times but also the obtained lower and upper bounds on the makespan objective for unsolved instances of a large scale benchmark set

    Interconnect-aware scheduling and resource allocation for high-level synthesis

    Get PDF
    A high-level architectural synthesis can be described as the process of transforming a behavioral description into a structural description. The scheduling, processor allocation, and register binding are the most important tasks in the high-level synthesis. In the past, it has been possible to focus simply on the delays of the processing units in a high-level synthesis and neglect the wire delays, since the overall delay of a digital system was dominated by the delay of the logic gates. However, with the process technology being scaled down to deep-submicron region, the global interconnect delays can no longer be neglected in VLSI designs. It is, therefore, imperative to include in high-level synthesis the delays on wires and buses used to communicate data between the processing units i.e., inter-processor communication delays. Furthermore, the way the process of register binding is performed also has an impact on the complexity of the interconnect paths required to transfer data between the processing units. Hence, the register binding can no longer ignore its effect on the wiring complexity of resulting designs. The objective of this thesis is to develop techniques for an interconnect-aware high-level synthesis. Under this common theme, this thesis has two distinct focuses. The first focus of this thesis is on developing a new high-level synthesis framework while taking the inter-processor communication delay into consideration. The second focus of this thesis is on the developing of a technique to carry out the register binding and a scheme to reduce the number of registers while taking the complexity of the interconnects into consideration. A novel scheduling and processor allocation technique taking into consideration the inter-processor communication delay is presented. In the proposed technique, the communication delay between a pair of nodes of different types is treated as a non-computing node, whereas that between a pair of nodes of the same type is taken into account by re-adjusting the firing times of the appropriate nodes of the data flow graph (DFG). Another technique for the integration of the placement process into the scheduling and processor allocation in order to determine the actual positions of the processing units in the placement space is developed. The proposed technique makes use of a hybrid library of functional units, which includes both operation-specific and reconfigurable multiple-operation functional units, to maximize the local data transfer. A technique for register binding that results in a reduced number of registers and interconnects is developed by appropriately dividing the lifetime of a token into multiple segments and then binding those having the same source and/or destination into a single register. A node regeneration scheme, in which the idle processing units are utilized to generate multiple copies of the nodes in a given DFG, is devised to reduce the number of registers and interconnects even further. The techniques and schemes developed in this thesis are applied to the synthesis of architectures for a number of benchmark DSP problems and compared with various other commonly used synthesis methods in order to assess their effectiveness. It is shown that the proposed techniques provide superior performance in terms of the iteration period, placement area, and the numbers of the processing units, registers and interconnects in the synthesized architectur

    Ordonnancement hybride des applications flots de données sur des systèmes embarqués multi-coeurs

    Get PDF
    Les systèmes embarqués sont de plus en plus présents dans l'industrie comme dans la vie quotidienne. Une grande partie de ces systèmes comprend des applications effectuant du traitement intensif des données: elles utilisent de nombreux filtres numériques, où les opérations sur les données sont répétitives et ont un contrôle limité. Les graphes "flots de données", grâce à leur déterminisme fonctionnel inhérent, sont très répandus pour modéliser les systèmes embarqués connus sous le nom de "data-driven". L'ordonnancement statique et périodique des graphes flot de données a été largement étudié, surtout pour deux modèles particuliers: SDF et CSDF. Dans cette thèse, on s'intéresse plus particulièrement à l'ordonnancement périodique des graphes CSDF. Le problème consiste à identifier des séquences périodiques infinies d'actionnement des acteurs qui aboutissent à des exécutions complètes à buffers bornés. L'objectif est de pouvoir aborder ce problème sous des angles différents : maximisation de débit, minimisation de la latence et minimisation de la capacité des buffers. La plupart des travaux existants proposent des solutions pour l'optimisation du débit et négligent le problème d'optimisation de la latence et propose même dans certains cas des ordonnancements qui ont un impact négatif sur elle afin de conserver les propriétés de périodicité. On propose dans cette thèse un ordonnancement hybride, nommé Self-Timed Périodique (STP), qui peut conserver les propriétés d'un ordonnancement périodique et à la fois améliorer considérablement sa performance en terme de latence.One of the most important aspects of parallel computing is its close relation to the underlying hardware and programming models. In this PhD thesis, we take dataflow as the basic model of computation, as it fits the streaming application domain. Cyclo-Static Dataflow (CSDF) is particularly interesting because this variant is one of the most expressive dataflow models while still being analyzable at design time. Describing the system at higher levels of abstraction is not sufficient, e.g. dataflow have no direct means to optimize communication channels generally based on shared buffers. Therefore, we need to link the dataflow MoCs used for performance analysis of the programs, the real time task models used for timing analysis and the low-level model used to derive communication times. This thesis proposes a design flow that meets these challenges, while enabling features such as temporal isolation and taking into account other challenges such as predictability and ease of validation. To this end, we propose a new scheduling policy noted Self-Timed Periodic (STP), which is an execution model combining Self-Timed Scheduling (STS) with periodic scheduling. In STP scheduling, actors are no longer strictly periodic but self-timed assigned to periodic levels: the period of each actor under periodic scheduling is replaced by its worst-case execution time. Then, STP retains some of the performance and flexibility of self-timed schedule, in which execution times of actors need only be estimates, and at the same time makes use of the fact that with a periodic schedule we can derive a tight estimation of the required performance metrics

    Scheduling Under Non-Uniform Job and Machine Delays

    Get PDF

    Self-Timed Periodic Scheduling For Cyclo-Static DataFlow Model

    Get PDF
    International audienceReal-time and time-constrained applications programmed on many-core systems can suffer from unmet timing constraints even with correct-by-construction schedules. Such unexpected results are usually caused by unaccounted for delays due to resource sharing (e.g. the communication medium). In this paper we address the three main sources of unpredictable behaviors: First, we propose to use a deterministic Model of Computation (MoC), more specifically, the well-formed CSDF subset of process networks; Second, we propose a run-time management strategy of shared resources to avoid unpredictable timings; Third, we promote the use of a new scheduling policy, the so-said Self-Timed Periodic (STP) scheduling, to improve performance and decrease synchronization costs by taking into account resource sharing or resource constraints. This is a quantitative improvement above state-of-the-art scheduling policies which assumed fixed delays of inter-processor communication and did not take correctly into account subtle effects of synchronization

    Parallel solution of power system linear equations

    Get PDF
    At the heart of many power system computations lies the solution of a large sparse set of linear equations. These equations arise from the modelling of the network and are the cause of a computational bottleneck in power system analysis applications. Efficient sequential techniques have been developed to solve these equations but the solution is still too slow for applications such as real-time dynamic simulation and on-line security analysis. Parallel computing techniques have been explored in the attempt to find faster solutions but the methods developed to date have not efficiently exploited the full power of parallel processing. This thesis considers the solution of the linear network equations encountered in power system computations. Based on the insight provided by the elimination tree, it is proposed that a novel matrix structure is adopted to allow the exploitation of parallelism which exists within the cutset of a typical parallel solution. Using this matrix structure it is possible to reduce the size of the sequential part of the problem and to increase the speed and efficiency of typical LU-based parallel solution. A method for transforming the admittance matrix into the required form is presented along with network partitioning and load balancing techniques. Sequential solution techniques are considered and existing parallel methods are surveyed to determine their strengths and weaknesses. Combining the benefits of existing solutions with the new matrix structure allows an improved LU-based parallel solution to be derived. A simulation of the improved LU solution is used to show the improvements in performance over a standard LU-based solution that result from the adoption of the new techniques. The results of a multiprocessor implementation of the method are presented and the new method is shown to have a better performance than existing methods for distributed memory multiprocessors

    Scheduling with Communication Delay in Near-Linear Time

    Get PDF
    We consider the problem of efficiently scheduling jobs with precedence constraints on a set of identical machines in the presence of a uniform communication delay. Such precedence-constrained jobs can be modeled as a directed acyclic graph, G = (V, E). In this setting, if two precedence-constrained jobs u and v, with v dependent on u (u ? v), are scheduled on different machines, then v must start at least ? time units after u completes. The scheduling objective is to minimize makespan, i.e. the total time from when the first job starts to when the last job finishes. The focus of this paper is to provide an efficient approximation algorithm with near-linear running time. We build on the algorithm of Lepere and Rapine [STACS 2002] for this problem to give an O((ln ?)/(ln ln ?))-approximation algorithm that runs in O?(|V|+|E|) time
    • …
    corecore