308 research outputs found

    A Survey of Pipelined Workflow Scheduling: Models and Algorithms

    Get PDF
    International audienceA large class of applications need to execute the same workflow on different data sets of identical size. Efficient execution of such applications necessitates intelligent distribution of the application components and tasks on a parallel machine, and the execution can be orchestrated by utilizing task-, data-, pipelined-, and/or replicated-parallelism. The scheduling problem that encompasses all of these techniques is called pipelined workflow scheduling, and it has been widely studied in the last decade. Multiple models and algorithms have flourished to tackle various programming paradigms, constraints, machine behaviors or optimization goals. This paper surveys the field by summing up and structuring known results and approaches

    Design of testbed and emulation tools

    Get PDF
    The research summarized was concerned with the design of testbed and emulation tools suitable to assist in projecting, with reasonable accuracy, the expected performance of highly concurrent computing systems on large, complete applications. Such testbed and emulation tools are intended for the eventual use of those exploring new concurrent system architectures and organizations, either as users or as designers of such systems. While a range of alternatives was considered, a software based set of hierarchical tools was chosen to provide maximum flexibility, to ease in moving to new computers as technology improves and to take advantage of the inherent reliability and availability of commercially available computing systems

    Optimal processor assignment for pipeline computations

    Get PDF
    The availability of large scale multitasked parallel architectures introduces the following processor assignment problem for pipelined computations. Given a set of tasks and their precedence constraints, along with their experimentally determined individual responses times for different processor sizes, find an assignment of processor to tasks. Two objectives are of interest: minimal response given a throughput requirement, and maximal throughput given a response time requirement. These assignment problems differ considerably from the classical mapping problem in which several tasks share a processor; instead, it is assumed that a large number of processors are to be assigned to a relatively small number of tasks. Efficient assignment algorithms were developed for different classes of task structures. For a p processor system and a series parallel precedence graph with n constituent tasks, an O(np2) algorithm is provided that finds the optimal assignment for the response time optimization problem; it was found that the assignment optimizing the constrained throughput in O(np2log p) time. Special cases of linear, independent, and tree graphs are also considered

    Simulated annealing based datapath synthesis

    Get PDF

    Constraint analysis for DSP code generation

    Get PDF
    +113hlm.;24c

    Multiprocessor scheduling with practical constraints

    Get PDF
    The problem of scheduling tasks onto multiprocessor systems has increasing practical importance as more applications are being addressed with multiprocessor systems. Actual applications and multiprocessor systems have many characteristics which become constraints to the general scheduling problem of minimizing the schedule length. These practical constraints include precedence relations and communication delays between tasks, yet few researchers have considered both these constraints when developing schedulers. This work examines a more general multiprocessor scheduling problem, which includes these practical scheduling constraints, and develops a new scheduling heuristic using a list scheduler with dynamically computed priorities. The dynamic priority heuristic is compared against an optimal scheduler and against other researchers’ approaches for thousands of randomly generated scheduling problems. The dynamic priority heuristic produces schedules with lengths which are 10% to 20% over optimal on the average. The dynamic priority heuristic performs better than other researchers’ approaches for scheduling problems with the practical constraints. We conclude that it is important to consider practical constraints in the design of a scheduler and that a simple heuristic can still achieve good performance in this area

    The assignment problem in distributed computing

    Get PDF
    This dissertation focuses on the problem of assigning the modules of a program to the processors in a distributed system with the goal of minimizing the overall cost of running the program. The cost depends on the execution times of the modules on the processors and on the cost of communication between modules. This module allocation problem arises in a variety of situations where one is interested in making optimum use of available computer resources. The general module allocation problem is intractable; however it becomes polynomially-solvable when the communication graph is restricted. In this dissertation, we restrict our attention to k-trees;As the first problem, we study parametric module allocation on partial k-trees. We allow the costs, both execution and communication, to vary linearly as functions of a real parameter t. We show that if the number of processors is fixed, the sequence of optimum assignments that are obtained, as t varies from zero to infinity, can be constructed in polynomial time. As an auxiliary result, we develop a linear-time algorithm to find a separator in a k-tree. We discuss the implications of our results for parametric versions of the weighted vertex cover, independent set, and 0-1 quadratic programming problems on partial k-trees;Next, we consider two variants of the assignment problem. The first problem is to find a minimum-cost assignment when one of the processors has a limited memory. The second is to find an assignment that minimizes the maximum processor load. We present exact dynamic programming algorithms for both problems, which lead to approximation schemes for the case where the communication graph is a partial k-tree. Faster algorithms are presented for trees with uniform costs. In contrast to these results, we show that, for arbitrary graphs, no fully polynomial time approximation schemes exist unless P = NP. Both dynamic programming algorithms have been implemented. The implementation details and our experimental results are presented

    Parcus: Energy-Aware and Robust Parallelization of AUTOSAR Legacy Applications

    Get PDF
    Embedded multicore processors are an attractive alternative to sophisticated single-core processors for the use in automobile electronic control units (ECUs), due to their expected higher performance and energy efficiency. Parallelization approaches for AUTOSAR legacy software exploit these benefits. Nevertheless, these approaches focus on extracting performance neglecting the system's worst-case sensor/actuator latency and energy consumption. This paper presents Parcus, an energy-and latency-aware parallelization technique that combines both runnable-and tasklevel parallelism. Parcus explicitly models the traversal of data from sensor to actuator through task instances, enabling to consider the latency imposed by parallelization techniques. The parallel schedule quality (PSQ) metric quantifies the success of the parallelization, for which it takes the latency and the processor frequency into account. We demonstrate the applicability of Parcus with an automotive case study. The results show that Parcus can fully utilize the processor's energy-saving potential.This research received funding from the EU FP7 no. 287519 (parMERASA), the ARTEMIS-JU no. 621429 (EMC2), and the German Federal Ministry of Education and Research.Peer ReviewedPostprint (author's final draft

    A Model-based Design Framework for Application-specific Heterogeneous Systems

    Get PDF
    The increasing heterogeneity of computing systems enables higher performance and power efficiency. However, these improvements come at the cost of increasing the overall complexity of designing such systems. These complexities include constructing implementations for various types of processors, setting up and configuring communication protocols, and efficiently scheduling the computational work. The process for developing such systems is iterative and time consuming, with no well-defined performance goal. Current performance estimation approaches use source code implementations that require experienced developers and time to produce. We present a framework to aid in the design of heterogeneous systems and the performance tuning of applications. Our framework supports system construction: integrating custom hardware accelerators with existing cores into processors, integrating processors into cohesive systems, and mapping computations to processors to achieve overall application performance and efficient hardware usage. It also facilitates effective design space exploration using processor models (for both existing and future processors) that do not require source code implementations to estimate performance. We evaluate our framework using a variety of applications and implement them in systems ranging from low power embedded systems-on-chip (SoC) to high performance systems consisting of commercial-off-the-shelf (COTS) components. We show how the design process is improved, reducing the number of design iterations and unnecessary source code development ultimately leading to higher performing efficient systems
    • …
    corecore