Search CORE

308 research outputs found

A Survey of Pipelined Workflow Scheduling: Models and Algorithms

Author: Benoit Anne
Catalyurek Umit,
Robert Yves
Saule Erik
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2013
Field of study

International audienceA large class of applications need to execute the same workflow on different data sets of identical size. Efficient execution of such applications necessitates intelligent distribution of the application components and tasks on a parallel machine, and the execution can be orchestrated by utilizing task-, data-, pipelined-, and/or replicated-parallelism. The scheduling problem that encompasses all of these techniques is called pipelined workflow scheduling, and it has been widely studied in the last decade. Multiple models and algorithms have flourished to tackle various programming paradigms, constraints, machine behaviors or optimization goals. This paper surveys the field by summing up and structuring known results and approaches

HAL-ENS-LYON

CiteSeerX

INRIA a CCSD electronic archive server

Hal-Diderot

Design of testbed and emulation tools

Author: Flynn M. J.
Lundstrom S. F.
Publication venue
Publication date
Field of study

The research summarized was concerned with the design of testbed and emulation tools suitable to assist in projecting, with reasonable accuracy, the expected performance of highly concurrent computing systems on large, complete applications. Such testbed and emulation tools are intended for the eventual use of those exploring new concurrent system architectures and organizations, either as users or as designers of such systems. While a range of alternatives was considered, a software based set of hierarchical tools was chosen to provide maximum flexibility, to ease in moving to new computers as technology improves and to take advantage of the inherent reliability and availability of commercially available computing systems

NASA Technical Reports Server

Optimal processor assignment for pipeline computations

Author: Choudhury Alok N.
Narahari Bhagirath
Nicol David M.
Simha Rahul
Publication venue
Publication date: 01/01/1991
Field of study

The availability of large scale multitasked parallel architectures introduces the following processor assignment problem for pipelined computations. Given a set of tasks and their precedence constraints, along with their experimentally determined individual responses times for different processor sizes, find an assignment of processor to tasks. Two objectives are of interest: minimal response given a throughput requirement, and maximal throughput given a response time requirement. These assignment problems differ considerably from the classical mapping problem in which several tasks share a processor; instead, it is assumed that a large number of processors are to be assigned to a relatively small number of tasks. Efficient assignment algorithms were developed for different classes of task structures. For a p processor system and a series parallel precedence graph with n constituent tasks, an O(np2) algorithm is provided that finds the optimal assignment for the response time optimization problem; it was found that the assignment optimizing the constrained throughput in O(np2log p) time. Special cases of linear, independent, and tree graphs are also considered

NASA Technical Reports Server

Syracuse University Research Facility and Collaborative Environment

Simulated annealing based datapath synthesis

Author: Neil John Paul
Publication venue: The University of Edinburgh
Publication date: 01/01/1994
Field of study

Edinburgh Research Archive

Constraint analysis for DSP code generation

Author: Mesman B.
Publication venue: Technische Universiteit Eindhoven
Publication date: 01/01/2001
Field of study

+113hlm.;24c

Repository TU/e

Pure OAI Repository

uilis.unsyiah.ac.id

Multiprocessor scheduling with practical constraints

Author: Donovan Kenneth Burton
Publication venue: University of Central Florida
Publication date: 01/01/1986
Field of study

The problem of scheduling tasks onto multiprocessor systems has increasing practical importance as more applications are being addressed with multiprocessor systems. Actual applications and multiprocessor systems have many characteristics which become constraints to the general scheduling problem of minimizing the schedule length. These practical constraints include precedence relations and communication delays between tasks, yet few researchers have considered both these constraints when developing schedulers. This work examines a more general multiprocessor scheduling problem, which includes these practical scheduling constraints, and develops a new scheduling heuristic using a list scheduler with dynamically computed priorities. The dynamic priority heuristic is compared against an optimal scheduler and against other researchers’ approaches for thousands of randomly generated scheduling problems. The dynamic priority heuristic produces schedules with lengths which are 10% to 20% over optimal on the average. The dynamic priority heuristic performs better than other researchers’ approaches for scheduling problems with the practical constraints. We conclude that it is important to consider practical constraints in the design of a scheduler and that a simple heuristic can still achieve good performance in this area

University of Central Florida (UCF): STARS (Showcase of Text, Archives, Research & Scholarship)

The assignment problem in distributed computing

Author: Medepalli Anand
Publication venue: Iowa State University Digital Repository
Publication date: 01/01/1992
Field of study

This dissertation focuses on the problem of assigning the modules of a program to the processors in a distributed system with the goal of minimizing the overall cost of running the program. The cost depends on the execution times of the modules on the processors and on the cost of communication between modules. This module allocation problem arises in a variety of situations where one is interested in making optimum use of available computer resources. The general module allocation problem is intractable; however it becomes polynomially-solvable when the communication graph is restricted. In this dissertation, we restrict our attention to k-trees;As the first problem, we study parametric module allocation on partial k-trees. We allow the costs, both execution and communication, to vary linearly as functions of a real parameter t. We show that if the number of processors is fixed, the sequence of optimum assignments that are obtained, as t varies from zero to infinity, can be constructed in polynomial time. As an auxiliary result, we develop a linear-time algorithm to find a separator in a k-tree. We discuss the implications of our results for parametric versions of the weighted vertex cover, independent set, and 0-1 quadratic programming problems on partial k-trees;Next, we consider two variants of the assignment problem. The first problem is to find a minimum-cost assignment when one of the processors has a limited memory. The second is to find an assignment that minimizes the maximum processor load. We present exact dynamic programming algorithms for both problems, which lead to approximation schemes for the case where the communication graph is a partial k-tree. Faster algorithms are presented for trees with uniform costs. In contrast to these results, we show that, for arbitrary graphs, no fully polynomial time approximation schemes exist unless P = NP. Both dynamic programming algorithms have been implemented. The implementation details and our experimental results are presented

Digital Repository @ Iowa State University (ISU)

Parcus: Energy-Aware and Robust Parallelization of AUTOSAR Legacy Applications

Author: Böddeker Bert
Kehr Sebastian
Langen Dominik
Quiñones Eduardo
Schäfer Günter
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 08/06/2017
Field of study

Embedded multicore processors are an attractive alternative to sophisticated single-core processors for the use in automobile electronic control units (ECUs), due to their expected higher performance and energy efficiency. Parallelization approaches for AUTOSAR legacy software exploit these benefits. Nevertheless, these approaches focus on extracting performance neglecting the system's worst-case sensor/actuator latency and energy consumption. This paper presents Parcus, an energy-and latency-aware parallelization technique that combines both runnable-and tasklevel parallelism. Parcus explicitly models the traversal of data from sensor to actuator through task instances, enabling to consider the latency imposed by parallelization techniques. The parallel schedule quality (PSQ) metric quantifies the success of the parallelization, for which it takes the latency and the processor frequency into account. We demonstrate the applicability of Parcus with an automotive case study. The results show that Parcus can fully utilize the processor's energy-saving potential.This research received funding from the EU FP7 no. 287519 (parMERASA), the ARTEMIS-JU no. 621429 (EMC2), and the German Federal Ministry of Education and Research.Peer ReviewedPostprint (author's final draft

Crossref

UPCommons. Portal del coneixement obert de la UPC

A Model-based Design Framework for Application-specific Heterogeneous Systems

Author: Skalicky Samuel
Publication venue: RIT Scholar Works
Publication date: 01/05/2015
Field of study

The increasing heterogeneity of computing systems enables higher performance and power efficiency. However, these improvements come at the cost of increasing the overall complexity of designing such systems. These complexities include constructing implementations for various types of processors, setting up and configuring communication protocols, and efficiently scheduling the computational work. The process for developing such systems is iterative and time consuming, with no well-defined performance goal. Current performance estimation approaches use source code implementations that require experienced developers and time to produce. We present a framework to aid in the design of heterogeneous systems and the performance tuning of applications. Our framework supports system construction: integrating custom hardware accelerators with existing cores into processors, integrating processors into cohesive systems, and mapping computations to processors to achieve overall application performance and efficient hardware usage. It also facilitates effective design space exploration using processor models (for both existing and future processors) that do not require source code implementations to estimate performance. We evaluate our framework using a variety of applications and implement them in systems ranging from low power embedded systems-on-chip (SoC) to high performance systems consisting of commercial-off-the-shelf (COTS) components. We show how the design process is improved, reducing the number of design iterations and unnecessary source code development ultimately leading to higher performing efficient systems

RIT Scholar Works