Search CORE

Multi-criteria scheduling of pipeline workflows

Author: Benoit Anne
Rehn-Sonigo Veronika
Robert Yves
Publication date: 01/01/2007

Mapping workflow applications onto parallel platforms is a challenging problem, even for simple application patterns such as pipeline graphs. Several conflicting criteria, such as throughput and latency (or a combination of the two), should be optimized. In this paper, we study the complexity of the bi-criteria mapping problem for pipeline graphs on communication homogeneous platforms. In particular, we assess the complexity of the well-known chains-to-chains problem for different-speed processors, which turns out to be NP-hard. We provide several efficient polynomial-time bi-criteria heuristics and evaluate their relative performance through extensive simulations.
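
To make the two criteria concrete, the following minimal Python sketch enumerates every interval mapping of a tiny pipeline and keeps the Pareto-optimal (period, latency) pairs. It is an exhaustive baseline, not one of the paper's polynomial heuristics, and it assumes a simplified one-port cost model: stage i has work w[i], processor u has speed speeds[u], all links share bandwidth b, and delta[i] is the size of the data entering stage i (delta[n] is the final output). The function names and the exact cost formulas are illustrative assumptions.

```python
from itertools import combinations, permutations


def interval_costs(w, delta, speeds, b, cuts, procs):
    """Period and latency of one interval mapping under the assumed model.

    cuts:  sorted interior boundaries, e.g. (1,) splits stages into [0, 1) and [1, n).
    procs: a distinct processor index for each interval.
    """
    bounds = [0, *cuts, len(w)]
    period = latency = 0.0
    for j, u in enumerate(procs):
        lo, hi = bounds[j], bounds[j + 1]
        t_in = delta[lo] / b                 # receive the interval's input
        t_comp = sum(w[lo:hi]) / speeds[u]   # compute all stages of the interval
        t_out = delta[hi] / b                # send the interval's output
        period = max(period, t_in + t_comp + t_out)
        latency += t_in + t_comp
    latency += delta[len(w)] / b             # final output leaves the platform
    return period, latency


def pareto_front(w, delta, speeds, b):
    """Exhaustively enumerate interval mappings; keep non-dominated (period, latency)."""
    n, p = len(w), len(speeds)
    points = set()
    for k in range(1, min(n, p) + 1):                      # number of intervals
        for cuts in combinations(range(1, n), k - 1):      # where to split the chain
            for procs in permutations(range(p), k):        # one processor per interval
                points.add(interval_costs(w, delta, speeds, b, cuts, procs))
    return sorted(
        pt for pt in points
        if not any(q != pt and q[0] <= pt[0] and q[1] <= pt[1] for q in points)
    )


# Four stages, two processors (speeds 1 and 2), unit data sizes and bandwidth.
print(pareto_front([4, 2, 2, 4], [1, 1, 1, 1, 1], [1, 2], 1))
# → [(6.0, 11.0), (8.0, 8.0)]
```

In this toy instance, mapping everything to the fast processor minimizes latency, while splitting the chain lowers the period at the cost of extra communication on the latency path, which is exactly the trade-off the bi-criteria problem captures.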

arXiv.org e-Print Archive

HAL-ENS-LYON

CiteSeerX

Crossref

INRIA a CCSD electronic archive server

Hal-Diderot

A Survey of Pipelined Workflow Scheduling: Models and Algorithms

Author: Benoit Anne
Catalyurek Umit
Robert Yves
Saule Erik
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2013

A large class of applications needs to execute the same workflow on different data sets of identical size. Efficient execution of such applications necessitates intelligent distribution of the application components and tasks on a parallel machine, and the execution can be orchestrated by utilizing task-, data-, pipelined-, and/or replicated-parallelism. The scheduling problem that encompasses all of these techniques is called pipelined workflow scheduling, and it has been widely studied in the last decade. Multiple models and algorithms have flourished to tackle various programming paradigms, constraints, machine behaviors, or optimization goals. This paper surveys the field by summing up and structuring known results and approaches.

HAL-ENS-LYON

CiteSeerX

INRIA a CCSD electronic archive server

Hal-Diderot

Sparse matrix decomposition with optimal load balancing

Author: Pinar Ali
Aykanat Cevdet
Publication venue: IEEE, Piscataway, NJ, United States
Publication date: 01/01/1997

Optimal load balancing in sparse matrix decomposition without disturbing the row/column ordering is investigated. Exact algorithms that are efficient both asymptotically and in run time are proposed and implemented for one-dimensional (1D) striping and two-dimensional (2D) jagged partitioning. The binary search method is successfully adapted to 1D striped decomposition by deriving and exploiting a good upper bound on the value of an optimal solution. A binary search algorithm is proposed for 2D jagged partitioning by introducing a new 2D probing scheme. A new iterative-refinement scheme is proposed for both 1D and 2D partitioning. The proposed algorithms are also space efficient, since they need only the conventional compressed storage scheme for the given matrix and avoid the need for a dense workload matrix in 2D decomposition. Experimental results on a wide set of test matrices show that considerably better decompositions can be obtained by using optimal load balancing algorithms instead of heuristics. On average, the proposed algorithms are 100 times faster than a single sparse matrix-vector multiplication (SpMxV) in 64-way 1D decompositions. Our jagged partitioning algorithms are, on average, only 60% slower than a single SpMxV computation in 8×8-way 2D decompositions.
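
The 1D striping case reduces to partitioning a chain of row weights (for example, nonzeros per row) into P contiguous stripes. The sketch below, assuming nonnegative integer weights, combines a greedy probe that uses binary search on prefix sums with an outer binary search on the bottleneck value. For the upper bound it uses the simple bound ceil(W/P) + max(w), which is not necessarily the bound derived in the paper; the 2D jagged probe and the iterative-refinement scheme are not shown. All function names are hypothetical.

```python
from bisect import bisect_right
from itertools import accumulate


def probe(prefix, parts, bound):
    """Greedy feasibility test: can the chain be cut into `parts` stripes of weight <= bound?"""
    n, idx = len(prefix) - 1, 0
    for _ in range(parts):
        # Extend the current stripe as far as possible via binary search on prefix sums.
        idx = bisect_right(prefix, prefix[idx] + bound) - 1
        if idx == n:
            return True
    return False


def optimal_stripes(weights, parts):
    """Smallest achievable maximum stripe weight for a 1D contiguous partition."""
    prefix = [0, *accumulate(weights)]
    total, wmax = prefix[-1], max(weights)
    avg = -(-total // parts)                 # ceil(total / parts)
    lo, hi = max(avg, wmax), avg + wmax      # hi is always feasible for the greedy probe
    while lo < hi:
        mid = (lo + hi) // 2
        if probe(prefix, parts, mid):
            hi = mid
        else:
            lo = mid + 1
    return lo


# Nonzeros per row of a hypothetical 8-row matrix, split into 3 row stripes.
print(optimal_stripes([3, 1, 4, 1, 5, 9, 2, 6], 3))  # → 14
```

Because hi - lo is at most max(w), the outer search needs only O(log max(w)) probes, and each probe costs O(P log N) with the prefix-sum binary search.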

Bilkent University Institutional Repository

Task allocation in distributed multimedia systems based on the host-satellite model

Author: Dermler Gabriel
Iqbal Ashraf
Publication date: 26/06/2013

Multimedia applications require intermediate processing between media sources and sinks. In addition to end-user machines, intermediate computers can be used to perform media processing. This possibility leads to the problem of allocating processing components to various computers. In this paper, we study this problem in the context of star-shaped application graphs that have to be allocated between given end-user machines (satellites) and a central computer (host). The problem is formulated in terms of the best achievable bottleneck resource usage. Several approaches are considered, including an approximate scheme and two fast heuristics. Performance measurements show the efficiency of the considered approaches. A discussion of our approach highlights important differences from solutions to the related problems of graph partitioning and mapping.
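
To illustrate what "best achievable bottleneck resource usage" means, here is a toy exhaustive search under a hypothetical model: each satellite owns a chain of processing components, a prefix of that chain runs on the satellite, and the remainder runs on the host. Each component has an assumed (satellite_cost, host_cost) pair, communication costs are ignored for brevity, and the bottleneck is the most heavily loaded machine. This is neither the paper's approximate scheme nor its heuristics.

```python
from itertools import product


def bottleneck(chains, split):
    """Bottleneck load of one allocation.

    chains[s]: list of (satellite_cost, host_cost) pairs for satellite s's components.
    split[s]:  how many leading components stay on satellite s; the rest run on the host.
    """
    host = sum(h for chain, k in zip(chains, split) for _, h in chain[k:])
    satellites = [sum(c for c, _ in chain[:k]) for chain, k in zip(chains, split)]
    return max([host, *satellites])


def best_allocation(chains):
    """Exhaustive search (exponential; toy sizes only) for the minimum bottleneck."""
    splits = product(*(range(len(chain) + 1) for chain in chains))
    return min((bottleneck(chains, split), split) for split in splits)


chains = [[(4, 1), (4, 1)],   # satellite 0: two components
          [(2, 1), (6, 2)]]   # satellite 1: two components
print(best_allocation(chains))  # → (4, (0, 1))
```

The shared host is what couples the satellites: moving work off one satellite lowers its load but raises the host's, so the optimum balances the host against the busiest satellite.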

Pipeline Task Scheduling with Application to Network Processors

Author: Datar Seema
Publication venue: Washington University Open Scholarship
Publication date: 04/08/2004

Chip Multi-Processors (CMPs) are now available in a variety of systems and provide the opportunity for achieving high computational performance by exploiting application-level parallelism. In the communications environment, network processors (NPs), designed around CMP architectures, are generally used in a pipelined manner. This leads to the need for static scheduling of tasks on processor pipelines. This thesis considers problems associated with determining optimal schedules for such pipelines. A collection of algorithms is presented, with their utility determined by the size and other characteristics of the system. The algorithms employ heuristics, dynamic programming, and statistical methods to schedule tasks derived from multiple application flows on pipelines with an arbitrary number of stages. Experimental results indicate that while the dynamic programming algorithm obtains optimal schedules, the heuristics and statistical methods obtain schedules within 10% of the optimum 95% of the time. Examples are given to show the use of these algorithms for general pipeline/algorithm design and in the network processor environment with typical networking applications.
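
An optimal schedule for a single pipeline can be computed by dynamic programming when tasks must keep their order and each stage executes a contiguous run of tasks. The Python sketch below assumes one ordered list of task times (the thesis schedules tasks from multiple application flows, which this ignores) and minimizes the slowest stage, which bounds the pipeline's throughput. It runs in O(stages · n²) time; the function name and the model are illustrative assumptions, not the thesis's algorithm.

```python
from itertools import accumulate


def optimal_pipeline(times, stages):
    """Split ordered task times into `stages` contiguous groups (possibly empty)
    minimizing the slowest stage; returns (bottleneck, [(start, end), ...])."""
    n = len(times)
    prefix = [0, *accumulate(times)]
    inf = float("inf")
    # best[k][i]: optimal bottleneck for the first i tasks on k stages.
    best = [[inf] * (n + 1) for _ in range(stages + 1)]
    cut = [[0] * (n + 1) for _ in range(stages + 1)]
    best[0][0] = 0
    for k in range(1, stages + 1):
        for i in range(n + 1):
            for j in range(i + 1):          # stage k runs tasks j .. i-1
                cand = max(best[k - 1][j], prefix[i] - prefix[j])
                if cand < best[k][i]:
                    best[k][i], cut[k][i] = cand, j
    # Walk the recorded cuts backwards to recover each stage's task range.
    bounds, i = [], n
    for k in range(stages, 0, -1):
        j = cut[k][i]
        bounds.append((j, i))
        i = j
    return best[stages][n], bounds[::-1]


value, groups = optimal_pipeline([3, 1, 4, 1, 5, 9, 2, 6], 3)
print(value, groups)  # → 14 [(0, 4), (4, 6), (6, 8)]
```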

Washington University St. Louis: Open Scholarship

Streaming Partitioning of Sequences and Trees

Author: Konrad Christian
Publication date: 01/01/2016

We study streaming algorithms for partitioning integer sequences and trees. In the case of trees, we suppose that the input tree is provided by a stream consisting of a depth-first traversal of the input tree. This captures the problem of partitioning XML streams, among other problems. We show that both problems admit deterministic (1+ε)-approximation streaming algorithms, where a single pass is sufficient for integer sequences and two passes are required for trees. The space complexity for partitioning integer sequences is O((1/ε) · p · log(nm)) and for partitioning trees is O((1/ε) · p² · log(nm)), where n is the length of the input stream, m is the maximal weight of an element in the stream, and p is the number of partitions to be created. Furthermore, for the problem of partitioning integer sequences, we show that computing an optimal solution in one pass requires Ω(n) space, and computing a (1+ε)-approximation in one pass requires Ω((1/ε) · log(n)) space, rendering our algorithm tight for instances with p, m ∈ O(1).
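
One way to see where a bound of the form (1/ε) · p · log(nm) can come from is to run one greedy partitioner per guess on a geometric grid of bottleneck values, all in the same pass. The sketch below does this for streams of positive integers; it assumes an upper bound max_total on the total weight is known in advance and is not claimed to be the paper's algorithm. A greedy run with guess B succeeds exactly when B is at least the optimum, so the smallest surviving guess is less than (1 + ε) times the optimum. It keeps about log(max_total)/log(1 + ε) runs, each storing at most p - 1 part boundaries.

```python
class StreamingPartitioner:
    """One pass over a stream of positive integers; returns a partition into at most
    `p` contiguous parts whose largest part sum is below (1 + eps) times the optimum.

    Assumes an upper bound `max_total` on the stream's total weight is known up
    front (a simplifying assumption); one greedy run is kept per bottleneck guess.
    """

    def __init__(self, p, eps, max_total):
        self.p = p
        # Geometric grid of guesses 1, (1+eps), (1+eps)^2, ... up to >= max_total.
        self.guesses = [1.0]
        while self.guesses[-1] < max_total:
            self.guesses.append(self.guesses[-1] * (1 + eps))
        # Per guess: [parts opened, current part sum, start indices of parts 2..],
        # or None once the guess is known to be infeasible.
        self.state = [[1, 0, []] for _ in self.guesses]
        self.pos = 0

    def push(self, x):
        for i, guess in enumerate(self.guesses):
            st = self.state[i]
            if st is None:
                continue
            if x > guess:
                self.state[i] = None           # one element alone exceeds the guess
            elif st[1] + x <= guess:
                st[1] += x                     # extend the current part
            elif st[0] < self.p:
                st[0] += 1                     # open a new part at this position
                st[1] = x
                st[2].append(self.pos)
            else:
                self.state[i] = None           # greedy would need more than p parts
        self.pos += 1

    def result(self):
        """Smallest surviving guess and the start indices of parts 2..p."""
        for guess, st in zip(self.guesses, self.state):
            if st is not None:
                return guess, st[2]
        return None


sp = StreamingPartitioner(p=3, eps=0.1, max_total=31)
for x in [3, 1, 4, 1, 5, 9, 2, 6]:
    sp.push(x)
print(sp.result())  # guess ≈ 14.42 (optimum is 14); parts start at indices 5 and 7
```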

Dagstuhl Research Online Publication Server

Explore Bristol Research

Locality-Aware Concurrency Platforms

Author: Maglalang Jordyn Chrystopher Raymond
Publication venue: Washington University Open Scholarship
Publication date: 15/12/2017

Modern computing systems from all domains are becoming increasingly parallel. Manufacturers are taking advantage of the increasing number of available transistors by packaging more and more computing resources together on a single chip or within a single system. These platforms generally contain many levels of private and shared caches in addition to physically distributed main memory. Therefore, some memory is more expensive to access than other memory, and high-performance software must treat memory locality as a first-order consideration. Memory locality is often difficult for application developers to consider directly, however, since many of these NUMA effects are invisible to the application programmer and show up only as poor performance. Moreover, on parallel platforms, performance depends on both locality and load balance, and these two metrics are often at odds with each other. Therefore, directly considering locality and load balance at the application level may make the application much more complex to program. In this work, we develop locality-conscious concurrency platforms for several structured parallel programming models, including streaming applications, task graphs, and parallel for loops. In all of this work, the idea is to minimally disrupt the application programming model so that the application developer is either unaffected or must only provide high-level hints to the runtime system. The runtime system then schedules the application to provide good locality of access while also providing good load balance. In particular, we address cache locality for streaming applications through static partitioning and develop an extensible platform to execute partitioned streaming applications. For task graphs, we extend a task-graph scheduling library to guide scheduling decisions toward better NUMA locality with the help of user-provided locality hints.
CilkPlus parallel for loops use a randomized dynamic scheduler to distribute work, which, in many loop-based applications, results in poor locality at all levels of the memory hierarchy. We address this issue with a novel parallel for loop implementation that achieves good cache and NUMA locality while still maintaining good load balance dynamically.
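
The tension between locality and load balance can be illustrated with a small discrete-event simulation. In the hypothetical policy below, each worker starts with one contiguous block of iterations (favoring cache and NUMA locality), and an idle worker steals the back half of the largest remaining block, so stolen work also stays contiguous. This is a sequential Python sketch of one possible policy, not the thesis's implementation and not CilkPlus's scheduler; `costs` are assumed per-iteration execution times.

```python
import heapq


def simulate_parallel_for(costs, workers):
    """Simulate contiguous-block scheduling with steal-half load balancing.

    Returns (makespan, iterations executed by each worker, in execution order).
    """
    n = len(costs)
    # ranges[w] = [next, end): iterations still owned by worker w (a contiguous block).
    ranges = [[w * n // workers, (w + 1) * n // workers] for w in range(workers)]
    executed = [[] for _ in range(workers)]
    events = [(0.0, w) for w in range(workers)]   # (time worker becomes idle, worker)
    heapq.heapify(events)
    makespan = 0.0
    while events:
        t, w = heapq.heappop(events)
        makespan = max(makespan, t)
        lo, hi = ranges[w]
        if lo == hi:
            # Own block exhausted: steal the back half of the largest remaining block.
            victim = max(range(workers), key=lambda v: ranges[v][1] - ranges[v][0])
            vlo, vhi = ranges[victim]
            if vlo == vhi:
                continue                          # no work left anywhere; worker retires
            mid = vlo + (vhi - vlo) // 2
            ranges[victim][1] = mid
            ranges[w] = [mid, vhi]
            lo = mid
        ranges[w][0] = lo + 1                     # claim iteration `lo`
        executed[w].append(lo)
        heapq.heappush(events, (t + costs[lo], w))
    return makespan, executed


# One expensive iteration at the start; the static split alone would finish at t = 8.
print(simulate_parallel_for([5, 1, 1, 1, 1, 1, 1, 1], 2))
# → (6.0, [[0, 1], [4, 5, 6, 7, 2, 3]])
```

Stealing half of a block, rather than single iterations, keeps the number of steals small and each worker's iterations in a few contiguous runs, which is the locality property such a scheduler aims to preserve.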

Washington University St. Louis: Open Scholarship

LDRD report: parallel repartitioning for optimal solver performance

Author: Boman Erik Gunnar
Devine Karen Dragon
Heaphy Robert
Hendrickson Bruce Alan
Heroux Michael Allen
Preis Robert (University of Paderborn, Paderborn, Germany)
Publication venue: Sandia National Laboratories
Publication date: 01/02/2004

We have developed infrastructure, utilities, and partitioning methods to improve data partitioning in linear solvers and preconditioners. Our efforts included incorporating data repartitioning capabilities from the Zoltan toolkit into the Trilinos solver framework (allowing dynamic repartitioning of Trilinos matrices); implementing efficient distributed data directories and unstructured communication utilities in Zoltan and Trilinos; developing a new multi-constraint geometric partitioning algorithm (which can generate one decomposition that is good with respect to multiple criteria); and researching hypergraph partitioning algorithms (which provide up to a 56% reduction in communication volume compared to graph partitioning for a number of emerging applications). This report includes descriptions of the infrastructure and algorithms developed, along with results demonstrating the effectiveness of our approaches.
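
Why hypergraph models can reduce communication volume relative to graph models is easiest to see on a toy sparse matrix-vector product. The sketch assumes a symmetric row-wise partition (row i, x_i, and y_i share a part) and a column-net hypergraph, in which the volume is the sum over columns of (number of parts touching the column - 1). The graph model charges two words per cut edge, so it overcounts when several rows in one part need the same x_j. This toy does not reproduce the report's 56% figure, the function names are hypothetical, and it does not use Zoltan or Trilinos.

```python
def hypergraph_volume(nonzeros, part):
    """Column-net model: sum over columns j of (lambda_j - 1), where lambda_j is the
    number of distinct parts among rows with a nonzero in column j. Row j is added
    to net j because it is the assumed owner of x_j (diagonal assumed nonzero)."""
    nets = {j: {j} for j in range(len(part))}
    for i, j in nonzeros:            # a nonzero a_ij means row i needs x_j
        nets[j].add(i)
    return sum(len({part[i] for i in rows}) - 1 for rows in nets.values())


def graph_cut_estimate(nonzeros, part):
    """Graph model estimate for a symmetric pattern: two words per cut edge {i, j}."""
    edges = {(min(i, j), max(i, j)) for i, j in nonzeros if i != j}
    return 2 * sum(part[i] != part[j] for i, j in edges)


# Off-diagonal nonzeros of a symmetric 4x4 pattern; rows 0-1 in part 0, rows 2-3 in part 1.
nz = [(0, 2), (2, 0), (0, 3), (3, 0)]
part = [0, 0, 1, 1]
print(hypergraph_volume(nz, part), graph_cut_estimate(nz, part))  # → 3 4
```

Here rows 2 and 3 (part 1) both need x_0, which the hypergraph counts as one transfer and the graph model counts as two.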

UNT Digital Library

LDRD report: parallel repartitioning for optimal solver performance

Publication venue: 'Office of Scientific and Technical Information (OSTI)'

Crossref