
    Bi-criteria Pipeline Mappings for Parallel Image Processing

    Mapping workflow applications onto parallel platforms is a challenging problem, even for simple application patterns such as pipeline graphs. Several antagonistic criteria should be optimized, such as throughput and latency (or a combination of both). Typical applications include digital image processing, where images are processed in steady-state mode. In this paper, we study the mapping of a particular image processing application, the JPEG encoder. Mapping pipelined JPEG encoding onto parallel platforms is useful, for instance, for encoding Motion JPEG streams. As the bi-criteria mapping problem is NP-complete, we concentrate on the design and evaluation of polynomial-time heuristics.
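    The throughput/latency trade-off described above can be made concrete with a small sketch. Assuming an interval mapping (consecutive pipeline stages grouped onto one processor each, unit-speed processors, communication ignored), the period is the load of the slowest processor and the latency is the sum of all interval loads. The stage times and grouping below are illustrative, not taken from the paper.

```python
# Hypothetical sketch: evaluating one interval mapping of a linear
# pipeline (e.g. the stages of a JPEG encoder) onto processors.

def evaluate_mapping(stage_times, groups):
    """Return (period, latency) of an interval mapping.

    stage_times: per-stage processing time on a unit-speed processor
    groups: list of (start, end) stage-index intervals, one per processor
    """
    interval_loads = [sum(stage_times[a:b + 1]) for a, b in groups]
    period = max(interval_loads)   # slowest processor limits throughput
    latency = sum(interval_loads)  # each data set traverses every interval
    return period, latency

# Four JPEG-like stages mapped onto two processors.
times = [2.0, 5.0, 3.0, 1.0]
print(evaluate_mapping(times, [(0, 1), (2, 3)]))  # (7.0, 11.0)
```

The two criteria pull in opposite directions: splitting stages across more processors can lower the period but never lowers this latency measure, which is why the bi-criteria problem calls for heuristics.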

    A Survey of Pipelined Workflow Scheduling: Models and Algorithms

    A large class of applications needs to execute the same workflow on different data sets of identical size. Efficient execution of such applications necessitates intelligent distribution of the application components and tasks on a parallel machine, and the execution can be orchestrated by utilizing task-, data-, pipelined-, and/or replicated parallelism. The scheduling problem that encompasses all of these techniques is called pipelined workflow scheduling, and it has been widely studied in the last decade. Multiple models and algorithms have flourished to tackle various programming paradigms, constraints, machine behaviors, and optimization goals. This paper surveys the field by summing up and structuring known results and approaches.

    Assessing the performance of energy-aware mappings

    We aim at mapping streaming applications that can be modeled by a series-parallel graph onto a 2-dimensional tiled chip multiprocessor (CMP) architecture. The objective of the mapping is to minimize the energy consumption, using dynamic voltage and frequency scaling (DVFS) techniques, while maintaining a given level of performance, reflected by the rate of processing the data streams. This mapping problem turns out to be NP-hard, and several heuristics are proposed. We assess their performance through comprehensive simulations using the StreamIt workflow suite, randomly generated series-parallel graphs, and various CMP grid sizes.
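    The core DVFS trade-off behind such energy-aware mappings can be sketched in a few lines. Under the common cubic-power assumption (power proportional to f^3, so dynamic energy for a task of given work grows as work * f^2), the cheapest feasible choice is the lowest discrete frequency that still meets the required period. The frequency set and power model here are illustrative assumptions, not the paper's exact model.

```python
# Hypothetical sketch: pick the slowest frequency that meets a
# throughput (period) constraint, minimizing energy ~ work * f^2.

def min_energy_frequency(work, period, freqs):
    """Return (frequency, energy) for the cheapest feasible frequency,
    or None if no frequency can process `work` within `period`."""
    feasible = [f for f in freqs if work / f <= period]
    if not feasible:
        return None
    f = min(feasible)        # slowest feasible frequency is cheapest
    return f, work * f ** 2  # energy under the f^2 model

print(min_energy_frequency(work=6.0, period=3.0, freqs=[1.0, 2.0, 3.0]))
```

The NP-hardness comes from combining this per-task choice with the placement of tasks on the CMP grid, where communication routing couples the decisions.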

    Parallelization of dynamic programming recurrences in computational biology

    The rapid growth of biosequence databases over the last decade has led to a performance bottleneck in the applications analyzing them. In particular, over the last five years the DNA sequencing capacity of next-generation sequencers has been doubling every six months as costs have plummeted. The data produced by these sequencers is overwhelming traditional compute systems. We believe that in the future compute performance, not sequencing, will become the bottleneck in advancing genome science. In this work, we investigate novel computing platforms to accelerate dynamic programming algorithms, which are popular in bioinformatics workloads. We study algorithm-specific hardware architectures that exploit fine-grained parallelism in dynamic programming kernels using field-programmable gate arrays (FPGAs). We advocate a high-level synthesis approach, using the recurrence equation abstraction to represent dynamic programming and polyhedral analysis to exploit parallelism. We suggest a novel technique within the polyhedral model to optimize for throughput by pipelining independent computations on an array. This design technique improves on the state of the art, which builds latency-optimal arrays. We also suggest a method to dynamically switch between a family of designs using FPGA reconfiguration to achieve a significant performance boost. We have used polyhedral methods to parallelize the Nussinov RNA folding algorithm to build a family of accelerators that can trade resources for parallelism and are between 15 and 130 times faster than a modern dual-core CPU implementation. A Zuker RNA folding accelerator we built on a single workstation with four Xilinx Virtex 4 FPGAs outperforms 198 Intel Core 2 Duo processors running at 3 GHz. Furthermore, our design running on a single FPGA is an order of magnitude faster than competing implementations on similar-generation FPGAs and graphics processors. Our work is a step toward the goal of automated synthesis of hardware accelerators for dynamic programming algorithms.
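    The Nussinov recurrence named above is compact enough to sketch. Below is a plain O(n^3) CPU reference in Python, not the FPGA design; it maximizes the number of nested base pairs, with the usual Watson-Crick plus GU wobble pairing rules, and the example sequence is illustrative.

```python
# Nussinov base-pair maximization: dp[i][j] is the max number of
# nested pairs in seq[i..j].
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"),
         ("C", "G"), ("G", "U"), ("U", "G")}

def nussinov(seq):
    n = len(seq)
    dp = [[0] * n for _ in range(n)]
    for span in range(1, n):              # fill by increasing length
        for i in range(n - span):
            j = i + span
            # case 1: i pairs with j (inner part is dp[i+1][j-1])
            best = dp[i + 1][j - 1] + (1 if (seq[i], seq[j]) in PAIRS else 0)
            # case 2: bifurcation at k (also covers i or j unpaired)
            best = max(best, max(dp[i][k] + dp[k + 1][j] for k in range(i, j)))
            dp[i][j] = best
    return dp[0][n - 1]

print(nussinov("GGGAAAUCC"))
```

The fine-grained parallelism the accelerators exploit is visible here: all cells on one anti-diagonal (fixed `span`) are independent, which is exactly what a polyhedral analysis exposes for a systolic array.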

    Multi-criteria Mapping and Scheduling of Workflow Applications on Heterogeneous Platforms

    The results summarized in this thesis deal with the mapping and scheduling of workflow applications on heterogeneous platforms. In this context, we focus on three different types of streaming applications:

    * Replica placement in tree networks * In this kind of application, clients issue requests to servers, and the question is where to place replicas in the network so that all requests can be processed. We discuss and compare several policies for placing replicas in tree networks, subject to server capacity, Quality of Service (QoS), and bandwidth constraints. The client requests are known beforehand, while the number and location of the servers have to be determined. The standard approach in the literature enforces that all requests of a client be served by the closest server in the tree. We introduce and study two new policies. One major contribution of this work is to assess the impact of these new policies on the total replication cost. Another important goal is to assess the impact of server heterogeneity, from both a theoretical and a practical perspective. We establish several new complexity results and provide efficient polynomial heuristics for NP-complete instances of the problem.

    * Pipeline workflow applications * We consider workflow applications that can be expressed as linear pipeline graphs. An example of this application type is digital image processing, where images are treated in steady-state mode. Several antagonistic criteria should be optimized, such as throughput and latency (or a combination), as well as latency and reliability (i.e., the probability that the computation will be successful). While simple polynomial algorithms can be found for fully homogeneous platforms, the problem becomes NP-hard on heterogeneous platforms. We present a complete complexity analysis, propose optimal algorithms for the previously open polynomial variants, and give an integer linear programming formulation of the chains-on-chains problem on heterogeneous platforms. Furthermore, we provide several efficient polynomial bi-criteria heuristics, whose relative performance is evaluated through extensive simulation. As a case study, we provide simulations and MPI experimental results for the JPEG encoder pipeline on a cluster of workstations.

    * Complex streaming applications * Finally, we consider the execution of applications structured as trees of operators, i.e., the application of one or several trees of operators in steady state to multiple data objects that are continuously updated at various locations in a network. A first goal is to provide the user with a set of processors that should be bought or rented so that the application achieves a minimum steady-state throughput, while minimizing platform cost. We then extend our model to multiple applications: several concurrent applications are executed in the network at the same time, and one has to ensure that all of them can reach their required throughput. Another contribution of this work is a set of complexity results for different instances of the basic problem, as well as integer linear program formulations of various problem instances. The third contribution is the design of several polynomial-time heuristics for both application models. A primary objective of the heuristics for concurrent applications is to reuse intermediate results shared by multiple applications.
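    The closest-server policy from the replica placement problem above lends itself to a small feasibility sketch: each client's requests must be absorbed by the first replica on its path toward the root, subject to server capacity. The tree, capacities, and request counts below are made up for illustration.

```python
# Hypothetical sketch of the closest-server replica placement policy.

def closest_server_feasible(parent, replicas, capacity, requests):
    """Check whether every client is served by its closest replica
    without exceeding any server capacity.

    parent: node -> parent node (root maps to None)
    replicas: set of nodes holding a replica
    capacity: replica node -> max requests it can absorb
    requests: client node -> number of requests it issues
    """
    load = {r: 0 for r in replicas}
    for client, reqs in requests.items():
        node = client
        while node is not None and node not in replicas:
            node = parent[node]      # walk up toward the root
        if node is None:
            return False             # no replica on the client's path
        load[node] += reqs
    return all(load[r] <= capacity[r] for r in replicas)

parent = {"c1": "s", "c2": "s", "s": "root", "root": None}
print(closest_server_feasible(parent, {"s"}, {"s": 10}, {"c1": 4, "c2": 5}))
```

The relaxed policies studied in the thesis gain freedom precisely where this check fails: a client may then be routed past an overloaded nearby server to a replica further up the tree, which is what changes the total replication cost.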

    Optimization and Management of Large-scale Scientific Workflows in Heterogeneous Network Environments: From Theory to Practice

    Next-generation computation-intensive scientific applications feature large-scale computing workflows of various structures, which can be modeled as simply as linear pipelines or as complex as Directed Acyclic Graphs (DAGs). Supporting such computing workflows and optimizing their end-to-end network performance are crucial to the success of scientific collaborations that require fast system response, smooth data flow, and reliable distributed operation.

    We construct analytical cost models and formulate a class of workflow mapping problems with different mapping objectives and network constraints. The difficulty of these mapping problems essentially arises from the topological matching nature in the spatial domain, which is further compounded by the complexity of resource sharing in the temporal dimension. We provide detailed computational complexity analysis and design optimal or heuristic algorithms with rigorous correctness proofs or performance analysis. We decentralize the proposed mapping algorithms and also investigate these optimization problems in unreliable network environments for fault tolerance.

    To examine and evaluate the performance of the workflow mapping algorithms before actual deployment and implementation, we implement a simulation program that simulates the execution dynamics of distributed computing workflows. We also develop a scientific workflow automation and management platform based on an existing workflow engine for experimentation in real environments. The performance superiority of the proposed mapping solutions is illustrated by extensive simulation-based comparisons with existing algorithms and further verified by large-scale experiments on real-life scientific workflow applications through effective system implementation and deployment in real networks.
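    A typical analytical cost model of the kind described above can be sketched for end-to-end latency: under a given mapping of DAG tasks to network nodes, latency is the longest path, summing task execution time (work / node speed) and inter-node transfer time (data / bandwidth, zero when tasks are co-located). This is a generic sketch under assumed conventions, not the dissertation's exact model; all names and numbers are illustrative.

```python
# Hypothetical sketch: end-to-end latency of a mapped DAG workflow.
from functools import lru_cache

def dag_latency(succ, work, data, speed, bw, mapping, source):
    """Longest-path latency from `source` under `mapping` (task -> node)."""
    @lru_cache(maxsize=None)
    def finish(task):
        exec_time = work[task] / speed[mapping[task]]
        tail = 0.0
        for nxt in succ.get(task, []):
            # transfer cost only when the edge crosses nodes
            t = 0.0 if mapping[nxt] == mapping[task] else data[(task, nxt)] / bw
            tail = max(tail, t + finish(nxt))
        return exec_time + tail
    return finish(source)

succ = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
work = {"a": 2.0, "b": 4.0, "c": 3.0, "d": 1.0}
data = {("a", "b"): 1.0, ("a", "c"): 1.0, ("b", "d"): 1.0, ("c", "d"): 2.0}
speed = {"p0": 1.0, "p1": 2.0}
mapping = {"a": "p0", "b": "p1", "c": "p0", "d": "p1"}
print(dag_latency(succ, work, data, speed, 1.0, mapping, "a"))  # 7.5
```

The "topological matching" difficulty is visible even at this scale: changing where a single task is placed shifts both its execution time and which edges pay transfer cost, so objectives cannot be optimized task by task.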