    CASCH: a tool for computer-aided scheduling

    A software tool called Computer-Aided Scheduling (CASCH) for parallel processing on distributed-memory multiprocessors, embedded in a complete parallel programming environment, is presented. A compiler automatically parallelizes sequential applications by converting them into parallel code. CASCH then optimizes the parallel code that executes on a target machine through proper scheduling and mapping.
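
    The scheduling step such tools automate can be illustrated with a minimal list-scheduling sketch: tasks of a DAG are visited in topological order and each is placed on the processor giving the earliest finish time. The task graph, costs, communication penalty, and two-processor platform below are invented for illustration and are not CASCH's actual algorithm.

        # Assign each DAG task (in topological order) to the processor with the
        # earliest finish time; cross-processor edges pay a communication cost.
        tasks = ["a", "b", "c", "d"]                     # topologically sorted
        deps = {"b": ["a"], "c": ["a"], "d": ["b", "c"]}
        cost = {"a": 2, "b": 3, "c": 1, "d": 2}          # execution times (made up)
        comm, n_procs = 1, 2

        finish = {}                    # task -> (processor, finish time)
        proc_free = [0] * n_procs      # when each processor becomes idle
        for t in tasks:
            best = None
            for p in range(n_procs):
                # inputs from predecessors on other processors arrive late by `comm`
                ready = max([finish[d][1] + (comm if finish[d][0] != p else 0)
                             for d in deps.get(t, [])], default=0)
                start = max(ready, proc_free[p])
                if best is None or start + cost[t] < best[2]:
                    best = (p, start, start + cost[t])
            p, start, end = best
            finish[t] = (p, end)
            proc_free[p] = end
        print("makespan:", max(f for _, f in finish.values()))   # 7 here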

    A Survey of Pipelined Workflow Scheduling: Models and Algorithms

    A large class of applications needs to execute the same workflow on different data sets of identical size. Efficient execution of such applications necessitates intelligent distribution of the application components and tasks on a parallel machine, and the execution can be orchestrated by exploiting task, data, pipelined, and/or replicated parallelism. The scheduling problem that encompasses all of these techniques is called pipelined workflow scheduling, and it has been widely studied in the last decade. Multiple models and algorithms have flourished to tackle various programming paradigms, constraints, machine behaviors, and optimization goals. This paper surveys the field by summing up and structuring known results and approaches.
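
    For a chain-structured workflow, one classical formulation in this family maps contiguous intervals of stages onto processors and minimizes the period, i.e. the load of the heaviest interval, whose inverse is the throughput. The sketch below enumerates interval mappings for a small chain; the stage weights and processor count are invented for illustration.

        from itertools import combinations

        weights = [4, 2, 7, 3, 5]    # per-data-set work of each pipeline stage
        n_procs = 3

        best_period, best_bounds = float("inf"), None
        # choose n_procs - 1 cut points between the stages of the chain
        for cuts in combinations(range(1, len(weights)), n_procs - 1):
            bounds = (0, *cuts, len(weights))
            period = max(sum(weights[lo:hi]) for lo, hi in zip(bounds, bounds[1:]))
            if period < best_period:
                best_period, best_bounds = period, bounds

        intervals = [weights[lo:hi] for lo, hi in zip(best_bounds, best_bounds[1:])]
        print(intervals, "period:", best_period)   # [[4, 2], [7], [3, 5]] period: 8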

    Reactive Scheduling of DAG Applications on Heterogeneous and Dynamic Distributed Computing Systems

    Emerging technologies enable a set of distributed resources across a network to be linked together and used in a coordinated fashion to execute a single parallel application. Such applications are often abstracted as directed acyclic graphs (DAGs), in which vertices represent application tasks and edges represent data dependencies between tasks. Effective scheduling mechanisms for DAG applications are essential to exploit the tremendous potential of computational resources. The core issue is that the availability and performance of resources, which are already by their nature heterogeneous, can be expected to vary dynamically, even during the course of an execution. In this thesis, we first consider the problem of scheduling DAG task graphs onto heterogeneous resources with changeable capabilities. We propose a list-scheduling heuristic approach, the Global Task Positioning (GTP) scheduling method, which addresses the problem by allowing rescheduling and migration of tasks in response to significant variations in resource characteristics. We observed from experiments with GTP that in an execution with relatively frequent migration it may be that, over time, the results of some task have been copied to several other sites, so that a subsequently migrated task may have several possible sources for each of its inputs. Some of these copies may now be more quickly accessible than the original, due to dynamic variations in communication capabilities. To exploit this observation, we extended our model with a Copying Management (CM) function, resulting in a new version, the Global Task Positioning with copying facilities (GTP/c) system. The idea is to reuse such copies, in subsequent migrations of placed tasks, in order to reduce the impact of migration cost on makespan. Finally, we believe that fault tolerance is an important issue in heterogeneous and dynamic computational environments, as the availability of resources cannot be guaranteed. To address the problem of processor failure, we propose a rewinding mechanism which rewinds the progress of the application to a previous state, thereby preserving the execution in spite of the failed processor(s). We evaluate our mechanisms through simulation, since this allows us to generate repeatable patterns of resource performance variation. We use a standard benchmark set of DAGs, comparing performance against that of competing algorithms from the scheduling literature.
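
    The copy-reuse idea behind GTP/c can be sketched as a source-selection rule: when a migrated task pulls an input, it fetches from whichever replica site currently offers the cheapest transfer, not necessarily the producer's site. The sites, bandwidths, and data size below are hypothetical.

        def pick_source(replicas, bw_to_target, size):
            """Return (site, transfer time) of the cheapest replica of an input."""
            return min(((s, size / bw_to_target[s]) for s in replicas),
                       key=lambda pair: pair[1])

        replicas = {"s1", "s3", "s4"}                      # sites holding a copy
        bw_to_target = {"s1": 2.0, "s3": 8.0, "s4": 5.0}   # MB/s, varies over time
        site, t = pick_source(replicas, bw_to_target, size=40.0)
        print(f"fetch from {site}, est. {t:.1f}s")         # fetch from s3, est. 5.0s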

    Static and Dynamic Scheduling for Effective Use of Multicore Systems

    Multicore systems have increasingly gained importance in high-performance computers. Compared to traditional microarchitectures, multicore architectures have a simpler design, a higher performance-to-area ratio, and improved power efficiency. Although the multicore architecture has various advantages, traditional parallel programming techniques do not apply to the new architecture efficiently. This dissertation addresses how to determine optimized thread schedules to improve data reuse on shared-memory multicore systems, and how to seek a scalable solution to designing parallel software on both shared-memory and distributed-memory multicore systems. We propose an analytical cache model to predict the number of cache misses on the time-shared L2 cache of a multicore processor. The model provides insight into the impact of cache sharing and cache contention between threads. Inspired by the model, we build a framework of affinity-based thread scheduling to determine optimized thread schedules that improve data reuse at all levels of a complex memory hierarchy. The affinity-based thread scheduling framework includes a model to estimate the cost of a thread schedule, which consists of three submodels: an affinity graph submodel, a memory hierarchy submodel, and a cost submodel. Based on the model, we design a hierarchical graph partitioning algorithm to determine near-optimal solutions. We have also extended the algorithm to support threads with data dependences. The algorithms are implemented and incorporated into a feedback-directed optimization prototype system. The prototype system builds upon a binary instrumentation tool and can greatly improve program performance on shared-memory multicore architectures. We also study a dynamic, data-availability-driven scheduling approach to designing new parallel software on distributed-memory multicore architectures. We have implemented a decentralized dynamic runtime system whose design focuses on the scalability metric: at any time only a small portion of a task graph exists in memory. We propose an algorithm to solve data dependences in a distributed manner, without process cooperation. Our experimental results demonstrate the scalability and practicality of the approach for both shared-memory and distributed-memory multicore systems. Finally, we present a scalable nonblocking topology-aware multicast scheme for distributed DAG scheduling applications.
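
    The affinity-scheduling idea can be sketched with a toy instance: threads form a weighted affinity graph, and co-locating a heavy pair on cores that share a cache captures its weight as data reuse. The weights and two-shared-caches topology below are invented; the dissertation's hierarchical partitioner handles full memory hierarchies.

        from itertools import permutations

        affinity = {("t0", "t1"): 9, ("t0", "t2"): 1, ("t0", "t3"): 2,
                    ("t1", "t2"): 2, ("t1", "t3"): 1, ("t2", "t3"): 8}
        threads = ["t0", "t1", "t2", "t3"]

        def captured(groups):
            # affinity weight kept inside cache-sharing groups of threads
            return sum(w for (a, b), w in affinity.items()
                       if any(a in g and b in g for g in groups))

        # brute force: split four threads across two caches, two cores each
        best = max((([p[0], p[1]], [p[2], p[3]]) for p in permutations(threads)),
                   key=captured)
        print("groups:", best, "captured affinity:", captured(best))   # 17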

    Managing distributed flexible manufacturing systems

    For several years, research has focused on many aspects of manufacturing, from the individual production processes up to the management of virtual enterprises, but several aspects, such as coordination and control, still pose relevant problems in industry and remain challenging areas of research. The application of advanced technologies and informational tools by itself does not guarantee the success of control and integration applications. In order to reach a high degree of integration and efficiency, it is necessary to match these technologies and tools with models that describe the existing knowledge and functionality in the system and allow a correct understanding of its behaviour. In a global, competitive market, manufacturing systems present requirements that lead to distributed, self-organised, co-operative, and heterogeneous control applications. A Distributed Flexible Manufacturing System (DFMS) is a goal-driven and data-directed dynamic system designed to provide an effective operation sequence for the products, so as to fulfil the production goals, meet real-time requirements, and optimally allocate resources. In this work a layered approach for modelling such production systems is first proposed. In this representation, a DFMS is seen as a multi-layer resource graph in which the vertices of a layer represent interacting resources, and a layer at level l is represented by a node in the layer at level l-1. Two models are then developed for two relevant managerial issues in a DFMS: the task mapping problem and the task scheduling problem with multiple shared resources. The task mapping problem concerns the balanced partition of a given set of jobs and the assignment of the parts to the resources of the manufacturing system. We study the case in which the jobs are fairly homogeneous and have no precedence constraints, but need some communication to be coordinated; assigning such jobs to different parts therefore causes a relevant communication effort between those parts, increasing the managerial complexity. We show that the standard models usually used to formally represent this problem are inadequate. Through some graph-theoretical results we relate the problem to the well-known hypergraph partitioning problem and briefly survey the best techniques for solving it. A new formulation of the problem is then presented, and considerations on an improved version of the formulation permit the computation of a good lower bound on the optimal solution in the case of hypergraph bisection. The task scheduling problem with multiple shared resources is addressed for a robotic cell. We study the general problem of sequencing multiple jobs, where each job consists of multiple ordered tasks and task execution requires the simultaneous usage of several resources. NP-completeness results are given, and a heuristic with an approximation guarantee is designed and evaluated in several production scenarios.
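
    The reduction described above can be made concrete: jobs are hypergraph vertices, each coordination requirement is a net spanning all jobs involved, and the communication cost of a bisection is the number of nets with jobs on both sides, which a plain-graph edge count misrepresents. The jobs and nets below are invented for illustration.

        jobs = {"j1", "j2", "j3", "j4", "j5", "j6"}
        nets = [{"j1", "j2", "j3"}, {"j3", "j4"}, {"j4", "j5", "j6"}, {"j1", "j6"}]

        def cut_nets(part_a, nets):
            # nets spanning both parts of the bisection require coordination
            # across resources, whatever their size
            return sum(1 for net in nets if net & part_a and net - part_a)

        part_a = {"j1", "j2", "j3"}                  # candidate balanced bisection
        print("cut nets:", cut_nets(part_a, nets))   # 2: {j3,j4} and {j1,j6}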

    SCALABLE TECHNIQUES FOR SCHEDULING AND MAPPING DSP APPLICATIONS ONTO EMBEDDED MULTIPROCESSOR PLATFORMS

    A variety of multiprocessor architectures has proliferated, even for off-the-shelf computing platforms. To make use of these platforms, traditional implementation frameworks focus on implementing Digital Signal Processing (DSP) applications using special platform features to achieve high performance. However, due to the fast evolution of the underlying architectures, solution redevelopment is error prone and the re-usability of existing solutions and libraries is limited. In this thesis, we facilitate an efficient migration of DSP systems to multiprocessor platforms while systematically leveraging previous investment in optimized library kernels using dataflow design frameworks. We make these library elements, which are typically tailored to specialized architectures, more amenable to extensive analysis and optimization through an efficient and systematic process. We provide techniques to enable such migration through four basic contributions:

    1. We propose and develop a framework to explore efficient utilization of the Single Instruction Multiple Data (SIMD) cores and accelerators available in heterogeneous multiprocessor platforms consisting of General Purpose Processors (GPPs) and Graphics Processing Units (GPUs). We also propose new scheduling techniques that apply extensive block processing in conjunction with appropriate task mapping and task ordering methods matched to the underlying architecture. The approach gives the developer the ability to prototype a GPU-accelerated application and explore its design space efficiently and effectively.

    2. We introduce the concept of Partial Expansion Graphs (PEGs) as an implementation model and an associated class of scheduling strategies. PEGs are designed to help realize DSP systems in terms of forms and granularities of parallelism that are well matched to the given applications and targeted platforms. PEGs also facilitate the derivation of both static and dynamic scheduling techniques, depending on the amount of variability in task execution times and other operating conditions. We show how to implement efficient PEG-based scheduling methods using real-time operating systems, and how to re-use pre-optimized libraries of DSP components within such implementations.

    3. We develop new algorithms for scheduling and mapping systems implemented using PEGs. Collectively, these algorithms operate in three steps. First, the amount of data parallelism in the application graph is tuned systematically over many iterations to profit from the available cores in the target platform. Then a mapping algorithm based on graph analysis distributes data- and task-parallel instances over different cores while balancing the load of all processing units to exploit pipeline parallelism. Finally, we use a novel technique for performance evaluation by implementing the scheduler and a customizable solution on the programmable platform, which allows accurate fitness functions to be measured and used to drive runtime adaptation of schedules.

    4. In addition to providing scheduling techniques for the mentioned applications and platforms, we show how to integrate the resulting solution into the underlying environment. This is achieved by leveraging existing libraries and applying the GPP-GPU scheduling framework to augment a popular Software Defined Radio (SDR) development environment, GNU Radio, with a dataflow foundation and a stand-alone GPU-accelerated library. We also show how to realize the PEG model on real-time operating system libraries, such as the Texas Instruments DSP/BIOS. A code generator that accepts a manually designed solution as well as automatically configured solutions completes the design flow from application model to running system.
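
    The load-balancing step of such a mapping (contribution 3) can be sketched with a longest-processing-time-first heuristic: after partial expansion, the data-parallel actor instances are greedily placed on the least-loaded core. The actor instances and three-core platform below are invented for illustration.

        import heapq

        # (instance name, estimated execution time) after partial expansion
        instances = [("fft.0", 9), ("fft.1", 9), ("fir.0", 5), ("fir.1", 5),
                     ("src", 3), ("snk", 2)]
        n_cores = 3

        cores = [(0, c, []) for c in range(n_cores)]   # (load, core id, tasks)
        heapq.heapify(cores)
        # place heaviest instances first, always on the least-loaded core
        for name, t in sorted(instances, key=lambda x: -x[1]):
            load, c, assigned = heapq.heappop(cores)
            heapq.heappush(cores, (load + t, c, assigned + [name]))

        for load, c, assigned in sorted(cores, key=lambda x: x[1]):
            print(f"core {c}: {assigned} (load {load})")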

    Modelling and scheduling of heterogeneous computing systems


    Enhancing reliability with Latin Square redundancy on desktop grids.

    Computational grids are some of the largest computer systems in existence today. Unfortunately they are also, in many cases, the least reliable. This research examines the use of redundancy with permutation as a method of improving reliability in computational grid applications. Three primary avenues are explored: development of a new redundancy model for computational grids, the Replication and Permutation Paradigm (RPP); development of grid simulation software for testing RPP against other redundancy methods; and, finally, running a program on a live grid using RPP. An important part of RPP involves distributing data and tasks across the grid in Latin Square fashion. Two theorems and subsequent proofs regarding Latin Squares are developed. The theorems describe the changing position of symbols between the rows of a standard Latin Square: when a symbol is missing because a column has been removed, the theorems provide a basis for determining the next row and column where the missing symbol can be found. Interesting in their own right, the theorems also have implications for redundancy: in terms of the redundancy model, they allow one to state the maximum makespan in the face of missing computational hosts when using Latin Square redundancy. The simulator software was developed and used to compare different data and task distribution schemes on a simulated grid. The software clearly showed the advantage of running RPP, which resulted in faster completion times in the face of computational host failures. The Latin Square method also fails gracefully: jobs still complete under massive node failure, at the cost of increased makespan. Finally, an Inductive Logic Program (ILP) for pharmacophore search was executed, using a Latin Square redundancy methodology, on a Condor grid in the Dahlem Lab at the University of Louisville Speed School of Engineering. All jobs completed, even in the face of large numbers of randomly generated computational host failures.
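
    The Latin Square property the theorems build on is easy to demonstrate: in a cyclic square of order n, symbol s sits in row r at column (s - r) mod n, so when a column (host) fails, each symbol it carried is recoverable from a known cell of another row. The order-5 square and failed column below are illustrative only, not the dissertation's experimental setup.

        n = 5
        square = [[(r + c) % n for c in range(n)] for r in range(n)]  # cyclic square

        failed_col = 2                       # a failed computational host
        for r in range(n):
            missing = square[r][failed_col]
            alt_row = (r + 1) % n            # look in the next row instead
            alt_col = (missing - alt_row) % n
            # the missing symbol is found at a predictable surviving cell
            assert square[alt_row][alt_col] == missing and alt_col != failed_col
            print(f"symbol {missing}: row {r} col {failed_col} -> "
                  f"row {alt_row} col {alt_col}")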