CASCH: a tool for computer-aided scheduling
CASCH (Computer-Aided SCHeduling), a software tool for parallel processing on distributed-memory multiprocessors, is presented as part of a complete parallel programming environment. A compiler performs program parallelization by automatically converting sequential applications into parallel code. CASCH then optimizes the parallel code for the target machine through proper scheduling and mapping.
A Survey of Pipelined Workflow Scheduling: Models and Algorithms
A large class of applications needs to execute the same workflow on different data sets of identical size. Efficient execution of such applications requires intelligent distribution of the application components and tasks on a parallel machine, and the execution can be orchestrated using task, data, pipelined, and/or replicated parallelism. The scheduling problem that encompasses all of these techniques is called pipelined workflow scheduling, and it has been widely studied over the last decade. Multiple models and algorithms have flourished to tackle various programming paradigms, constraints, machine behaviors, and optimization goals. This paper surveys the field by summarizing and structuring known results and approaches.
Reactive Scheduling of DAG Applications on Heterogeneous and Dynamic Distributed Computing Systems
Emerging technologies enable a set of distributed resources across a network to be linked together and used in a coordinated fashion to solve a particular parallel application at the same time. Such applications are often abstracted as directed acyclic graphs (DAGs), in which vertices represent application tasks and edges represent data dependencies between tasks.
Effective scheduling mechanisms for DAG applications are essential to exploit the tremendous potential of computational resources. The core issue is that the availability and performance of resources, which are already by their nature heterogeneous, can be expected to vary dynamically, even during the course of an execution. In this thesis, we first consider the problem of scheduling DAG task graphs onto heterogeneous resources with changeable capabilities. We propose a list-scheduling heuristic, the Global Task Positioning (GTP) scheduling method, which addresses the problem by allowing rescheduling and migration of tasks in response to significant variations in resource characteristics.
We observed in experiments with GTP that, in an execution with relatively frequent migration, the results of some task may over time have been copied to several other sites, so a subsequently migrated task may have several possible sources for each of its inputs. Some of these copies may now be more quickly accessible than the original, due to dynamic variations in communication capabilities. To exploit this observation, we extended our model with a Copying Management (CM) function, resulting in a new version, the Global Task Positioning with copying facilities (GTP/c) system. The idea is to reuse such copies during subsequent migration of placed tasks, in order to reduce the impact of migration cost on makespan.
Finally, we believe that fault tolerance is an important issue in heterogeneous and dynamic computational environments, as the availability of resources cannot be guaranteed. To address the problem of processor failure, we propose a rewinding mechanism, which rewinds the progress of the application to a previous state, thereby preserving the execution in spite of the failed processor(s).
We evaluate our mechanisms through simulation, since this allows us to generate repeatable patterns of resource performance variation. We use a standard benchmark set of DAGs, comparing performance against that of competing algorithms from the scheduling literature.
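The list-scheduling idea that GTP builds on can be sketched in a few lines: walk the tasks in topological order and place each one on the processor that finishes it earliest, accounting for the time its inputs take to arrive. This is a generic earliest-finish-time sketch under an assumed cost model; the function and parameter names are illustrative, not the thesis code.

```python
# Generic earliest-finish-time list scheduling for a DAG on heterogeneous
# processors. Illustrative sketch only; not the GTP implementation.

def list_schedule(tasks, deps, exec_cost, comm_cost, procs):
    """tasks: task ids in topological order.
    deps: task -> list of (parent, data_size) pairs.
    exec_cost[t][p]: runtime of task t on processor p.
    comm_cost(size, p_src, p_dst): transfer time (0 if same processor).
    Returns task -> (processor, start, finish)."""
    proc_free = {p: 0.0 for p in procs}   # earliest free time per processor
    placed = {}
    for t in tasks:
        best = None
        for p in procs:
            # t can start on p once every parent's result has arrived at p
            ready = max((placed[u][2] + comm_cost(sz, placed[u][0], p)
                         for u, sz in deps.get(t, [])), default=0.0)
            start = max(ready, proc_free[p])
            finish = start + exec_cost[t][p]
            if best is None or finish < best[2]:
                best = (p, start, finish)
        placed[t] = best
        proc_free[best[0]] = best[2]
    return placed
```

Rescheduling in response to changing resource characteristics, as in GTP, amounts to rerunning such a pass with updated `exec_cost` and `comm_cost` estimates for the not-yet-executed tasks.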
Static and Dynamic Scheduling for Effective Use of Multicore Systems
Multicore systems have gained increasing importance in high performance computers. Compared to traditional microarchitectures, multicore architectures have a simpler design, a higher performance-to-area ratio, and improved power efficiency. Although the multicore architecture has various advantages, traditional parallel programming techniques do not map to the new architecture efficiently. This dissertation addresses how to determine optimized thread schedules to improve data reuse on shared-memory multicore systems, and how to seek a scalable solution to designing parallel software on both shared-memory and distributed-memory multicore systems.
We propose an analytical cache model to predict the number of cache misses on the time-sharing L2 cache on a multicore processor. The model provides an insight into the impact of cache sharing and cache contention between threads. Inspired by the model, we build the framework of affinity based thread scheduling to determine optimized thread schedules to improve data reuse on all the levels in a complex memory hierarchy. The affinity based thread scheduling framework includes a model to estimate the cost of a thread schedule, which consists of three submodels: an affinity graph submodel, a memory hierarchy submodel, and a cost submodel. Based on the model, we design a hierarchical graph partitioning algorithm to determine near-optimal solutions. We have also extended the algorithm to support threads with data dependences. The algorithms are implemented and incorporated into a feedback directed optimization prototype system. The prototype system builds upon a binary instrumentation tool and can improve program performance greatly on shared-memory multicore architectures.
We also study the dynamic data-availability driven scheduling approach to designing new parallel software on distributed-memory multicore architectures. We have implemented a decentralized dynamic runtime system. The design of the runtime system is focused on the scalability metric. At any time only a small portion of a task graph exists in memory. We propose an algorithm to solve data dependences without process cooperation in a distributed manner. Our experimental results demonstrate the scalability and practicality of the approach for both shared-memory and distributed-memory multicore systems. Finally, we present a scalable nonblocking topology-aware multicast scheme for distributed DAG scheduling applications.
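The data-availability driven idea can be illustrated with a minimal sequential sketch: a task becomes runnable the moment all of its inputs exist, with no global coordination step. The names and the single-process structure here are assumptions for illustration; the thesis runtime is decentralized and keeps only a small window of the task graph in memory.

```python
# Data-availability-driven execution: a task is released as soon as all of
# its inputs have been produced. Illustrative sketch, not the thesis runtime.
from collections import deque

def run_dataflow(deps, work):
    """deps: task -> set of prerequisite tasks (every task is a key).
    work: task -> zero-argument callable producing that task's result.
    Runs tasks as their inputs become available; returns task -> result."""
    remaining = {t: set(ps) for t, ps in deps.items()}
    children = {}
    for t, ps in deps.items():
        for p in ps:
            children.setdefault(p, []).append(t)
    ready = deque(t for t, ps in remaining.items() if not ps)
    results = {}
    while ready:
        t = ready.popleft()
        results[t] = work[t]()          # in a real runtime this runs on a worker
        for c in children.get(t, []):   # completing t may release its children
            remaining[c].discard(t)
            if not remaining[c]:
                ready.append(c)
    return results
```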
Managing distributed flexible manufacturing systems
For several years, research has focused on many aspects of manufacturing management, from the optimization of individual production processes to the management of virtual enterprises, but several aspects, such as coordination and control, still present relevant problems in industry and remain challenging areas of research. The application of advanced technologies and informational tools by itself does not guarantee the success of control and integration applications. In order to achieve a high degree of integration and efficiency, it is necessary to match the technologies and tools with models that describe the existing knowledge and functionality in the system and allow a correct understanding of its behaviour. In a global and wide market competition, manufacturing systems present requirements that lead to distributed, self-organized, cooperative and heterogeneous control applications.
A Distributed Flexible Manufacturing System (DFMS) is a goal-driven and data-directed dynamic system designed to provide an effective operation sequence for the products to fulfil the production goals, to meet real-time requirements, and to optimally allocate resources.
In this work, a layered approach for modelling such production systems is first proposed. According to this representation, a DFMS may be seen as a multi-layer resource graph such that vertices on a layer represent interacting resources, and a layer at level l is represented by a node in the layer at level (l-1). Two models are then developed, concerning two relevant managerial issues in a DFMS: the task mapping problem and the task scheduling with multiple shared resources problem.
The task mapping problem concerns the balanced partition of a given set of jobs and the assignment of the parts to the resources of the manufacturing system. We study the case in which the jobs are quite homogeneous and have no precedence constraints, but need some communication in order to be coordinated; assigning jobs to different parts therefore causes a relevant communication effort between those parts, increasing the managerial complexity. We show that the standard models usually used to formally represent this problem are inadequate. Through some graph-theoretical results we relate the problem to the well-known hypergraph partitioning problem and briefly survey the best techniques to solve it. A new formulation of the problem is then presented. Some considerations on an improved version of the formulation permit the computation of a good lower bound on the optimal solution in the case of hypergraph bisection.
The task scheduling with multiple shared resources problem is addressed for a robotic cell. We study the general problem of sequencing multiple jobs, where each job consists of multiple ordered tasks and task execution requires the simultaneous usage of several resources. NP-completeness results are given. A heuristic with a guaranteed approximation result is designed and evaluated.
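The connection between task mapping and hypergraph partitioning described above can be made concrete: model each coordination requirement among a group of jobs as a hyperedge, and charge one unit of communication for every hyperedge whose jobs end up split between the two resources. The sketch below is a generic hyperedge-cut formulation with a brute-force bisection for tiny instances; it is an illustration of the reduction, not the thesis's exact model.

```python
# Task mapping as hypergraph bisection: jobs are vertices, coordination
# groups are hyperedges, and a hyperedge is "cut" (costs one unit of
# communication) if its jobs are split across the two resources.
# Illustrative formulation only.
from itertools import combinations

def cut_size(hyperedges, part):
    """hyperedges: iterable of job sets; part: set of jobs on resource 0."""
    return sum(1 for e in hyperedges
               if any(v in part for v in e) and any(v not in part for v in e))

def best_bisection(jobs, hyperedges):
    """Brute-force optimal balanced bisection (tiny instances only).
    Any valid lower bound must not exceed the cut returned here."""
    jobs = list(jobs)
    half = len(jobs) // 2
    return min((cut_size(hyperedges, set(p)), p)
               for p in combinations(jobs, half))
```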
Scalable Techniques for Scheduling and Mapping DSP Applications onto Embedded Multiprocessor Platforms
A variety of multiprocessor architectures has proliferated, even in off-the-shelf computing platforms. To make use of these platforms, traditional implementation frameworks focus on implementing Digital Signal Processing (DSP) applications using special platform features to achieve high performance. However, due to the fast evolution of the underlying architectures, solution redevelopment is error prone and the reusability of existing solutions and libraries is limited. In this thesis, we facilitate an efficient migration of DSP systems to multiprocessor platforms while systematically leveraging previous investment in optimized library kernels using dataflow design frameworks. We make these library elements, which are typically tailored to specialized architectures, more amenable to extensive analysis and optimization through an efficient and systematic process. We provide techniques to enable such migration through four basic contributions:
1. We propose and develop a framework to explore efficient utilization of Single Instruction Multiple Data (SIMD) cores and accelerators available in heterogeneous multiprocessor platforms consisting of General Purpose Processors (GPPs) and Graphics Processing Units (GPUs). We also propose new scheduling techniques by applying extensive block processing in conjunction with appropriate task mapping and task ordering methods that match efficiently with the underlying architecture. The approach gives the developer the ability to prototype a GPU-accelerated application and explore its design space efficiently and effectively.
2. We introduce the concept of Partial Expansion Graphs (PEGs) as an implementation model and associated class of scheduling strategies. PEGs are designed to help realize DSP systems in terms of forms and granularities of parallelism that are well matched to the given applications and targeted platforms. PEGs also facilitate derivation of both static and dynamic scheduling techniques, depending on the amount of variability in task execution times and other operating conditions. We show how to implement efficient PEG-based scheduling methods using real time operating systems, and to re-use pre-optimized libraries of DSP components within such implementations.
3. We develop new algorithms for scheduling and mapping systems implemented using PEGs. Collectively, these algorithms operate in three steps. First, the amount of data parallelism in the application graph is tuned systematically over many iterations to profit from the available cores in the target platform. Then a mapping algorithm that uses graph analysis is developed to distribute data and task parallel instances over different cores while trying to balance the load of all processing units to make use of pipeline parallelism. Finally, we use a novel technique for performance evaluation by implementing the scheduler and a customizable solution on the programmable platform. This allows accurate fitness functions to be measured and used to drive runtime adaptation of schedules.
4. In addition to providing scheduling techniques for the mentioned applications and platforms, we also show how to integrate the resulting solutions into the underlying environment. This is achieved by leveraging existing libraries and applying the GPP-GPU scheduling framework to augment a popular existing Software Defined Radio (SDR) development environment -- GNU Radio -- with a dataflow foundation and a stand-alone GPU-accelerated library. We also show how to realize the PEG model on real-time operating system libraries, such as the Texas Instruments DSP/BIOS. A code generator that accepts a manual solution from the system designer as well as automatically configured solutions is provided to complete the design flow, from the application model to the running system.
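As one concrete (and much simplified) instance of distributing parallel instances over cores while balancing load, a greedy longest-processing-time heuristic places each instance on the currently least-loaded core. This is a standard textbook heuristic given here for illustration only; the mapping algorithm in the thesis uses graph analysis of the PEG rather than this sketch.

```python
# Greedy longest-processing-time (LPT) mapping: sort instances by estimated
# cost, then repeatedly assign the heaviest unplaced instance to the
# least-loaded core. Illustrative only; not the thesis's PEG-based algorithm.
import heapq

def lpt_map(instance_costs, n_cores):
    """instance_costs: dict instance -> estimated execution cost.
    Returns (mapping instance -> core, per-core load list)."""
    loads = [(0.0, c) for c in range(n_cores)]   # min-heap of (load, core)
    heapq.heapify(loads)
    mapping = {}
    for inst, cost in sorted(instance_costs.items(),
                             key=lambda kv: kv[1], reverse=True):
        load, core = heapq.heappop(loads)        # least-loaded core so far
        mapping[inst] = core
        heapq.heappush(loads, (load + cost, core))
    totals = [0.0] * n_cores
    for inst, core in mapping.items():
        totals[core] += instance_costs[inst]
    return mapping, totals
```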
Enhancing reliability with Latin Square redundancy on desktop grids.
Computational grids are some of the largest computer systems in existence today. Unfortunately they are also, in many cases, the least reliable. This research examines the use of redundancy with permutation as a method of improving reliability in computational grid applications. Three primary avenues are explored: development of a new redundancy model, the Replication and Permutation Paradigm (RPP), for computational grids; development of grid simulation software for testing RPP against other redundancy methods; and, finally, running a program on a live grid using RPP. An important part of RPP involves distributing data and tasks across the grid in Latin Square fashion. Two theorems, with proofs, regarding Latin Squares are developed. The theorems describe the changing position of symbols between the rows of a standard Latin Square. When a symbol is missing because a column has been removed, the theorems provide a basis for determining the next row and column where the missing symbol can be found. Interesting in their own right, the theorems also have implications for redundancy: they allow one to state the maximum makespan in the face of missing computational hosts when using Latin Square redundancy. The simulator software was developed and used to compare different data and task distribution schemes on a simulated grid. The software clearly showed the advantage of running RPP, which resulted in faster completion times in the face of computational host failures. The Latin Square method also fails gracefully, in that jobs complete even with massive node failure, at the cost of increased makespan. Finally, an Inductive Logic Program (ILP) for pharmacophore search was executed, using the Latin Square redundancy methodology, on a Condor grid in the Dahlem Lab at the University of Louisville Speed School of Engineering. All jobs completed, even in the face of large numbers of randomly generated computational host failures.
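The row-to-row movement of symbols can be seen concretely in a cyclic Latin square, one common choice of standard square, where cell (r, c) holds symbol (r + c) mod n. If column c fails, the symbol lost from row r reappears at row (r + 1) mod n, column (c - 1) mod n, since (r + 1) + (c - 1) = r + c (mod n). The sketch below illustrates the idea under this cyclic-square assumption; it is not the thesis's theorems verbatim.

```python
# Cyclic Latin square: cell (r, c) holds (r + c) % n, so every symbol
# appears exactly once in each row and each column. Under this assumption,
# a symbol lost to a failed column can always be recovered from a
# neighbouring row. Illustrative of the redundancy idea, not RPP itself.

def cyclic_latin_square(n):
    return [[(r + c) % n for c in range(n)] for r in range(n)]

def relocate(n, r, c):
    """Where the symbol at (r, c) can be found after column c fails:
    one row down, one column to the left (both wrapping modulo n)."""
    return ((r + 1) % n, (c - 1) % n)
```

In redundancy terms, each column plays the role of a computational host: when a host disappears, every datum it held is still present elsewhere, and `relocate` names exactly one surviving cell that holds it.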