13 research outputs found

    Flexible Skeletal Programming with eSkel

    No abstract available

    Using eSkel to Implement the Multiple Baseline Stereo Application

    We give an overview of the Edinburgh Skeleton Library eSkel, a structured parallel programming library which offers a range of skeletal parallel programming constructs to the C/MPI programmer. We then illustrate the efficacy of such a high-level approach through the multiple baseline stereo application. We describe the application and show different ways to introduce parallelism using algorithmic skeletons. Some performance results are reported.

    PySke: Algorithmic Skeletons for Python

    PySke is a library of parallel algorithmic skeletons in Python designed for list and tree data structures. Such algorithmic skeletons are higher-order functions implemented in parallel. An application developed with PySke is a composition of skeletons. To ease the writing of parallel programs, PySke does not follow the Single Program Multiple Data (SPMD) paradigm but offers a global view of parallel programs to users. This approach aims at making scalable programs easy to write. In addition to the library, we present experiments performed on a high-performance computing cluster (distributed memory) on a set of example applications developed with PySke.
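The idea of skeletons as higher-order functions composed into an application can be sketched as follows. This is an illustrative sequential analogue, not the actual PySke API: the function names `skel_map` and `skel_reduce` are hypothetical, and PySke would execute each skeleton in parallel over distributed lists.

```python
from functools import reduce

def skel_map(f, xs):
    # 'map' skeleton: apply f independently to every element
    # (each application is a candidate parallel task)
    return [f(x) for x in xs]

def skel_reduce(op, xs):
    # 'reduce' skeleton: combine elements with an associative operator
    return reduce(op, xs)

def sum_of_squares(xs):
    # an application is a composition of skeletons
    return skel_reduce(lambda a, b: a + b,
                       skel_map(lambda x: x * x, xs))

print(sum_of_squares([1, 2, 3, 4]))  # 30
```

Because the composition names only *what* is computed, a parallel library is free to decide *how*: the map can be split across workers and the reduction done as a tree, without changing the application code.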

    A Structural Approach for Modelling Performance of Systems Using Skeletons

    In this paper, we discuss a structural approach to automatic performance modelling of skeleton-based applications. This uses a synthesis of Performance Evaluation Process Algebra (PEPA) and a pattern-oriented hierarchical expression scheme. Such approaches are important in parallel and distributed systems, where the performance models must be updated regularly based on the current state of the resources.

    HPC-GAP: engineering a 21st-century high-performance computer algebra system

    Symbolic computation has underpinned a number of key advances in Mathematics and Computer Science. Applications are typically large and potentially highly parallel, making them good candidates for parallel execution at a variety of scales from multi-core to high-performance computing systems. However, much existing work on parallel computing is based around numeric rather than symbolic computations. In particular, symbolic computing presents particular problems in terms of varying granularity and irregular task sizes that do not match conventional approaches to parallelisation. It also presents problems in terms of the structure of the algorithms and data. This paper describes a new implementation of the free open-source GAP computational algebra system that places parallelism at the heart of the design, dealing with the key scalability and cross-platform portability problems. We provide three system layers that deal with the three most important classes of hardware: individual shared-memory multi-core nodes, mid-scale distributed clusters of (multi-core) nodes, and full-blown HPC systems comprising large-scale tightly-connected networks of multi-core nodes. This requires us to develop new cross-layer programming abstractions in the form of new domain-specific skeletons that allow us to seamlessly target different hardware levels. Our results show that, using our approach, we can achieve good scalability and speedups for two realistic exemplars, on high-performance systems comprising up to 32,000 cores, as well as on ubiquitous multi-core systems and distributed clusters. The work reported here paves the way towards full-scale exploitation of symbolic computation by high-performance computing systems, and we demonstrate the potential with two major case studies.

    Tools and Models for High Level Parallel and Grid Programming

    When algorithmic skeletons were first introduced by Cole in the late 1980s, the idea met with almost immediate success. The skeletal approach has proved effective when application algorithms can be expressed as compositions of skeletons. However, despite both their effectiveness and the progress made in the design and implementation of skeletal systems, algorithmic skeletons remain absent from mainstream practice. Cole and other researchers examined the problem: they identified the issues affecting skeletal systems and stated a set of principles that must be tackled in order to make them more effective and to take skeletal programming into the parallel mainstream. In this thesis we propose tools and models for addressing some of the issues of skeletal programming environments. We describe three novel approaches aimed at enhancing skeleton-based systems from different angles. First, we present a model that allows the customisation of algorithmic skeletons by exploiting the macro data-flow abstraction. Then we present two results on the exploitation of meta-programming techniques for the run-time generation and optimisation of macro data-flow graphs. In particular, we show how to generate and how to optimise macro data-flow graphs according both to programmer-provided non-functional requirements and to features of the execution platform. The last result we present is Behavioural Skeletons, an approach aimed at addressing the limitations of skeletal programming environments when used for the development of component-based Grid applications. We validated all the approaches by conducting several tests, performed using a set of tools we developed.
    Comment: PhD thesis, 2008, IMT Institute for Advanced Studies, Lucca. arXiv admin note: text overlap with arXiv:1002.2722 by another author.
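The macro data-flow abstraction mentioned above can be illustrated with a toy interpreter. This is a hedged sketch under assumed, simplified structures (the names `run_mdf` and the graph encoding are hypothetical, not from the thesis): a skeleton program becomes a graph of coarse-grain instructions, and the interpreter repeatedly fires whichever instructions have all their inputs available.

```python
def run_mdf(graph, inputs):
    """graph: dict mapping node -> (function, list of predecessor nodes);
    nodes listed in `inputs` are the externally supplied values."""
    ready = dict(inputs)          # node -> computed value
    pending = dict(graph)         # instructions not yet executed
    while pending:
        # an instruction is fireable when all its inputs are available;
        # in a real system all fireable instructions run in parallel
        fireable = [n for n, (_, preds) in pending.items()
                    if all(p in ready for p in preds)]
        for n in fireable:
            f, preds = pending.pop(n)
            ready[n] = f(*[ready[p] for p in preds])
    return ready

# a two-stage pipeline skeleton (f then g) compiled to a tiny MDF graph
graph = {
    "f": (lambda x: x + 1, ["in"]),
    "g": (lambda y: y * 2, ["f"]),
}
print(run_mdf(graph, {"in": 3})["g"])  # 8
```

The appeal for customisation is that the graph is ordinary data: meta-programming can rewrite it at run time (fusing nodes, replicating stages) before the interpreter ever executes it.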

    Data Parallel Skeletons for OCamlP3l

    This thesis is set in the field of skeleton programming: a skeleton is a particular structure of a parallel algorithm with which many classes of sequential problems can be parallelised; a skeleton is thus an algorithmic scheme that is parametric in the sequential function to be executed in parallel. From the analysis of the sequential problem it is possible to derive the particular skeleton, among a few well-known ones, that executes a parallel application functionally equivalent to the sequential one but generally with better performance. Skeletons can be divided into two categories: task parallel, in which parallelism comes from applying a function simultaneously to several elements of an input sequence, and data parallel, in which parallelism comes from applying a function simultaneously to different parts of the same element of an input sequence, a decomposable data structure. Skeletons can be used to pursue two main goals, often in tension with each other: on the one hand maximum performance, on the other the ease with which the programmer can express a parallel application, possibly reusing existing code, while still obtaining performance improvements. Over the last twenty years the skeleton concept has been very successful in academia, and numerous systems have been proposed that offer both task and data parallel skeletons and aim at one of the two main goals just described. In particular, the literature includes both languages for structured parallel programming, in which skeletons are offered as native constructs of the language, and libraries, in which skeletons are offered as functions or methods.
    Our thesis concerns the OCamlP3l system, a library in the OCaml language, which derives from one of the first languages for structured parallel programming, P3L, developed at the Department of Computer Science of the University of Pisa. The work was mainly devoted to the introduction of a data parallel skeleton that makes it possible to parallelise computations with a more complex structure than the skeletons already present in the system. Such computations are called stencils and tend to occur in iterative applications in which the termination condition often depends on the result obtained at each iteration; the skeleton introduced also makes it possible to express these applications.
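The stencil pattern targeted here can be sketched with a minimal sequential example (in Python for illustration; OCamlP3l itself is an OCaml library, and the function name `jacobi_1d` is purely illustrative): each element is updated from its neighbours, and termination depends on the result of the current iteration.

```python
def jacobi_1d(u, tol=1e-6, max_iter=10_000):
    """Relax interior points toward the average of their two neighbours;
    boundary values u[0] and u[-1] stay fixed."""
    for _ in range(max_iter):
        new = u[:]
        for i in range(1, len(u) - 1):
            new[i] = 0.5 * (u[i - 1] + u[i + 1])   # the 1D stencil
        # the termination condition depends on this iteration's result,
        # which is what makes the pattern harder to capture as a skeleton
        if max(abs(a - b) for a, b in zip(new, u)) < tol:
            return new
        u = new
    return u

print(jacobi_1d([0.0, 0.0, 0.0, 4.0]))  # converges toward [0, 4/3, 8/3, 4]
```

In a data parallel skeleton version, the array would be partitioned across workers, each update would need the halo elements owned by neighbouring workers, and the convergence test would be a global reduction per iteration.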

    SSL Tunneling in Muskel: Implementation and Performance Evaluation

    We discuss the introduction of SSL technology for communication with "untrusted" remote nodes in the distributed interpreter of Muskel, a skeleton environment based on macro data flow, and we discuss its implications for performance.

    Implementation and Evaluation of Algorithmic Skeletons: Parallelisation of Computer Algebra Algorithms

    This thesis presents design and implementation approaches for the parallel algorithms of computer algebra. We use algorithmic skeletons and also further approaches, such as data parallel arithmetic and actors. We have implemented skeletons for divide and conquer algorithms and some special parallel loops that we call 'repeated computation with a possibility of premature termination'. We introduce in this thesis a rational data parallel arithmetic. We focus on parallel symbolic computation algorithms, for which our arithmetic provides a generic parallelisation approach. The implementation is carried out in Eden, a parallel functional programming language based on Haskell. This choice enables us to encode both the skeletons and the programs in the same language. Moreover, it allows us to refrain from using two different languages, one for the implementation and one for the interface, for our implementation of computer algebra algorithms. Further, this thesis presents methods for the evaluation and estimation of parallel execution times. We partition the parallel execution time into two components. One of them accounts for the quality of the parallelisation; we call it the 'parallel penalty'. The other is the sequential execution time. For the estimation, we predict both components separately, using statistical methods. This enables very confident estimations while using drastically fewer measurement points than other methods. We have applied both our evaluation and estimation approaches to the parallel programs presented in this thesis. We have also used existing estimation methods. We developed divide and conquer skeletons for the implementation of fast parallel multiplication. We have implemented the Karatsuba algorithm, Strassen's matrix multiplication algorithm and the fast Fourier transform. The latter was used to implement polynomial convolution, which leads to a further fast multiplication algorithm.
    Specifically for our implementation of Strassen's algorithm, we designed and implemented a divide and conquer skeleton based on actors. We implemented the parallel fast Fourier transform, and not only did we use new divide and conquer skeletons, but we also developed a map-and-transpose skeleton that enables good parallelisation of the Fourier transform. The parallelisation of Karatsuba multiplication shows very good performance. We analysed the parallel penalty of our programs and compared it to the serial fraction, an approach known from the literature. We also performed execution time estimations of our divide and conquer programs. This thesis presents a parallel map+reduce skeleton scheme. It allows us to combine the usual parallel map skeletons, such as parMap, farm and workpool, with a premature termination property. We use this to implement the so-called 'parallel repeated computation', a special form of a speculative parallel loop. We have implemented two probabilistic primality tests: the Rabin–Miller test and the Jacobi sum test. We parallelised both with our approach. We analysed the task distribution and stated the fitting configurations of the Jacobi sum test. We have shown formally that the Jacobi sum test can be implemented in parallel. Subsequently, we parallelised it, analysed the load balancing issues, and produced an optimisation. The latter enabled a good implementation, as verified using the parallel penalty. We have also estimated the performance of the tests for further input sizes and numbers of processing elements. The parallelisation of the Jacobi sum test and our generic parallelisation scheme for the repeated computation are our original contributions. The data parallel arithmetic was defined not only for integers, which is already known, but also for rationals. We handled the common factors of the numerator or denominator of the fraction with the modulus in a novel manner.
    This is required to obtain a true multiple-residue arithmetic, a novel result of our research. Using these mathematical advances, we parallelised the determinant computation using Gauß elimination. As always, we performed task distribution analysis and estimation of the parallel execution time of our implementation. A similar computation in Maple emphasised the potential of our approach. Data parallel arithmetic enables the parallelisation of entire classes of computer algebra algorithms. Summarising, this thesis presents and thoroughly evaluates new and existing design decisions for high-level parallelisations of computer algebra algorithms.
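The divide and conquer skeletons central to this thesis separate the recursive structure from the algorithm instantiating it. A hedged sketch (in Python rather than Eden, with hypothetical parameter names; the thesis's actual skeletons run the subproblems as parallel processes): the skeleton fixes the recursion, and four function parameters fix the algorithm, here instantiated as merge sort.

```python
def divide_and_conquer(is_trivial, solve, divide, combine, problem):
    # the skeleton: generic recursive structure only
    if is_trivial(problem):
        return solve(problem)
    subresults = [divide_and_conquer(is_trivial, solve, divide, combine, p)
                  for p in divide(problem)]   # candidates for parallel tasks
    return combine(subresults)

def merge(parts):
    # combine two sorted lists into one sorted list
    left, right, out = list(parts[0]), list(parts[1]), []
    while left and right:
        out.append(left.pop(0) if left[0] <= right[0] else right.pop(0))
    return out + left + right

def msort(xs):
    # merge sort = divide and conquer skeleton + four problem-specific functions
    return divide_and_conquer(
        is_trivial=lambda p: len(p) <= 1,
        solve=lambda p: p,
        divide=lambda p: [p[:len(p) // 2], p[len(p) // 2:]],
        combine=merge,
        problem=xs)

print(msort([3, 1, 4, 1, 5, 9, 2, 6]))  # [1, 1, 2, 3, 4, 5, 6, 9]
```

Karatsuba, Strassen and the FFT fit the same shape with different `divide`/`combine` functions; the 'premature termination' loops mentioned above differ in that the skeleton may cancel outstanding subtasks once one result suffices.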