182 research outputs found

    PySke: Algorithmic Skeletons for Python

    Get PDF
    International audiencePySke is a library of parallel algorithmic skeletons in Python designed for list and tree data structures. Such algorithmic skeletons are high-order functions implemented in parallel. An application developed with PySke is a composition of skeletons. To ease the write of parallel programs, PySke does not follow the Single Program Multiple Data (SPMD) paradigm but offers a global view of parallel programs to users. This approach aims at writing scalable programs easily. In addition to the library, we present experiments performed on a high-performance computing cluster (distributed memory) on a set of example applications developed with PySke

    Programming with BSP Homomorphisms

    Full text link
    International audienc

    PList-based Divide and Conquer Parallel Programming

    Get PDF
    This paper details an extension of a Java parallel programming framework – JPLF. The JPLF framework is a programming framework that helps programmers build parallel programs using existing building blocks. The framework is based on {em PowerLists} and PList Theories and it naturally supports multi-way Divide and Conquer. By using this framework, the programmer is exempted from dealing with all the complexities of writing parallel programs from scratch. This extension to the JPLF framework adds PLists support to the framework and so, it enlarges the applicability of the framework to a larger set of parallel solvable problems. Using this extension, we may apply more flexible data division strategies. In addition, the length of the input lists no longer has to be a power of two – as required by the PowerLists theory. In this paper we unveil new applications that emphasize the new class of computations that can be executed within the JPLF framework. We also give a detailed description of the data structures and functions involved in the PLists extension of the JPLF, and extended performance experiments are described and analyzed

    Shape-based cost analysis of skeletal parallel programs

    Get PDF
    Institute for Computing Systems ArchitectureThis work presents an automatic cost-analysis system for an implicitly parallel skeletal programming language. Although deducing interesting dynamic characteristics of parallel programs (and in particular, run time) is well known to be an intractable problem in the general case, it can be alleviated by placing restrictions upon the programs which can be expressed. By combining two research threads, the “skeletal” and “shapely” paradigms which take this route, we produce a completely automated, computation and communication sensitive cost analysis system. This builds on earlier work in the area by quantifying communication as well as computation costs, with the former being derived for the Bulk Synchronous Parallel (BSP) model. We present details of our shapely skeletal language and its BSP implementation strategy together with an account of the analysis mechanism by which program behaviour information (such as shape and cost) is statically deduced. This information can be used at compile-time to optimise a BSP implementation and to analyse computation and communication costs. The analysis has been implemented in Haskell. We consider different algorithms expressed in our language for some example problems and illustrate each BSP implementation, contrasting the analysis of their efficiency by traditional, intuitive methods with that achieved by our cost calculator. The accuracy of cost predictions by our cost calculator against the run time of real parallel programs is tested experimentally. Previous shape-based cost analysis required all elements of a vector (our nestable bulk data structure) to have the same shape. We partially relax this strict requirement on data structure regularity by introducing new shape expressions in our analysis framework. We demonstrate that this allows us to achieve the first automated analysis of a complete derivation, the well known maximum segment sum algorithm of Skillicorn and Cai

    Toward optimised skeletons for heterogeneous parallel architecture with performance cost model

    Get PDF
    High performance architectures are increasingly heterogeneous with shared and distributed memory components, and accelerators like GPUs. Programming such architectures is complicated and performance portability is a major issue as the architectures evolve. This thesis explores the potential for algorithmic skeletons integrating a dynamically parametrised static cost model, to deliver portable performance for mostly regular data parallel programs on heterogeneous archi- tectures. The rst contribution of this thesis is to address the challenges of program- ming heterogeneous architectures by providing two skeleton-based programming libraries: i.e. HWSkel for heterogeneous multicore clusters and GPU-HWSkel that enables GPUs to be exploited as general purpose multi-processor devices. Both libraries provide heterogeneous data parallel algorithmic skeletons including hMap, hMapAll, hReduce, hMapReduce, and hMapReduceAll. The second contribution is the development of cost models for workload dis- tribution. First, we construct an architectural cost model (CM1) to optimise overall processing time for HWSkel heterogeneous skeletons on a heterogeneous system composed of networks of arbitrary numbers of nodes, each with an ar- bitrary number of cores sharing arbitrary amounts of memory. The cost model characterises the components of the architecture by the number of cores, clock speed, and crucially the size of the L2 cache. Second, we extend the HWSkel cost model (CM1) to account for GPU performance. The extended cost model (CM2) is used in the GPU-HWSkel library to automatically nd a good distribution for both a single heterogeneous multicore/GPU node, and clusters of heteroge- neous multicore/GPU nodes. Experiments are carried out on three heterogeneous multicore clusters, four heterogeneous multicore/GPU clusters, and three single heterogeneous multicore/GPU nodes. The results of experimental evaluations for four data parallel benchmarks, i.e. sumEuler, Image matching, Fibonacci, and Matrix Multiplication, show that our combined heterogeneous skeletons and cost models can make good use of resources in heterogeneous systems. Moreover using cores together with a GPU in the same host can deliver good performance either on a single node or on multiple node architectures

    Grouping complex systems for classification and parallel simulation

    Get PDF
    This thesis is concerned with grouping complex systems by means of concurrent model, in order to aid in (i) formulation of classifications and (ii) induction of parallel simulation programs. It observes, and seeks f~ furmalize _ and then exploit, the strong structural resemblance between complex systems and occam programs. The thesis hypothesizes that groups of complex systems may be discriminated according to shared structural and behavioural characteristics. Such an analysis of the complex systems domain may be performed in the abstract with the aid of a model for capturing interesting features of complex systems. The resulting groups would form a classification of complex systems. An additional hypothesis is that, insofar as the model is able to capture sufficient . programmatic information, these groups may be used to define, automatically, algorithmic skeletons for the concurrent simulation of complex systems. In order to test these hypotheses, a specification model and an accompanying formal notation are developed. The model expresses properties of complex systems in a mixture of object-oriented and process-oriented styles .. The model is then used as the basis for performing both classification and automatic induction of parallel simulation programs. The thesis takes the view that specification models should not be overly complex, especially if the specifications are meant to be executable. Therefore the requirement for explicit consideration of concurrency on the part of specifiers is minimized. The thesis formulates specifications of classes of cellular automata and neural networks according to the proposed model. Procedures for verificati6If - and induction of parallel simulation programs are also included

    Structured arrows : a type-based framework for structured parallelism

    Get PDF
    This thesis deals with the important problem of parallelising sequential code. Despite the importance of parallelism in modern computing, writing parallel software still relies on many low-level and often error-prone approaches. These low-level approaches can lead to serious execution problems such as deadlocks and race conditions. Due to the non-deterministic behaviour of most parallel programs, testing parallel software can be both tedious and time-consuming. A way of providing guarantees of correctness for parallel programs would therefore provide significant benefit. Moreover, even if we ignore the problem of correctness, achieving good speedups is not straightforward, since this generally involves rewriting a program to consider a (possibly large) number of alternative parallelisations. This thesis argues that new languages and frameworks are needed. These language and frameworks must not only support high-level parallel programming constructs, but must also provide predictable cost models for these parallel constructs. Moreover, they need to be built around solid, well-understood theories that ensure that: (a) changes to the source code will not change the functional behaviour of a program, and (b) the speedup obtained by doing the necessary changes is predictable. Algorithmic skeletons are parametric implementations of common patterns of parallelism that provide good abstractions for creating new high-level languages, and also support frameworks for parallel computing that satisfy the correctness and predictability requirements that we require. This thesis presents a new type-based framework, based on the connection between structured parallelism and structured patterns of recursion, that provides parallel structures as type abstractions that can be used to statically parallelise a program. Specifically, this thesis exploits hylomorphisms as a single, unifying construct to represent the functional behaviour of parallel programs, and to perform correct code rewritings between alternative parallel implementations, represented as algorithmic skeletons. This thesis also defines a mechanism for deriving cost models for parallel constructs from a queue-based operational semantics. In this way, we can provide strong static guarantees about the correctness of a parallel program, while simultaneously achieving predictable speedups.“This work was supported by the University of St Andrews (School of Computer Science); by the EU FP7 grant “ParaPhrase:Parallel Patterns Adaptive Heterogeneous Multicore Systems” (n. 288570); by the EU H2020 grant “RePhrase: Refactoring Parallel Heterogeneous Resource-Aware Applications - a Software Engineering Approach” (ICT-644235), by COST Action IC1202 (TACLe), supported by COST (European Cooperation Science and Technology); and by EPSRC grant “Discovery: Pattern Discovery and Program Shaping for Manycore Systems” (EP/P020631/1)” -- Acknowledgement

    Categorization And Visualization Of Parallel Programming Systems

    Get PDF
    Tez (Yüksek Lisans) -- İstanbul Teknik Üniversitesi, Fen Bilimleri Enstitüsü, 2005Thesis (M.Sc.) -- İstanbul Technical University, Institute of Science and Technology, 2005Yükesek kazanımlı programlama olarak da bilinen paralel programlama, bir problemi daha hızlı çözmek için aynı anda birden çok işlemci kullanılmasına denir. Günümüzde, ağır işlemler içeren birçok problem paralel olarak uygulanmaya çalışılmaktadır, buna örnek olarak nehir sularının simüle edilmesi, fizik veya kimya problemleri, astrolojik simülasyonlar verilebilir. Bu tezin amacı, bilimsel hesaplama veya mühendislik amaçlı kullanılan yüksek kazanımlı yazılımları tartışmaktır. Paralel programlama sistemleri ile kastedilen kütüphaneler, diller, derleyiciler, derleyici yönlendiricileri veya bunun dışında kalan, programcının paralel algoritmasını ifade edebileceği yapılardır. Yükesek kazanımlı program tasarımı için programcının dikkat etmesi gereken iki önemli nokta vardır: problemi iyi kavrayıp uygun bir çözüm önermek, doğru sisteme karar verebilmek. Doğru karar verebilmek için kullanıcının sistemler hakkında oldukça iyi bilgiye sahip olması gerekir. Bazen, birden çok yazılım ve donanımı bir arada kullanmak da gerekebilir. Bu tezde var olan paralel programlama sistemleri tanımlanır ve sınıflandırılır, bunun için güncel bildiriler esas alınmıştır. Özellikle algoritmik taslaklar ve fonsiyonel paralel programlama üzerinde durulmuştur.Ayrica güncel bilgileri depolamak ve bir kaynak yaratmak için wiki temelli bir web kaynağı oluşturulmuştur. Sistemlerin grafik gösterimini sağlayıp daha anlaşılır bir sınıflandırma yapabilmek için yeni bir sözdizimi tasarlanıp dinamik ağ çizebilecek webdot aracı ile bir araya getirilerek sistemleri temsil edecek ağı çizecek araç geliştirilmiştir. Bu sözdiziminin öğrenilmesi ve kullanılması son derece kolaydır. Son olarak iki temel paralel programlama tipi, paylaşılan bellek ve mesajlaşma, iki farklı tipte algoritma kullanılarak karşılaştırılmıştır. Programlar OpenMP ve MPI ile gerçeklenmiştir, farklı paralel makinelerde koşturulup sonuçları karşılaştırılmıştır. Paralel makineler için Almanya nın Aachen Üniversitesi nin SMP ağı ve Ulakbim in dağıtık bellekli paralel makineleri kullanılmıştır.Parallel computing, also called high-performance computing, refers to solving problems faster by using multiple processors simultaneously. Nowadays, almost every computationally-intensive problem that one could imagine is tried to be implemented in parallel. This thesis is aimed at discussing high-performance software for scientific or engineering applications. The term parallel programming systems here means libraries, languages, compiler directives or other means through which a programmer can express a parallel algorithm. To design high performance programs, there are two keys for the programmer: to understand the problem and find a solution for parallelization, and to decide on the right system for the implementation, which requires a good knowledge about existing parallel programming systems. The programmer, after having understood the problem, has to choose between many systems, some of which are closely related, whereas others have big differences. This thesis describes and classifies existing parallel programming systems, thus bringing existing surveys up to date. It describes a wiki-based web portal for collecting information about most recent systems, which has been developed as part of the thesis. A special syntax and a visualization tool has been developed. This syntax and tool allow users to have their own categorization scheme. Fourth, it compares two major programming styles message passing and shared memory with two different algorithms in order show performance differences of these styles. Algorithms are implemented in OpenMP and MPI, performance of both programs are measured on the SMP Cluster of Aachen University, Germany and on the Beowulf Cluster of Ulakbim, Ankara.Yüksek LisansM.Sc
    corecore