
    Structured arrows: a type-based framework for structured parallelism

    This thesis deals with the important problem of parallelising sequential code. Despite the importance of parallelism in modern computing, writing parallel software still relies on many low-level and often error-prone approaches. These low-level approaches can lead to serious execution problems such as deadlocks and race conditions. Due to the non-deterministic behaviour of most parallel programs, testing parallel software can be both tedious and time-consuming. A way of guaranteeing the correctness of parallel programs would therefore be of significant benefit. Moreover, even if we ignore the problem of correctness, achieving good speedups is not straightforward, since this generally involves rewriting a program to consider a (possibly large) number of alternative parallelisations. This thesis argues that new languages and frameworks are needed. These languages and frameworks must not only support high-level parallel programming constructs, but must also provide predictable cost models for these parallel constructs. Moreover, they need to be built around solid, well-understood theories that ensure that: (a) changes to the source code will not change the functional behaviour of a program, and (b) the speedup obtained by making the necessary changes is predictable. Algorithmic skeletons are parametric implementations of common patterns of parallelism that provide good abstractions for creating new high-level languages, and also support frameworks for parallel computing that satisfy these correctness and predictability requirements. This thesis presents a new type-based framework, based on the connection between structured parallelism and structured patterns of recursion, that provides parallel structures as type abstractions that can be used to statically parallelise a program.
Specifically, this thesis exploits hylomorphisms as a single, unifying construct to represent the functional behaviour of parallel programs, and to perform correct code rewritings between alternative parallel implementations, represented as algorithmic skeletons. This thesis also defines a mechanism for deriving cost models for parallel constructs from a queue-based operational semantics. In this way, we can provide strong static guarantees about the correctness of a parallel program, while simultaneously achieving predictable speedups.
Acknowledgement: This work was supported by the University of St Andrews (School of Computer Science); by the EU FP7 grant "ParaPhrase: Parallel Patterns Adaptive Heterogeneous Multicore Systems" (n. 288570); by the EU H2020 grant "RePhrase: Refactoring Parallel Heterogeneous Resource-Aware Applications - a Software Engineering Approach" (ICT-644235); by COST Action IC1202 (TACLe), supported by COST (European Cooperation in Science and Technology); and by EPSRC grant "Discovery: Pattern Discovery and Program Shaping for Manycore Systems" (EP/P020631/1).
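A hylomorphism is an unfold (which splits a problem into sub-problems) immediately followed by a fold (which combines the sub-results), so the intermediate tree is never built as a whole. The sketch below is a minimal illustration under stated assumptions, not the thesis's type-based framework: it encodes a binary divide-and-conquer base functor as tagged tuples, and the names `fmap`, `hylo`, `hylo_par`, `split` and `merge_alg` are all hypothetical. Merge sort serves as the example. `hylo_par` evaluates the two top-level sub-problems as separate tasks and returns the same result as `hylo`, which mirrors the idea of rewriting between functionally equivalent sequential and parallel implementations.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, List, Tuple

# Base functor for binary divide-and-conquer, encoded as tagged tuples:
#   ("empty",)       no elements
#   ("leaf", x)      a single element
#   ("node", l, r)   two sub-problems (or, after folding, two sub-results)
F = Tuple[Any, ...]


def fmap(f: Callable[[Any], Any], layer: F) -> F:
    """Apply f to the recursive positions of one functor layer."""
    if layer[0] == "node":
        return ("node", f(layer[1]), f(layer[2]))
    return layer


def hylo(alg: Callable[[F], Any], coalg: Callable[[Any], F], seed: Any) -> Any:
    """Sequential hylomorphism: unfold one layer with coalg, recurse, fold with alg."""
    return alg(fmap(lambda s: hylo(alg, coalg, s), coalg(seed)))


def hylo_par(alg: Callable[[F], Any], coalg: Callable[[Any], F], seed: Any) -> Any:
    """Same result as hylo, but the two top-level sub-problems run as separate tasks."""
    layer = coalg(seed)
    if layer[0] != "node":
        return alg(layer)
    with ThreadPoolExecutor(max_workers=2) as pool:
        left = pool.submit(hylo, alg, coalg, layer[1])
        right = pool.submit(hylo, alg, coalg, layer[2])
        return alg(("node", left.result(), right.result()))


def split(xs: List[int]) -> F:
    """Coalgebra for merge sort: halve the list."""
    if not xs:
        return ("empty",)
    if len(xs) == 1:
        return ("leaf", xs[0])
    mid = len(xs) // 2
    return ("node", xs[:mid], xs[mid:])


def merge_alg(layer: F) -> List[int]:
    """Algebra for merge sort: merge two sorted sub-results."""
    if layer[0] == "empty":
        return []
    if layer[0] == "leaf":
        return [layer[1]]
    left, right = layer[1], layer[2]
    out: List[int] = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i])
            i += 1
        else:
            out.append(right[j])
            j += 1
    return out + left[i:] + right[j:]


if __name__ == "__main__":
    data = [5, 3, 8, 1, 9, 2]
    print(hylo(merge_alg, split, data))      # [1, 2, 3, 5, 8, 9]
    print(hylo_par(merge_alg, split, data))  # [1, 2, 3, 5, 8, 9]
```

Threads are used only to show that both versions compute the same answer; in CPython they will not speed up CPU-bound work because of the global interpreter lock.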

    Progress Report: 1991-1994

    Parallel programming using functional languages

    It has been argued for many years that functional programs are well suited to parallel evaluation. This thesis investigates this claim from a programming perspective; that is, it investigates parallel programming using functional languages. The approach taken has been to determine the minimum amount of programming necessary to write efficient parallel programs, without the aid of clever compile-time analyses. It is argued that parallel evaluation should be expressed explicitly by the programmer. To achieve this, a lazy functional language is extended with parallel and sequential combinators. The mathematical nature of functional languages means that programs can be formally derived by program transformation. To date, most work on program derivation has concerned sequential programs. In this thesis Squigol has been used to derive three parallel algorithms. Squigol is a functional calculus for program derivation which is becoming increasingly popular. It is shown that some aspects of Squigol are suitable for parallel program derivation, while other aspects are specifically orientated towards sequential algorithm derivation. In order to write efficient parallel programs, parallelism must be controlled, so as to limit storage usage and the number of tasks, and to enforce a minimum task size. In particular, over-eager evaluation or generating excessive numbers of tasks can consume too much storage, and tasks can be too small to be worth evaluating in parallel. Several programming techniques for parallelism control were tried and compared with a run-time system heuristic for parallelism control. The best control was achieved by a combination of run-time system and programmer control of parallelism. One of the problems with parallel programming using functional languages is that non-deterministic algorithms cannot be expressed.
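The abstract argues that parallelism must be controlled to bound the number of tasks and to enforce a minimum task size. A minimal sketch of programmer-side control, assuming a simple granularity threshold (the function `par_sum` and its `threshold` parameter are hypothetical, not the thesis's combinators): ranges at or below the threshold are summed sequentially, and larger ranges spawn one task for the left half while the parent evaluates the right half itself. The function also returns how many tasks it spawned, so the effect of the threshold on task count is visible.

```python
import threading
from typing import Dict, List, Tuple


def par_sum(xs: List[int], lo: int, hi: int, threshold: int) -> Tuple[int, int]:
    """Return (sum of xs[lo:hi], number of tasks spawned)."""
    n = hi - lo
    if n <= threshold:
        # Too small to be worth a separate task: evaluate sequentially.
        return sum(xs[lo:hi]), 0
    mid = lo + n // 2
    box: Dict[str, Tuple[int, int]] = {}
    # Spawn a task for the left half ...
    worker = threading.Thread(target=lambda: box.update(left=par_sum(xs, lo, mid, threshold)))
    worker.start()
    # ... while the parent evaluates the right half itself.
    right_sum, right_tasks = par_sum(xs, mid, hi, threshold)
    worker.join()
    left_sum, left_tasks = box["left"]
    return left_sum + right_sum, left_tasks + right_tasks + 1


if __name__ == "__main__":
    data = list(range(1000))
    print(par_sum(data, 0, len(data), 1000))  # (499500, 0)
    print(par_sum(data, 0, len(data), 250))   # (499500, 3)
    print(par_sum(data, 0, len(data), 100))   # (499500, 15)
```

Lowering the threshold multiplies the number of tasks without changing the result, which is exactly the trade-off the abstract describes. Python threads are used here only to show the control structure, not to obtain speedup.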
A bag (multiset) data type is proposed to allow a limited form of non-determinism to be expressed. Bags can be given a non-deterministic parallel implementation. However, provided the operations used to combine bag elements are associative and commutative, the result of bag operations will be deterministic. The onus is on the programmer to prove this, but usually this is not difficult. Also, because bags are insensitive to ordering, more transformations are directly applicable than if, say, lists were used instead. It is necessary to be able to reason about and measure the performance of parallel programs; for example, some algorithms which intuitively seem to be good parallel ones are not. For some higher-order functions it is possible to devise parameterised formulae describing their performance. This is done for divide-and-conquer functions, which enables the formulation of constraints that guarantee good performance. Pipelined parallelism is difficult to analyse, so a formal semantics for calculating the performance of pipelined programs is devised and used to analyse the performance of a pipelined Quicksort. By treating the performance semantics as a set of transformation rules, parallel programs can be simulated by transforming them. Some parallel programs perform poorly due to programming errors; a pragmatic method of debugging such errors is illustrated by some examples.
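The determinism claim for bags can be illustrated by combining the elements in a random order, standing in for the unpredictable order in which parallel results arrive. This is a minimal sketch with hypothetical names (`bag_reduce`, `distinct_results`); a bag is modelled simply as a Python list. It only models re-ordering: a real parallel implementation would also re-bracket the combinations, which associativity makes safe, while commutativity is what makes re-ordering safe.

```python
import operator
import random
from functools import reduce
from typing import Callable, Iterable, Set, TypeVar

T = TypeVar("T")


def bag_reduce(combine: Callable[[T, T], T], unit: T, bag: Iterable[T], rng: random.Random) -> T:
    """Combine all elements of a bag, visiting them in an arbitrary order."""
    items = list(bag)
    rng.shuffle(items)  # stands in for the non-deterministic order of parallel results
    return reduce(combine, items, unit)


def distinct_results(combine: Callable[[T, T], T], unit: T, bag: Iterable[T], trials: int = 50) -> Set[T]:
    """Run bag_reduce repeatedly and collect every distinct answer observed."""
    rng = random.Random(0)
    items = list(bag)
    return {bag_reduce(combine, unit, items, rng) for _ in range(trials)}


if __name__ == "__main__":
    nums = [4, 1, 3, 2]
    print(distinct_results(operator.add, 0, nums))  # {10}: + is associative and commutative
    print(distinct_results(max, 0, nums))           # {4}
    # Concatenation is associative but not commutative, so the answer depends on order.
    print(distinct_results(operator.add, "", ["a", "b", "c"]))
```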

    Developing and Measuring Parallel Rule-Based Systems in a Functional Programming Environment

    This thesis investigates the suitability of using functional programming for building parallel rule-based systems. A functional version of the well-known rule-based system OPS5 was implemented, and there is a discussion of the suitability of functional languages both for building compilers and for manipulating state. Functional languages can be used to build compilers that reflect the structure of the original grammar of a language and are, therefore, very suitable for this task. Particular attention is paid to the state requirements and the state manipulation structures of applications such as a rule-based system because, traditionally, functional languages have been considered unable to manipulate state. From the implementation work, issues have arisen that are important for functional programming as a whole, in the areas of algorithms and data structures and of development environments. There is a more general discussion of state and state manipulation in functional programs and of how theoretical work, such as monads, can be used. Techniques are presented for interpreting descriptions of graph algorithms more abstractly in order to build functional graph algorithms. Beyond the scope of programming, there are issues relating both to the functional language's interaction with the operating system and to tools, such as debugging and measurement tools, which help programmers write efficient programs. In both of these areas functional systems are lacking. To address the complete lack of measurement tools for functional languages, a profiling technique was designed which can accurately measure the number of calls to a function, the time spent in a function, and the amount of heap space used by a function. From this design, a profiler was developed for higher-order, lazy, functional languages which allows the programmer to measure and verify the behaviour of a program.
This profiling technique is designed primarily for application programmers rather than functional language implementors, and the results presented by the profiler directly reflect the lexical scope of the original program rather than some run-time representation. Finally, there is a discussion of generally available techniques for parallelizing functional programs so that they can execute on a parallel machine. The techniques which are easiest for the parallel systems builder to implement are shown to be the least suitable for large functional applications, while those that best suit functional programmers are not yet generally available and usable.
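The idea of attributing call counts and time to named source-level functions, rather than to some run-time representation, can be sketched in a strict language with a decorator. This is only an illustration under assumptions: `profiled`, `STATS` and `report` are hypothetical names, heap measurement is omitted for brevity, and the sketch does not address lazy evaluation, which is precisely what makes attribution hard in the setting the thesis targets.

```python
import time
from collections import defaultdict
from functools import wraps
from typing import Any, Callable, Dict

# Statistics are keyed by the function's source-level qualified name, so the
# report mirrors the program text rather than the run-time call structure.
STATS: Dict[str, Dict[str, float]] = defaultdict(lambda: {"calls": 0, "seconds": 0.0})


def profiled(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Count calls to fn and accumulate the wall-clock time spent inside it."""
    key = fn.__qualname__

    @wraps(fn)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        STATS[key]["calls"] += 1
        start = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        finally:
            # Inclusive time: nested (including recursive) calls are counted again.
            STATS[key]["seconds"] += time.perf_counter() - start

    return wrapper


@profiled
def fib(n: int) -> int:
    return n if n < 2 else fib(n - 1) + fib(n - 2)


def report() -> None:
    for name, s in sorted(STATS.items()):
        print(f"{name}: {int(s['calls'])} calls, {s['seconds']:.6f}s inclusive")


if __name__ == "__main__":
    fib(10)
    report()  # fib: 177 calls, ...
```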

    A parallel functional language compiler for message-passing multicomputers

    The research presented in this thesis concerns the design and implementation of Naira, a parallel, parallelising compiler for a rich, purely functional programming language. The source language of the compiler is a subset of Haskell 1.2. The front end of Naira is written entirely in the Haskell subset being compiled. Naira has been successfully parallelised, and it is the largest successfully parallelised Haskell program to have achieved good absolute speedups on a network of SUN workstations. Because it has the same basic structure as other production compilers for functional languages, Naira's parallelisation technology should carry over to other functional language compilers. The back end of Naira is written in C and generates parallel C code intended to run on distributed-memory machines. The code generator is based on a novel compilation scheme specified using a restricted form of Milner's π-calculus which achieves asynchronous communication. We present the first working implementation of this scheme on distributed-memory message-passing multicomputers with split-phase transactions. Simulated assessment of the generated parallel code indicates good parallel behaviour. Parallelism is introduced using explicit, advisory user annotations in the source program, and annotations are used in two major ways in the compiler. First, the front end of the compiler is itself parallelised, to reduce compilation time when it compiles input programs. Second, the input programs to the compiler can themselves contain annotations, from which the compiler generates multi-threaded parallel code. This makes Naira, unusually and uniquely, both a parallel and a parallelising compiler. We adopt a medium-grained approach to granularity, in which function applications form the unit of parallelism and load distribution.
We have experimented with two different task distribution strategies, deterministic and random, and with thread-based and quantum-based scheduling policies. Our experiments show little difference in efficiency for regular programs, but the quantum-based scheduler performs best for programs with irregular parallelism. The compiler has been successfully built, parallelised and assessed using both idealised and realistic measurement tools: we obtained significant compilation speedups on a variety of simulated parallel architectures. The simulated results are supported by the best results obtained on real hardware for such a large program: we measured an absolute speedup of 2.5 on a network of 5 SUN workstations. The compiler has also been shown to have good parallelising potential, based on popular test programs. Results of assessing Naira's generated unoptimised parallel code are comparable to those produced by other successful parallel implementation projects.
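The reported hardware figure can be put in context with the usual definitions of speedup and parallel efficiency. This worked example assumes that absolute speedup is measured against the run time $T_{\text{seq}}$ of the sequential compiler (the abstract does not spell out its definition); only the numbers 2.5 and 5 come from the text.

```latex
S_{\text{abs}}(p) = \frac{T_{\text{seq}}}{T_p}, \qquad
E(p) = \frac{S_{\text{abs}}(p)}{p}, \qquad
E(5) = \frac{2.5}{5} = 0.5
```

That is, the parallel run took $T_5 = T_{\text{seq}}/2.5$, twice as long as the $T_{\text{seq}}/5$ that an ideal linear speedup on five workstations would give.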

    The design and implementation of a multiparadigm programming language.

    by Chi-keung Luk. Thesis (M.Phil.), Chinese University of Hong Kong, 1993. Includes bibliographical references (leaves 169-174).
Contents:
Preface
Chapter 1: Introduction
1.1 Programming Languages
1.2 Programming Paradigms
1.2.1 What is a programming paradigm
1.2.2 Which came first? Languages or paradigms?
1.2.3 Overview of some paradigms
1.2.4 A spectrum of paradigms
1.2.5 Multiparadigm systems
1.3 The Objectives of this research
Chapter 2: Studies of the object-oriented, the logic and the functional paradigms
2.1 The Object-Oriented Paradigm
2.1.1 Basic components
2.1.2 Motivations
2.1.3 Some related issues
2.1.4 Computational models for object-oriented programming
2.2 The Functional Paradigm
2.2.1 Basic concepts
2.2.2 Lambda calculus
2.2.3 The characteristics of functional programs
2.2.4 Practicality of functional programming
2.3 The Logic Paradigm
2.3.1 Relations
2.3.2 Logic programs
2.3.3 The opportunity for parallelism
2.4 Summary
Chapter 3: A survey of some existing multiparadigm languages
3.1 Logic + Object-Oriented
3.1.1 LogiC++
3.1.2 Intermission
3.1.3 Object-Oriented Programming in Prolog (OOPP)
3.1.4 Communication Prolog Unit (CPU)
3.1.5 DLP
3.1.6 Representing Objects in a Logic Programming Language with Scoping Constructs (OLPSC)
3.1.7 KSL/Logic
3.1.8 Orient84/K
3.1.9 Vulcan
3.1.10 The Bridge approach
3.1.11 Discussion
3.2 Functional + Object-Oriented
3.2.1 PROOF
3.2.2 A Functional Language with Classes (FLC)
3.2.3 Common Lisp Object System (CLOS)
3.2.4 FOOPS
3.2.5 Discussion
3.3 Logic + Functional
3.3.1 HOPE
3.3.2 FUNLOG
3.3.3 F*
3.3.4 LEAF
3.3.5 Applog
3.3.6 Discussion
3.4 Logic + Functional + Object-Oriented
3.4.1 Paradise
3.4.2 LIFE
3.4.3 UNIFORM
3.4.4 G
3.4.5 FOOPlog
3.4.6 Logic and Objects (L&O)
3.4.7 Discussion
Chapter 4: The design of a multiparadigm language I
4.1 An Object-Oriented Framework
4.1.1 A hierarchy of classes
4.1.2 Program structure
4.1.3 Parametric classes
4.1.4 Inheritance
4.1.5 The meanings of classes and methods
4.1.6 Objects and messages
4.2 The logic Subclasses
4.2.1 Syntax
4.2.2 Distributed inference
4.2.3 Adding functions and expressions to logic programs
4.2.4 State modelling
4.3 The functional Subclasses
4.3.1 The syntax of functions
4.3.2 Abstract data types
4.3.3 Augmented list comprehensions
4.4 The Semantic Foundation of I Programs
4.4.1 T1*: Transform functions into Horn clauses
4.4.2 T2*: Transform object-oriented features into pure logic
4.5 Exploiting Parallelism in I Programs
4.5.1 Inter-object parallelism
4.5.2 Intra-object parallelism
4.6 Discussion
Chapter 5: An implementation of a prototype of I
5.1 System Overview
5.2 I-to-Prolog Translation
5.2.1 Pass 1 - Lexical and syntax analysis
5.2.2 Pass 2 - Class Table Construction and Semantic Checking
5.2.3 Pass 3 - Determination of Multiple Inheritance Precedence
5.2.4 Pass 4 - Translation of the directive part
5.2.5 Pass 5 - Creation of Prolog source code for an I object
5.2.6 Using expressions in logic methods
5.3 I-to-LML Translation
5.4 The Run-time Handler
5.4.1 Object Management
5.4.2 Process Management and Message Passing
Chapter 6: Some applications written in I
6.1 Modeling of a State Space Search
6.2 A Solution to the N-queen Problem
6.3 Object-Oriented Modeling of a Database
6.4 A Simple Expert System
6.5 Summary
Chapter 7: Conclusion and future work
7.1 Conclusion
7.2 Future Work
Chapter A: Language manual
A.1 Introduction
A.2 Syntax
A.2.1 The lexical specification
A.2.2 The syntax specification
A.3 Classes
A.4 Object Creation and Method Invocation
A.5 The logic Subclasses
A.6 The functional Subclasses
A.7 Types
A.8 Mutable States
Chapter B: User's guide
B.1 System Calls
B.2 Configuration Parameters
B.3 Errors
B.4 Implementation Limits
B.5 How to install the system
B.6 How to use the system
B.7 How to recompile the system
B.8 Directory arrangement
Chapter C: List of publications
Bibliography

    Programmiersprachen und Rechenkonzepte

    Since 1984, the GI special interest group "Programmiersprachen und Rechenkonzepte" (Programming Languages and Computing Concepts), which emerged from the former special interest groups 2.1.3 "Implementierung von Programmiersprachen" (Implementation of Programming Languages) and 2.1.4 "Alternative Konzepte für Sprachen und Rechner" (Alternative Concepts for Languages and Computers), has held a workshop every spring at the Physikzentrum Bad Honnef. The meeting serves primarily to let participants get to know one another, exchange experiences, hold discussions, and deepen mutual contacts.

    Adaptive architecture-transparent policy control in a distributed graph reducer

    The end of the frequency-scaling era came around 2005, when clock frequencies stalled for commodity architectures. Performance improvements that could in the past be expected with each new hardware generation therefore had to come from elsewhere. Almost all computer architectures exhibit substantial and growing levels of parallelism, and exploiting it has become one of the key sources of performance and scalability improvements. Alas, parallel programming has proved much more difficult than sequential programming, due to the need to specify coordination and parallelism management. Whilst low-level languages place this burden on programmers, reducing productivity and portability, semi-implicit approaches delegate the responsibility to sophisticated compilers and run-time systems. This thesis presents a study of adaptive load distribution based on work stealing, using history and ancestry information, in a distributed graph reducer for a non-strict functional language. The results contribute to the exploration of more flexible run-time-system-level parallelism control implementing a semi-explicit model of parallelism, which offers productivity and a high level of abstraction by delegating the responsibility for coordination to the run-time system. After characterising a set of parallel functional applications, we study the use of historical information to adapt the choice of victim to steal from in a work-stealing scheduler. We observe substantially lower numbers of messages for data-parallel and nested applications. However, this heuristic fails in cases where past application behaviour does not resemble future behaviour, for instance for Divide-&-Conquer applications with a large number of very fine-grained threads, and for generators of parallelism that move dynamically across processing elements. This mechanism is not specific to the language and the run-time system, and applies to other work-stealing schedulers.
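One plausible form of history-based victim selection is sketched below; it is an illustration under assumptions, not the thesis's actual heuristic. Each processing element (PE) records which victims satisfied its past steal attempts and prefers the most successful one. A small random "explore" fraction is kept so that the selector can still find work that has moved elsewhere, which is the failure mode the abstract identifies. The class `HistoryVictimSelector`, the `explore` parameter and the toy `simulate` scenario are all hypothetical.

```python
import random
from collections import Counter


class HistoryVictimSelector:
    """Choose a steal victim, preferring PEs that satisfied past steal attempts."""

    def __init__(self, self_id: int, num_pes: int, explore: float = 0.2, seed: int = 0) -> None:
        self.candidates = [pe for pe in range(num_pes) if pe != self_id]
        self.successes: Counter = Counter()
        self.explore = explore
        self.rng = random.Random(seed)

    def choose(self) -> int:
        # Random victim until some history exists, and occasionally afterwards,
        # so that a change in where the work lives can still be discovered.
        if not self.successes or self.rng.random() < self.explore:
            return self.rng.choice(self.candidates)
        return self.successes.most_common(1)[0][0]

    def record(self, victim: int, got_work: bool) -> None:
        if got_work:
            self.successes[victim] += 1


def simulate(attempts: int = 100, seed: int = 0) -> int:
    """Toy scenario (hypothetical): PE 0 steals, and only PE 2 ever has spare work."""
    sel = HistoryVictimSelector(self_id=0, num_pes=4, seed=seed)
    hits = 0
    for _ in range(attempts):
        victim = sel.choose()
        got_work = victim == 2
        sel.record(victim, got_work)
        hits += got_work
    return hits


if __name__ == "__main__":
    print(simulate(), "successful steals out of 100")
```

With uniformly random victim selection, roughly a third of the attempts would succeed in this scenario; the history-based selector quickly concentrates its attempts on PE 2.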
Next, we focus on the other key work-stealing decision, namely which sparks (representing potential parallelism) to donate, investigating the effect of Spark Colocation on the performance of five Divide-&-Conquer programs run on a cluster of up to 256 PEs. When using Spark Colocation, the distributed graph reducer shares related work, resulting in a higher degree of both potential and actual parallelism, and in finer-grained threads of less variable size. We validate this behaviour by observing a reduction in average fetch times, but increased numbers of FETCH messages and inter-PE pointers under colocation, which nevertheless results in improved load balance for three of the five benchmark programs. The results show high speedups, and speedup improvements from Spark Colocation, for the three more regular and nested applications, and performance degradation for two programs: one that is excessively fine-grained and one exhibiting limited scalability. Overall, Spark Colocation appears most beneficial for higher numbers of PEs, where improved load balance and a higher degree of parallelism have more opportunities to pay off. In more general terms, we show that a run-time system can beneficially use two kinds of dynamically obtained information: historical information on past stealing successes, gathered and used within the same run, and ancestry information, reconstructed at run time using annotations. Moreover, the results support the view that different heuristics are beneficial for applications using different parallelism patterns, underlining the advantages of a flexible architecture-transparent approach.
The Scottish Informatics and Computer Science Alliance (SICSA