201 research outputs found
Parallel evaluation strategies for lazy data structures in Haskell
Conventional parallel programming is complex and error prone. To improve programmer
productivity, we need to raise the level of abstraction with a higher-level
programming model that hides many parallel coordination aspects. Evaluation
strategies use non-strictness to separate the coordination and computation aspects
of a Glasgow parallel Haskell (GpH) program. This allows the specification of high
level parallel programs, eliminating the low-level complexity of synchronisation and
communication associated with parallel programming.
This thesis employs a data-structure-driven approach for parallelism derived through
generic parallel traversal and evaluation of sub-components of data structures. We
focus on evaluation strategies over list, tree and graph data structures, allowing
re-use across applications with minimal changes to the sequential algorithm.
In particular, we develop novel evaluation strategies for tree data structures, using
core functional programming techniques for coordination control, achieving more
flexible parallelism. We use non-strictness to control parallelism more flexibly. We
apply the notion of fuel as a resource that dictates parallelism generation, in particular,
the bi-directional flow of fuel, implemented using a circular program definition,
in a tree structure as a novel way of controlling parallel evaluation. This is the first
use of circular programming in evaluation strategies and is complemented by a lazy
function for bounding the size of sub-trees.
We extend these control mechanisms to graph structures and demonstrate performance
improvements on several parallel graph traversals. We combine circularity
for control for improved performance of strategies with circularity for computation
using circular data structures. In particular, we develop a hybrid traversal strategy
for graphs, exploiting breadth-first order for exposing parallelism initially, and
then proceeding with a depth-first order to minimise overhead associated with a full
parallel breadth-first traversal.
The efficiency of the tree strategies is evaluated on a benchmark program, and
two non-trivial case studies: a Barnes-Hut algorithm for the n-body problem and
sparse matrix multiplication, both using quad-trees. We also evaluate a graph search
algorithm implemented using the various traversal strategies.
We demonstrate improved performance on a server-class multicore machine with
up to 48 cores, with the advanced fuel splitting mechanisms proving to be more
flexible in throttling parallelism. To guide the behaviour of the strategies, we develop
heuristics-based parameter selection to select their specific control parameters
Preliminary proceedings of the 2001 ACM SIGPLAN Haskell workshop
This volume contains the preliminary proceedings of the 2001 ACM SIGPLAN Haskell Workshop,
which was held on 2nd September 2001 in Firenze, Italy. The final proceedings will
published by Elsevier Science as an issue of Electronic Notes in Theoretical Computer Science
(Volume 59).
The HaskellWorkshop was sponsored by ACM SIGPLAN and formed part of the PLI 2001
colloquium on Principles, Logics, and Implementations of high-level programming languages,
which comprised the ICFP/PPDP conferences and associated workshops. Previous Haskell
Workshops have been held in La Jolla (1995), Amsterdam (1997), Paris (1999), and MontrÂŽeal
(2000).
The purpose of the Haskell Workshop was to discuss experience with Haskell, and possible
future developments for the language. The scope of the workshop included all aspects of the
design, semantics, theory, application, implementation, and teaching of Haskell. Submissions
that discussed limitations of Haskell at present and/or proposed new ideas for future versions
of Haskell were particularly encouraged. Adopting an idea from ICFP 2000, the workshop also
solicited two special classes of submissions, application letters and functional pearls, described
below
Implementation and Evaluation of Algorithmic Skeletons: Parallelisation of Computer Algebra Algorithms
This thesis presents design and implementation approaches for the parallel algorithms of computer algebra. We use algorithmic skeletons and also further approaches, like data parallel arithmetic and actors. We have implemented skeletons for divide and conquer algorithms and some special parallel loops, that we call ârepeated computation with a possibility of premature terminationâ. We introduce in this thesis a rational data parallel arithmetic. We focus on parallel symbolic computation algorithms, for these algorithms our arithmetic provides a generic parallelisation approach.
The implementation is carried out in Eden, a parallel functional programming language based on Haskell. This choice enables us to encode both the skeletons and the programs in the same language. Moreover, it allows us to refrain from using two different languagesâone for the implementation and one for the interfaceâfor our implementation of computer algebra algorithms.
Further, this thesis presents methods for evaluation and estimation of parallel execution times. We partition the parallel execution time into two components. One of them accounts for the quality of the parallelisation, we call it the âparallel penaltyâ. The other is the sequential execution time. For the estimation, we predict both components separately, using statistical methods. This enables very confident estimations, although using drastically less measurement points than other methods. We have applied both our evaluation and estimation approaches to the parallel programs presented in this thesis. We haven also used existing estimation methods.
We developed divide and conquer skeletons for the implementation of fast parallel multiplication. We have implemented the Karatsuba algorithm, Strassenâs matrix multiplication algorithm and the fast Fourier transform. The latter was used to implement polynomial convolution that leads to a further fast multiplication algorithm. Specially for our implementation of Strassen algorithm we have designed and implemented a divide and conquer skeleton basing on actors. We have implemented the parallel fast Fourier transform, and not only did we use new divide and conquer skeletons, but also developed a map-and-transpose skeleton. It enables good parallelisation of the Fourier transform. The parallelisation of Karatsuba multiplication shows a very good performance. We have analysed the parallel penalty of our programs and compared it to the serial fractionâan approach, known from literature. We also performed execution time estimations of our divide and conquer programs.
This thesis presents a parallel map+reduce skeleton scheme. It allows us to combine the usual parallel map skeletons, like parMap, farm, workpool, with a premature termination property. We use this to implement the so-called âparallel repeated computationâ, a special form of a speculative parallel loop. We have implemented two probabilistic primality tests: the RabinâMiller test and the Jacobi sum test. We parallelised both with our approach. We analysed the task distribution and stated the fitting configurations of the Jacobi sum test. We have shown formally that the Jacobi sum test can be implemented in parallel. Subsequently, we parallelised it, analysed the load balancing issues, and produced an optimisation. The latter enabled a good implementation, as verified using the parallel penalty. We have also estimated the performance of the tests for further input sizes and numbers of processing elements. Parallelisation of the Jacobi sum test and our generic parallelisation scheme for the repeated computation is our original contribution.
The data parallel arithmetic was defined not only for integers, which is already known, but also for rationals. We handled the common factors of the numerator or denominator of the fraction with the modulus in a novel manner. This is required to obtain a true multiple-residue arithmetic, a novel result of our research. Using these mathematical advances, we have parallelised the determinant computation using the GauĂ elimination. As always, we have performed task distribution analysis and estimation of the parallel execution time of our implementation. A similar computation in Maple emphasised the potential of our approach. Data parallel arithmetic enables parallelisation of entire classes of computer algebra algorithms.
Summarising, this thesis presents and thoroughly evaluates new and existing design decisions for high-level parallelisations of computer algebra algorithms
Recommended from our members
Compiling Irregular Software to Specialized Hardware
High-level synthesis (HLS) has simplified the design process for energy-efficient hardware accelerators: a designer specifies an acceleratorâs behavior in a âhigh-levelâ language, and a toolchain synthesizes register-transfer level (RTL) code from this specification. Many HLS systems produce efficient hardware designs for regular algorithms (i.e., those with limited conditionals or regular memory access patterns), but most struggle with irregular algorithms that rely on dynamic, data-dependent memory access patterns (e.g., traversing pointer-based structures like lists, trees, or graphs). HLS tools typically provide imperative, side-effectful languages to the designer, which makes it difficult to correctly specify and optimize complex, memory-bound applications.
In this dissertation, I present an alternative HLS methodology that leverages properties of functional languages to synthesize hardware for irregular algorithms. The main contribution is an optimizing compiler that translates pure functional programs into modular, parallel dataflow networks in hardware. I give an overview of this compiler, explain how its source and target together enable parallelism in the face of irregularity, and present two specific optimizations that further exploit this parallelism. Taken together, this dissertation verifies my thesis that pure functional programs exhibiting irregular memory access patterns can be compiled into specialized hardware and optimized for parallelism.
This work extends the scope of modern HLS toolchains. By relying on properties of pure functional languages, our compiler can synthesize hardware from programs containing constructs that commercial HLS tools prohibit, e.g., recursive functions and dynamic memory allocation. Hardware designers may thus use our compiler in conjunction with existing HLS systems to accelerate a wider class of algorithms than before
Adaptive architecture-transparent policy control in a distributed graph reducer
The end of the frequency scaling era occured around 2005 as the clock frequency
has stalled for commodity architectures. Thus performance improvements that could
in the past be expected with each new hardware generation needed to originate
elsewhere. Almost all computer architectures exhibit substantial and growing levels
of parallelism, exploiting which became one of the key sources of performance and
scalability improvements. Alas, parallel programming proved much more difficult
than sequential, due to the need to specify coordination and parallelism management
aspects. Whilst low-level languages place the burden on the programmers reducing
productivity and portability, semi-implicit approaches delegate the responsibility to
sophisticated compilers and run-time systems.
This thesis presents a study of adaptive load distribution based on work stealing
using history and ancestry information in a distributed graph reducer for a nonstrict functional language. The results contribute to the exploration of more flexible
run-time-system-level parallelism control implementing a semi-explicit model of parallelism, which offers productivity and high level of abstraction by delegating the
responsibility of coordination to the run-time system.
After characterising a set of parallel functional applications, we study the use of
historical information to adapt the choice of the victim to steal from in a work stealing scheduler. We observe substantially lower numbers of messages for data-parallel
and nested applications. However, this heuristic fails in cases where past application behaviour is not resembling future behaviour, for instance for Divide-&-Conquer
applications with a large number of very fine-grained threads and generators of parallelism that move dynamically across processing elements. This mechanism is not
specific to the language and the run-time system, and applies to other work stealing
schedulers.
Next, we focus on the other key work stealing decision of which sparks that represent potential parallelism to donate, investigating the effect of Spark Colocation
on the performance of five Divide-&-Conquer programs run on a cluster of up to
256 PEs. When using Spark Colocation, the distributed graph reducer shares related
work resulting in a higher degree of both potential and actual parallelism, and more
fine-grained and less variable thread size. We validate this behaviour by observing
a reduction in average fetch times, but increased amounts of FETCH messages and
of inter-PE pointers for colocation, which nevertheless results in improved load balance for three of the five benchmark programs. The results show high speedups and
speedup improvements for Spark Colocation for the three more regular and nested
applications and performance degradation for two programs: one that is excessively
fine-grained and one exhibiting limited scalability. Overall, Spark Colocation appears most beneficial for higher numbers of PEs, where improved load balance and
higher degree of parallelism have more opportunities to pay off.
In more general terms, we show that a run-time system can beneficially use historical information on past stealing successes that is gathered dynamically and used
within the same run and the ancestry information dynamically reconstructed at run
time using annotations. Moreover, the results support the view that different heuristics are beneficial for applications using different parallelism patterns, underlining
the advantages of a flexible architecture-transparent approach.The Scottish Informatics and Computer Science Alliance (SICSA
Low-Level Haskell Code: Measurements and Optimization Techniques
Haskell is a lazy functional language with a strong static type system and
excellent support for parallel programming. The language features of Haskell
make it easier to write correct and maintainable programs, but execution speed
often suffers from the high levels of abstraction. While much past research
focuses on high-level optimizations that take advantage of the functional
properties of Haskell, relatively little attention has been paid to the
optimization opportunities in the low-level imperative code generated during
translation to machine code. One problem with current low-level optimizations
is that their effectiveness is limited by the obscured control flow caused by
Haskell's high-level abstractions. My thesis is that trace-based optimization
techniques can be used to improve the effectiveness of low-level optimizations
for Haskell programs. I claim three unique contributions in this work.
The first contribution is to expose some properties of low-level Haskell codes
by looking at the mix of operations performed by the selected benchmark codes
and comparing them to the low-level codes coming from traditional programming
languages. The low-level measurements reveal that the control flow is obscured
by indirect jumps caused by the implementation of lazy evaluation,
higher-order functions, and the separately managed stacks used by Haskell
programs.
My second contribution is a study on the effectiveness of a dynamic binary
trace-based optimizer running on Haskell programs. My results show that while
viable program traces frequently occur in Haskell programs the overhead
associated with maintaing the traces in a dynamic optimization system outweigh
the benefits we get from running the traces. To reduce the runtime overheads,
I explore a way to find traces in a separate profiling step.
My final contribution is to build and evaluate a static trace-based optimizer
for Haskell programs. The static optimizer uses profiling data to find traces
in a Haskell program and then restructures the code around the traces to
increase the scope available to the low-level optimizer. My results show that
we can successfully build traces in Haskell programs, and the optimized code
yields a speedup over existing low-level optimizers of up to 86%
with an average speedup of 5% across 32 benchmarks
An Extensible Theorem Proving Frontend
Interaktive Theorembeweiser sind Softwarewerkzeuge zum computergestĂŒtzten Beweisen, d.h. sie können entsprechend kodierte Beweise von logischen Aussagen sowohl verifizieren als auch beim Erstellen dieser unterstĂŒtzen. In den letzten Jahren wurden weitreichende Formalisierungsprojekte ĂŒber Mathematik sowie Programmverifikation mit solchen Theorembeweisern bewĂ€ltigt. Der Theorembeweiser Lean insbesondere wurde nicht nur erfolgreich zum Verifizieren lange bekannter mathematischer Theoreme verwendet, sondern auch zur UnterstĂŒtzung von aktueller mathematischer Forschung. Das Ziel des Lean-Projekts ist nichts weniger als die Arbeitsweise von Mathematikern grundlegend zu verĂ€ndern, indem mit dem Computer formalisierte Beweise eine praktible Alternative zu solchen mit Stift und Papier werden sollen. AufwĂ€ndige manuelle Gutachten zur Korrektheit von Beweisen wĂ€ren damit hinfĂ€llig und gleichzeitig wĂ€re garantiert, dass alle nötigen Beweisschritte exakt erfasst sind, statt der Interpretation und dem Hintergrundwissen des Lesers ĂŒberlassen zu sein. Um dieses Ziel zu erreichen, sind jedoch noch weitere Fortschritte hinsichtlich Effizienz und Nutzbarkeit von Theorembeweisern nötig.
Als Schritt in Richtung dieses Ziels beschreibt diese Dissertation eine neue, vollstĂ€ndig erweiterbare Theorembeweiser-Benutzerschnittstelle ("frontend") im Rahmen von Lean 4, der nĂ€chsten Version von Lean. Aufgabe dieser Benutzerschnittstelle ist die textuelle Beschreibung und Entgegennahme der Beweiseingabe in einer Syntax, die mehrere teils widersprĂŒchliche Ziele optimieren sollte: Kompaktheit, Lesbarkeit fĂŒr menschliche Benutzer und Eindeutigkeit in der Interpretation durch den Theorembeweiser. Da in der geschriebenen Mathematik eine umfangreiche Menge an verschiedenen Notationen existiert, die von Jahr zu Jahr weiter wĂ€chst und sich gleichzeitig zwischen verschiedenen Feldern, Autoren oder sogar einzelnen Arbeiten unterscheiden kann, muss solch eine Schnittstelle es Benutzern erlauben, sie jederzeit mit neuen, ausdrucksfĂ€higen Notationen zu erweitern und ihnen mit flexiblen Regeln Bedeutung zuzuschreiben. Dieser Wunsch nach FlexibilitĂ€t der Eingabesprache lĂ€sst sich weiterhin auch auf der Ebene der einzelnen Beweisschritte ("Taktiken") sowie höheren Ebenen der Beweis- und Programmorganisation wiederfinden.
Den Kernteil dieser gewĂŒnschten Erweiterbarkeit habe ich mit einem ausdrucksstarken Makrosystem fĂŒr Lean realisiert, mit dem sich sowohl einfach Syntaxtransformationen ("syntaktischer Zucker") also auch komplexe, typgesteuerte Ăbersetzung in die Kernsprache des Beweisers ausdrĂŒcken lassen. Das Makrosystem basiert auf einem neuartigen Algorithmus fĂŒr Makrohygiene, basierend auf dem der Lisp-Sprache Racket und von mir an die spezifischen Anforderungen von Theorembeweisern angepasst, dessen Aufgabe es ist zu gewĂ€hrleisten, dass lexikalische Geltungsbereiche von Bezeichnern selbst fĂŒr komplexe Makros wie intuitiv erwartet funktionieren. Besonders habe ich beim Entwurf des Makrosystems darauf geachtet, das System einfach zugĂ€nglich zu gestalten, indem mehrere Abstraktionsebenen bereitgestellt werden, die sich in ihrer AusdrucksstĂ€rke unterscheiden, aber auf den gleichen fundamentalen Prinzipien wie der erwĂ€hnten Makrohygiene beruhen. Als ein Anwendungsbeispiel des Makrosystems beschreibe ich eine Erweiterung der aus Haskell bekannten "do"-Notation um weitere imperative Sprachfeatures. Die erweiterte Syntax ist in Lean 4 eingeflossen und hat grundsĂ€tzlich die Art und Weise verĂ€ndert, wie sowohl Entwickler als auch Benutzer monadischen, aber auch puren Code schreiben.
Das Makrosystem stellt das "Herz" des erweiterbaren Frontends dar, ist gleichzeitig aber auch eng mit anderen Softwarekomponenten innerhalb der Benutzerschnittstelle verknĂŒpft oder von ihnen abhĂ€ngig. Ich stelle das gesamte Frontend und das umgebende Lean-System vor mit Fokus auf Teilen, an denen ich maĂgeblich mitgewirkt habe. SchlieĂlich beschreibe ich noch ein effizientes ReferenzzĂ€hlungsschema fĂŒr funktionale Programmierung, welches eine Neuimplementierung von Lean in Lean selbst und damit das erweiterbare Frontend erst ermöglicht hat. Spezifische Optimierungen darin zur Wiederverwendung von Allokationen vereinen, Ă€hnlich wie die erweiterte do-Notation, die Vorteile von imperativer und pur funktionaler Programmierung in einem neuen Paradigma, das ich "pure imperative Programmierung" nenne
Bidirectional Programming and its Applications
Many problems in programming involve pairs of computations that cancel out each otherâs effects; some examples include parsing/printing, embed- ding/projection, marshalling/unmarshalling, compressing/de-compressing etc. To avoid duplication of effort, the paradigm of bidirectional programming aims at to allow the programmer to write a single program that expresses both computations. Despite being a promising idea, existing studies mainly focus on the view-update problem in databases and its variants; and the impact of bidirectional programming has not reached the wider community. The goal of this thesis is to demonstrate, through concrete language designs and case studies, the relevance of bidirectional programming, in areas of computer science that have not been previously explored.
In this thesis, we will argue for the importance of bidirectional programming in programming language design and compiler implementation. As evidence for this, we will propose a technique for incremental refactoring, which relies for its correctness on a bidirectional language and its properties, and devise a framework for implementing program transformations, with bidirectional properties that allow program analyses to be carried out in the transformed program, and have the results reported in the source program.
Our applications of bidirectional programming to new areas bring up fresh challenges. This thesis also reflects on the challenges, and studies their impact to the design of bidirectional systems. We will review various design goals, including expressiveness, robustness, updatability, efficiency and easy of use, and show how certain choices, especially regarding updatability, can have significant influence on the effectiveness of bidirectional systems
Donât Mind The Formalization Gap: The Design And Usage Of Hs-To-Coq
Using proof assistants to perform formal, mechanical software verification is a powerful technique for producing correct software. However, the verification is time-consuming and limited to software written in the language of the proof assistant. As an approach to mitigating this tradeoff, this dissertation presents hs-to-coq, a tool for translating programs written in the Haskell programming language into the Coq proof assistant, along with its applications and a general methodology for using it to verify programs. By introducing edit files containing programmatic descriptions of code transformations, we provide the ability to flexibly adapt our verification goals to exist anywhere on the spectrum between âincreased confidenceâ and âfull functional correctnessâ
Advanced Language-based Techniques for Correct, Secure Networked Systems
Developing correct and secure software is an important task that impacts many areas including finance, transportation, health, and defense. In order to develop secure programs, it is critical to understand the factors that influence the introduction of vulnerable code. To investigate, we ran the Build-it, Break-it, Fix-it (BIBIFI) contest as a quasi-controlled experiment. BIBIFI aims to assess the ability to securely build software, not just break it. In BIBIFI, teams build specified software with the goal of maximizing correctness, performance, and security. The latter is tested when teams attempt to break other teamsâ submissions. Winners are chosen from among the best builders and the best breakers. BIBIFI was designed to be open-endedâteams can use any language, tool, process, etc. that they like. As such, contest outcomes shed light on factors that correlate with successfully building secure software and breaking insecure software. We ran three contests involving a total of 156 teams and three different programming problems. Quantitative analysis from these contests found that the most efficient build-it submissions used C/C++, but submissions coded in a statically-typed language were less likely to have a security flaw. Break-it teams that were also successful build-it teams were significantly better at finding security bugs.
To improve secure development, we created LWeb, a tool for enforcing label-based, information flow policies in database-using web applications. In a nutshell, LWeb marries the LIO Haskell IFC enforcement library with the Yesod web programming framework. The implementation has two parts. First, we extract the core of LIO into a monad transformer (LMonad) and then apply it to Yesodâs core monad. Second, we extend Yesodâs table definition DSL and query functionality to permit defining and enforcing label-based policies on tables and enforcing them during query processing. LWebâs policy language is expressive, permitting dynamic per-table and per-row policies. We formalize the essence of LWeb in the λLWeb calculus and mechanize the proof of noninterference in Liquid Haskell. This mechanization constitutes the first metatheoretic proof carried out in Liquid Haskell. We also used LWeb to build the web site hosting BIBIFI. The site involves 40 data tables and sophisticated policies. Compared to manually checking security policies, LWeb imposes a modest runtime overhead of between 2% to 21%. It reduces the trusted code base from the whole application to just 1% of the application code, and 21% of the code overall (when counting LWeb too).
Finally, we verify the correctness of distributed applications based on conflict-free replicated data types (CRDTs). In order to do so, we add an extension to Liquid Haskell that facilitates stating and semi-automatically proving properties of typeclasses. Our work allows refinement types to be attached to typeclass method declarations, and ensures that instance implementations respect these types. The engineering of this extension is a modular interaction between GHC, the Glasgow Haskell Compiler, and Liquid Haskellâs core proof infrastructure. To verify CRDTs, we define them as a typeclass with refinement types that capture the mathematical properties CRDTs must satisfy, prove that these properties are sufficient to ensure that replicasâ states converge despite out-of-order delivery, implement (and prove correct) several instances of our CRDT typeclass, and use them to build two realistic applications, a multi-user calendar event planner and a collaborative text editor. In addition, we demonstrate the utility of our typeclass extension by using Liquid Haskell to modularly verify that 34 instances satisfy the laws of five standard typeclasses
- âŠ