On Computational Small Steps and Big Steps: Refocusing for Outermost Reduction
We study the relationship between small-step semantics, big-step semantics and abstract machines, for programming languages that employ an outermost reduction strategy, i.e., languages where reductions near the root of the abstract syntax tree are performed before reductions near the leaves. In particular, we investigate how Biernacka and Danvy's syntactic correspondence and Reynolds's functional correspondence can be applied to inter-derive semantic specifications for such languages. The main contribution of this dissertation is three-fold: First, we identify that backward overlapping reduction rules in the small-step semantics cause the refocusing step of the syntactic correspondence to be inapplicable. Second, we propose two solutions to overcome this inapplicability: backtracking and rule generalization. Third, we show how these solutions affect the other transformations of the two correspondences. Other contributions include the application of the syntactic and functional correspondences to Boolean normalization. In particular, we show how to systematically derive a spectrum of normalization functions for negational and conjunctive normalization.
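To make the small-step versus big-step distinction concrete, the following is a minimal sketch of negational normalization written both ways. It is not taken from the dissertation: the rule set (double negation and De Morgan), the term classes and the names `step`, `normalize_small_step` and `nnf` are all assumptions chosen for illustration, and the sketch does not perform refocusing or exhibit backward overlapping rules.

```python
from dataclasses import dataclass
from typing import Optional, Union


# Hypothetical term language for Boolean formulas (illustration only).
@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Not:
    arg: "Term"


@dataclass(frozen=True)
class And:
    left: "Term"
    right: "Term"


@dataclass(frozen=True)
class Or:
    left: "Term"
    right: "Term"


Term = Union[Var, Not, And, Or]


def step(t: Term) -> Optional[Term]:
    """One small step, outermost first: try a rule at the root, else descend."""
    if isinstance(t, Not):
        a = t.arg
        if isinstance(a, Not):   # not (not x)   -> x
            return a.arg
        if isinstance(a, And):   # not (x and y) -> (not x) or (not y)
            return Or(Not(a.left), Not(a.right))
        if isinstance(a, Or):    # not (x or y)  -> (not x) and (not y)
            return And(Not(a.left), Not(a.right))
        return None              # negated variable: already normal
    if isinstance(t, (And, Or)):
        reduced = step(t.left)
        if reduced is not None:
            return type(t)(reduced, t.right)
        reduced = step(t.right)
        if reduced is not None:
            return type(t)(t.left, reduced)
    return None                  # no rule applies anywhere in t


def normalize_small_step(t: Term) -> Term:
    """Small-step normalization: iterate `step` until no redex remains."""
    while (nxt := step(t)) is not None:
        t = nxt
    return t


def nnf(t: Term, negate: bool = False) -> Term:
    """Big-step normalization: one recursive pass carrying the pending negation."""
    if isinstance(t, Var):
        return Not(t) if negate else t
    if isinstance(t, Not):
        return nnf(t.arg, not negate)
    if isinstance(t, And):
        op = Or if negate else And
    else:
        op = And if negate else Or
    return op(nnf(t.left, negate), nnf(t.right, negate))


example = Not(And(Var("a"), Not(Or(Var("b"), Var("c")))))
assert normalize_small_step(example) == nnf(example)
```

On this example both functions agree. The small-step version re-traverses the term from the root after every step, which is the kind of overhead that refocusing is intended to remove.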
Towards an Achievable Performance for the Loop Nests
Numerous code optimization techniques, including loop nest optimizations,
have been developed over the last four decades. Loop optimization techniques
transform loop nests to improve the performance of the code on a target
architecture, including exposing parallelism. Finding and evaluating an
optimal, semantics-preserving sequence of transformations is a complex problem.
The sequence is guided using heuristics and/or analytical models and there is
no way of knowing how close it gets to optimal performance or if there is any
headroom for improvement. This paper makes two contributions. First, it uses a
comparative analysis of loop optimizations/transformations across multiple
compilers to determine how much headroom may exist for each compiler. Second,
it presents an approach to characterize the loop nests based on their
hardware performance counter values and a Machine Learning approach that
predicts which compiler will generate the fastest code for a loop nest. The
prediction is made for both auto-vectorized, serial compilation and for
auto-parallelization. The results show that the headroom for state-of-the-art
compilers ranges from 1.10x to 1.42x for the serial code and from 1.30x to
1.71x for the auto-parallelized code. These results are based on the Machine
Learning predictions. Comment: Accepted at the 31st International Workshop on Languages and
Compilers for Parallel Computing (LCPC 2018)
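One plausible reading of "headroom" is the speedup a compiler would gain if, for every loop nest, the fastest code produced by any compiler were used instead of its own. The sketch below computes that ratio. The definition, the function name `headroom`, the compiler labels and all timings are assumptions made for illustration; none of them come from the paper, which bases its figures on Machine Learning predictions.

```python
from typing import Dict

# Hypothetical run times in seconds: loop nest -> compiler -> time.
# These numbers are made up for illustration and are NOT from the paper.
Timings = Dict[str, Dict[str, float]]

times: Timings = {
    "nest_a": {"cc1": 2.0, "cc2": 1.0},
    "nest_b": {"cc1": 1.0, "cc2": 3.0},
}


def headroom(times: Timings, compiler: str) -> float:
    """Total time with `compiler` divided by total time when the fastest
    compiler is chosen per loop nest (assumed definition of headroom)."""
    baseline = sum(per_nest[compiler] for per_nest in times.values())
    best = sum(min(per_nest.values()) for per_nest in times.values())
    return baseline / best


for name in ("cc1", "cc2"):
    print(f"{name}: {headroom(times, name):.2f}x")
```

A value of 1.00x would mean the compiler is already the fastest choice on every loop nest in the set; larger values indicate room for improvement under this assumed definition.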
An Active-Library Based Investigation into the Performance Optimisation of Linear Algebra and the Finite Element Method
In this thesis, I explore an approach called "active libraries". These are libraries that take
part in their own optimisation, enabling both high-performance code and the presentation of
intuitive abstractions.
I investigate the use of active libraries in two domains: first, dense and sparse linear algebra,
particularly the solution of linear systems of equations; second, the specification and solution
of finite element problems.
Extending my earlier (MEng) thesis work, I describe the modifications to my linear algebra
library "Desola" required to perform sparse-matrix code generation. I show that optimisations
easily applied in the dense case using code-transformation must be applied at a higher level of
abstraction in the sparse case. I present performance results for sparse linear system solvers
generated using Desola and compare against an implementation using the Intel Math Kernel
Library. I also present improved dense linear-algebra performance results.
Next, I explore the active-library approach by developing a finite element library that captures
runtime representations of basis functions, variational forms and sequences of operations between
discretised operators and fields. Using captured representations of variational forms and
basis functions, I demonstrate optimisations to cell-local integral assembly that this approach
enables, and compare against the state of the art.
As part of my work on optimising local assembly, I extend the work of Hosangadi et al. on
common sub-expression elimination and factorisation of polynomials. I improve the weight
function presented by Hosangadi et al., increasing the number of factorisations found. I present
an implementation of an optimised branch-and-bound algorithm inspired by reformulating the
original matrix-covering problem as a maximal graph biclique search problem. I evaluate the
algorithm's effectiveness on the expressions generated by our finite element solver.