Combining behavioural types with security analysis
Today's software systems are highly distributed and interconnected, and they
increasingly rely on communication to achieve their goals; due to their
societal importance, security and trustworthiness are crucial aspects for the
correctness of these systems. Behavioural types, which extend data types by
describing also the structured behaviour of programs, are a widely studied
approach to the enforcement of correctness properties in communicating systems.
This paper offers a unified overview of proposals based on behavioural types
which are aimed at the analysis of security properties.
From constraint programming to heterogeneous parallelism
The scaling limitations of multi-core processor development have led to a diversification of the processor cores used within individual computers. Heterogeneous computing has become widespread, involving the cooperation of several structurally different processor cores. Central processor (CPU) cores are most frequently complemented with graphics processors (GPUs), which despite their name are suitable for many highly parallel computations besides computer graphics. Furthermore, deep learning accelerators are rapidly gaining relevance.
Many applications could profit from heterogeneous computing but are held back by the surrounding software ecosystems. Heterogeneous systems are a challenge for compilers in particular, which usually target only the increasingly marginalised homogeneous CPU cores. Therefore, heterogeneous acceleration is primarily accessible via libraries and domain-specific languages (DSLs), requiring application rewrites and resulting in vendor lock-in.
This thesis presents a compiler method for automatically targeting heterogeneous hardware from existing sequential C/C++ source code. A new constraint programming method enables the declarative specification and automatic detection of computational idioms within compiler intermediate representation code. Examples of computational idioms are stencils, reductions, and linear algebra. Computational idioms denote algorithmic structures that commonly occur in performance-critical loops. Consequently, well-designed accelerator DSLs and libraries support computational idioms with their programming models and function interfaces. The detection of computational idioms in their middle end enables compilers to incorporate DSL and library backends for code generation. These backends leverage domain knowledge for the efficient utilisation of heterogeneous hardware.
The constraint programming methodology is first derived on an abstract model and then implemented as an extension to LLVM. Two constraint programming languages are designed to target this implementation: the Compiler Analysis Description Language (CAnDL), and the extended Idiom Detection Language (IDL). These languages are evaluated on a range of different compiler problems, culminating in a complete heterogeneous acceleration pipeline integrated with the Clang C/C++ compiler. This pipeline was evaluated on the established benchmark collections NPB and Parboil. The approach was applicable to 10 of the benchmark programs, resulting in significant speedups from 1.26× on “histo” to 275× on “sgemm” when starting from sequential baseline versions.
In summary, this thesis shows that the automatic recognition of computational idioms during compilation enables the heterogeneous acceleration of sequential C/C++ programs. Moreover, the declarative specification of computational idioms is derived in novel declarative programming languages, and it is demonstrated that constraint programming on Static Single Assignment intermediate code is a suitable method for their automatic detection.
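As a rough illustration of the three idiom classes the abstract names (reductions, stencils, and linear algebra), the sketch below writes each as a plain sequential loop nest. It is in Python rather than the C/C++ the thesis targets, the function names are hypothetical, and it shows only the loop shapes a detector would look for, not the CAnDL or IDL specifications themselves.

```python
def reduction(xs):
    """Reduction idiom: fold a whole sequence into a single accumulator."""
    acc = 0.0
    for x in xs:
        acc += x
    return acc


def stencil_3pt(xs):
    """1D three-point stencil: each interior output reads a fixed
    neighbourhood of the input; boundary elements are copied unchanged."""
    out = list(xs)
    for i in range(1, len(xs) - 1):
        out[i] = (xs[i - 1] + xs[i] + xs[i + 1]) / 3.0
    return out


def matmul(a, b):
    """Dense matrix multiplication as a triple loop nest (linear algebra idiom)."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            for p in range(k):
                c[i][j] += a[i][p] * b[p][j]
    return c
```

Loops of these shapes are the kind of performance-critical code that, once recognised, can be handed to a library or DSL backend instead of being compiled as ordinary CPU code.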
Self-Evaluation Applied Mathematics 2003-2008 University of Twente
This report contains the self-study for the research assessment of the Department of Applied Mathematics (AM) of the Faculty of Electrical Engineering, Mathematics and Computer Science (EEMCS) at the University of Twente (UT). The report provides the information for the Research Assessment Committee for Applied Mathematics, dealing with mathematical sciences at the three universities of technology in the Netherlands. It describes the state of affairs pertaining to the period 1 January 2003 to 31 December 2008.
A metadata-enhanced framework for high performance visual effects
This thesis is devoted to reducing the interactive latency of image processing computations in
visual effects. Film and television graphic artists depend upon low-latency feedback to receive
a visual response to changes in effect parameters. We tackle latency with a domain-specific optimising
compiler which leverages high-level program metadata to guide key computational and
memory hierarchy optimisations. This metadata encodes static and dynamic information about
data dependence and patterns of memory access in the algorithms constituting a visual effect –
features that are typically difficult to extract through program analysis – and presents it to the
compiler in an explicit form. By using domain-specific information as a substitute for program
analysis, our compiler is able to target a set of complex source-level optimisations that a vendor
compiler does not attempt, before passing the optimised source to the vendor compiler for
lower-level optimisation.
Three key metadata-supported optimisations are presented. The first is an adaptation of
space and schedule optimisation – based upon well-known compositions of the loop fusion and
array contraction transformations – to the dynamic working sets and schedules of a runtime-parameterised
visual effect. This adaptation sidesteps the costly solution of runtime code generation
by specialising static parameters in an offline process and exploiting dynamic metadata to
adapt the schedule and contracted working sets at runtime to user-tunable parameters. The second
optimisation comprises a set of transformations to generate SIMD ISA-augmented source code.
Our approach differs from autovectorisation by using static metadata to identify parallelism, in
place of data dependence analysis, and runtime metadata to tune the data layout to user-tunable
parameters for optimal aligned memory access. The third optimisation comprises a related set
of transformations to generate code for SIMT architectures, such as GPUs. Static dependence
metadata is exploited to guide large-scale parallelisation for tens of thousands of in-flight threads.
Optimal use of the alignment-sensitive, explicitly managed memory hierarchy is achieved by identifying
inter-thread and intra-core data sharing opportunities in memory access metadata.
A detailed performance analysis of these optimisations is presented for two industrially developed
visual effects. In our evaluation we demonstrate up to 8.1x speed-ups on Intel and AMD
multicore CPUs and up to 6.6x speed-ups on NVIDIA GPUs over our best hand-written implementations
of these two effects. Programmability is enhanced by automating the generation of
SIMD and SIMT implementations from a single programmer-managed scalar representation.
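The loop fusion and array contraction transformations mentioned in this abstract can be sketched on a toy two-stage pipeline. This is a minimal illustration with hypothetical function names and arbitrary arithmetic, written in Python for brevity; it does not reproduce the thesis's compiler or its metadata format.

```python
def unfused(src):
    """Two separate loops communicating through a full-size temporary array."""
    tmp = [0.0] * len(src)
    for i in range(len(src)):
        tmp[i] = src[i] * 2.0      # stage 1 writes every element of tmp
    out = [0.0] * len(src)
    for i in range(len(src)):
        out[i] = tmp[i] + 1.0      # stage 2 reads tmp back
    return out


def fused_contracted(src):
    """Loop fusion merges the two loops into one; array contraction then
    shrinks the temporary array to a single scalar."""
    out = [0.0] * len(src)
    for i in range(len(src)):
        t = src[i] * 2.0           # contracted temporary
        out[i] = t + 1.0
    return out
```

Both versions compute the same result, but the fused form touches each element once and needs no intermediate buffer, which is what reduces the working set.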
Programming language Formian.
Formex algebra is a powerful tool for the generation of data used in the design and analysis of space structures. However, for the algebra to be of practical use, it is necessary to have a means of employing the concepts on a computer. This is the particular problem which this thesis addresses. The solution proposed here is Formian, an interactive programming language which, being modelled on formex algebra, allows complex configurations to be generated from a few concise yet readily understood statements. Formian is designed to allow problems of data generation to be tackled in a single programming environment. The thesis describes the raison d'être for the Formian programming language and the steps taken to create the language and to provide a practical and reliable implementation in the form of a computer program. A complete description of the language structure is given. This includes an overview of formex algebra. The use of Formian from a designer's viewpoint is provided by interspersing the description with practical examples.
An Active-Library Based Investigation into the Performance Optimisation of Linear Algebra and the Finite Element Method
In this thesis, I explore an approach called "active libraries". These are libraries that take
part in their own optimisation, enabling both high-performance code and the presentation of
intuitive abstractions.
I investigate the use of active libraries in two domains. Firstly, dense and sparse linear algebra,
particularly, the solution of linear systems of equations. Secondly, the specification and solution
of finite element problems.
Extending my earlier (MEng) thesis work, I describe the modifications to my linear algebra
library "Desola" required to perform sparse-matrix code generation. I show that optimisations
easily applied in the dense case using code-transformation must be applied at a higher level of
abstraction in the sparse case. I present performance results for sparse linear system solvers
generated using Desola and compare against an implementation using the Intel Math Kernel
Library. I also present improved dense linear-algebra performance results.
Next, I explore the active-library approach by developing a finite element library that captures
runtime representations of basis functions, variational forms and sequences of operations between
discretised operators and fields. Using captured representations of variational forms and
basis functions, I demonstrate optimisations to cell-local integral assembly that this approach
enables, and compare against the state of the art.
As part of my work on optimising local assembly, I extend the work of Hosangadi et al. on
common sub-expression elimination and factorisation of polynomials. I improve the weight
function presented by Hosangadi et al., increasing the number of factorisations found. I present
an implementation of an optimised branch-and-bound algorithm inspired by reformulating the
original matrix-covering problem as a maximal graph biclique search problem. I evaluate the
algorithm's effectiveness on the expressions generated by our finite element solver.
Design and optimisation of scientific programs in a categorical language
This thesis presents an investigation into the use of advanced computer languages for scientific computing, an examination of performance issues that arise from using such languages for such a task, and a step toward achieving portable performance from compilers by attacking these problems in a way that compensates for the complexity of and differences between modern computer architectures. The language employed is Aldor, a functional language from computer algebra, and the scientific computing area is a subset of the family of iterative linear equation solvers applied to sparse systems. The linear equation solvers that are considered have much common structure, and this is factored out and represented explicitly in the language as a framework, by means of categories and domains. The flexibility introduced by decomposing the algorithms and the objects they act on into separate modules has a strong performance impact due to its negative effect on temporal locality. This necessitates breaking the barriers between modules to perform cross-component optimisation. In this instance the task reduces to one of collective loop fusion and array contrac
Time-Dependent Density-Functional Theory in Massively Parallel Computer Architectures: The Octopus Project
Octopus is a general-purpose density-functional theory (DFT) code, with a particular emphasis on the time-dependent version of DFT (TDDFT). In this paper we present the ongoing efforts to achieve the parallelization of octopus. We focus on the real-time variant of TDDFT, where the time-dependent Kohn–Sham equations are directly propagated in time. This approach has great potential for execution in massively parallel systems such as modern supercomputers with thousands of processors and graphics processing units (GPUs). For harvesting the potential of conventional supercomputers, the main strategy is a multi-level parallelization scheme that combines the inherent scalability of real-time TDDFT with a real-space grid domain-partitioning approach. A scalable Poisson solver is critical for the efficiency of this scheme. For GPUs, we show how using blocks of Kohn–Sham states provides the required level of data parallelism and that this strategy is also applicable for code optimization on standard processors. Our results show that real-time TDDFT, as implemented in octopus, can be the method of choice for studying the excited states of large molecular systems in modern parallel architectures.