Search CORE

610 research outputs found

On the Porting and Optimisation of Physics Simulations for Heterogeneous Parallel Processors

Author: Martineau Matt J
Publication venue
Publication date: 25/06/2019
Field of study

Explore Bristol Research

Trends in applied econometrics software development 1985-2008, an analysis of Journal of Applied Econometrics research articles, software reviews, data and code

Author: Ooms M.
Publication venue: Amsterdam
Publication date: 01/01/2008
Field of study

VU Research Portal

Combining behavioural types with security analysis

Author: Bartoletti Massimo
Castellani Ilaria
Deniélou Pierre-Malo
Dezani-Ciancaglini Mariangiola
Ghilezan Silvia
Pantovic Jovanka
Pérez Jorge A.
Thiemann Peter
Toninho Bernardo
Vieira Hugo Torres
Publication venue: 'Elsevier BV'
Publication date: 01/01/2015
Field of study

Today's software systems are highly distributed and interconnected, and they increasingly rely on communication to achieve their goals; due to their societal importance, security and trustworthiness are crucial aspects for the correctness of these systems. Behavioural types, which extend data types by describing also the structured behaviour of programs, are a widely studied approach to the enforcement of correctness properties in communicating systems. This paper offers a unified overview of proposals based on behavioural types which are aimed at the analysis of security properties

arXiv.org e-Print Archive

Proceedings - University of Groningen

University of Groningen

ARTS repository - University of Groningen

INRIA a CCSD electronic archive server

Archivio istituzionale della ricerca - Università di Cagliari

IMT Institutional Repository

Dissertations of the University of Groningen

From constraint programming to heterogeneous parallelism

Author: Ginsbach Philip Andreas
Publication venue: The University of Edinburgh
Publication date: 25/06/2020
Field of study

The scaling limitations of multi-core processor development have led to a diversification of the processor cores used within individual computers. Heterogeneous computing has become widespread, involving the cooperation of several structurally different processor cores. Central processor (CPU) cores are most frequently complemented with graphics processors (GPUs), which despite their name are suitable for many highly parallel computations besides computer graphics. Furthermore, deep learning accelerators are rapidly gaining relevance. Many applications could profit from heterogeneous computing but are held back by the surrounding software ecosystems. Heterogeneous systems are a challenge for compilers in particular, which usually target only the increasingly marginalised homogeneous CPU cores. Therefore, heterogeneous acceleration is primarily accessible via libraries and domain-specific languages (DSLs), requiring application rewrites and resulting in vendor lock-in. This thesis presents a compiler method for automatically targeting heterogeneous hardware from existing sequential C/C++ source code. A new constraint programming method enables the declarative specification and automatic detection of computational idioms within compiler intermediate representation code. Examples of computational idioms are stencils, reductions, and linear algebra. Computational idioms denote algorithmic structures that commonly occur in performance-critical loops. Consequently, well-designed accelerator DSLs and libraries support computational idioms with their programming models and function interfaces. The detection of computational idioms in their middle end enables compilers to incorporate DSL and library backends for code generation. These backends leverage domain knowledge for the efficient utilisation of heterogeneous hardware. The constraint programming methodology is first derived on an abstract model and then implemented as an extension to LLVM. Two constraint programming languages are designed to target this implementation: the Compiler Analysis Description Language (CAnDL), and the extended Idiom Detection Language (IDL). These languages are evaluated on a range of different compiler problems, culminating in a complete heterogeneous acceleration pipeline integrated with the Clang C/C++ compiler. This pipeline was evaluated on the established benchmark collections NPB and Parboil. The approach was applicable to 10 of the benchmark programs, resulting in significant speedups from 1.26× on “histo” to 275× on “sgemm” when starting from sequential baseline versions. In summary, this thesis shows that the automatic recognition of computational idioms during compilation enables the heterogeneous acceleration of sequential C/C++ programs. Moreover, the declarative specification of computational idioms is derived in novel declarative programming languages, and it is demonstrated that constraint programming on Single Static Assignment intermediate code is a suitable method for their automatic detection

Edinburgh Research Archive

Self-Evaluation Applied Mathematics 2003-2008 University of Twente

Author: Mouthaan A.J.
Vegt J.J.W. van der
Publication venue: Faculty of Electrical Engineering, Mathematics and Computer Science, University of Twente
Publication date: 01/01/2009
Field of study

This report contains the self-study for the research assessment of the Department of Applied Mathematics (AM) of the Faculty of Electrical Engineering, Mathematics and Computer Science (EEMCS) at the University of Twente (UT). The report provides the information for the Research Assessment Committee for Applied Mathematics, dealing with mathematical sciences at the three universities of technology in the Netherlands. It describes the state of affairs pertaining to the period 1 January 2003 to 31 December 2008

University of Twente Research Information

A metadata-enhanced framework for high performance visual effects

Author: Cornwall Jay L. T.
Cornwall Jay L. T.
Publication venue: Computing, Imperial College London
Publication date: 01/08/2010
Field of study

This thesis is devoted to reducing the interactive latency of image processing computations in visual effects. Film and television graphic artists depend upon low-latency feedback to receive a visual response to changes in effect parameters. We tackle latency with a domain-specific optimising compiler which leverages high-level program metadata to guide key computational and memory hierarchy optimisations. This metadata encodes static and dynamic information about data dependence and patterns of memory access in the algorithms constituting a visual effect – features that are typically difficult to extract through program analysis – and presents it to the compiler in an explicit form. By using domain-specific information as a substitute for program analysis, our compiler is able to target a set of complex source-level optimisations that a vendor compiler does not attempt, before passing the optimised source to the vendor compiler for lower-level optimisation. Three key metadata-supported optimisations are presented. The first is an adaptation of space and schedule optimisation – based upon well-known compositions of the loop fusion and array contraction transformations – to the dynamic working sets and schedules of a runtimeparameterised visual effect. This adaptation sidesteps the costly solution of runtime code generation by specialising static parameters in an offline process and exploiting dynamic metadata to adapt the schedule and contracted working sets at runtime to user-tunable parameters. The second optimisation comprises a set of transformations to generate SIMD ISA-augmented source code. Our approach differs from autovectorisation by using static metadata to identify parallelism, in place of data dependence analysis, and runtime metadata to tune the data layout to user-tunable parameters for optimal aligned memory access. The third optimisation comprises a related set of transformations to generate code for SIMT architectures, such as GPUs. Static dependence metadata is exploited to guide large-scale parallelisation for tens of thousands of in-flight threads. Optimal use of the alignment-sensitive, explicitly managed memory hierarchy is achieved by identifying inter-thread and intra-core data sharing opportunities in memory access metadata. A detailed performance analysis of these optimisations is presented for two industrially developed visual effects. In our evaluation we demonstrate up to 8.1x speed-ups on Intel and AMD multicore CPUs and up to 6.6x speed-ups on NVIDIA GPUs over our best hand-written implementations of these two effects. Programmability is enhanced by automating the generation of SIMD and SIMT implementations from a single programmer-managed scalar representation

Spiral - Imperial College Digital Repository

Programming language Formian.

Author: Disney Peter Lawrence.
Publication venue
Publication date: 20/06/2018
Field of study

Formex algebra is a powerful tool for the generation of data used in the design and analysis of space structures. However, for the algebra to be of practical use, it is necessary to have a means of employing the concepts on a computer. This is the particular problem which this thesis addresses. The solution proposed here is Formian, an interactive programming language, which being modelled on formex algebra allows complex configurations to be generated from a few concise and yet readily understood statements. Formian is designed to allow problems of data generation to be tackled in a single programming environment. The thesis describes the raison d'etre for the Formian programming language and the steps taken to create the language and to provide a practical and reliable implementation in the form of a computer program. A complete description of the language structure is given. This includes an overview of formex algebra. The use of Formian from a designer's viewpoint is provided by interspersing the description with practical examples

University of Surrey

An Active-Library Based Investigation into the Performance Optimisation of Linear Algebra and the Finite Element Method

Author: Russell Francis Prem
Russell Francis Prem
Publication venue: Computing, Imperial College London
Publication date: 01/07/2011
Field of study

In this thesis, I explore an approach called "active libraries". These are libraries that take part in their own optimisation, enabling both high-performance code and the presentation of intuitive abstractions. I investigate the use of active libraries in two domains. Firstly, dense and sparse linear algebra, particularly, the solution of linear systems of equations. Secondly, the specification and solution of finite element problems. Extending my earlier (MEng) thesis work, I describe the modifications to my linear algebra library "Desola" required to perform sparse-matrix code generation. I show that optimisations easily applied in the dense case using code-transformation must be applied at a higher level of abstraction in the sparse case. I present performance results for sparse linear system solvers generated using Desola and compare against an implementation using the Intel Math Kernel Library. I also present improved dense linear-algebra performance results. Next, I explore the active-library approach by developing a finite element library that captures runtime representations of basis functions, variational forms and sequences of operations between discretised operators and fields. Using captured representations of variational forms and basis functions, I demonstrate optimisations to cell-local integral assembly that this approach enables, and compare against the state of the art. As part of my work on optimising local assembly, I extend the work of Hosangadi et al. on common sub-expression elimination and factorisation of polynomials. I improve the weight function presented by Hosangadi et al., increasing the number of factorisations found. I present an implementation of an optimised branch-and-bound algorithm inspired by reformulating the original matrix-covering problem as a maximal graph biclique search problem. I evaluate the algorithm's effectiveness on the expressions generated by our finite element solver

Spiral - Imperial College Digital Repository

Design and optimisation of scientific programs in a categorical language

Author: Ashby Thomas James
Publication venue: The University of Edinburgh
Publication date: 01/01/2005
Field of study

This thesis presents an investigation into the use of advanced computer languages for scientific computing, an examination of performance issues that arise from using such languages for such a task, and a step toward achieving portable performance from compilers by attacking these problems in a way that compensates for the complexity of and differences between modern computer architectures. The language employed is Aldor, a functional language from computer algebra, and the scientific computing area is a subset of the family of iterative linear equation solvers applied to sparse systems. The linear equation solvers that are considered have much common structure, and this is factored out and represented explicitly in the lan-guage as a framework, by means of categories and domains. The flexibility introduced by decomposing the algorithms and the objects they act on into separate modules has a strong performance impact due to its negative effect on temporal locality. This necessi-tates breaking the barriers between modules to perform cross-component optimisation. In this instance the task reduces to one of collective loop fusion and array contrac

CiteSeerX

Edinburgh Research Archive

Recommended from our members

Time-Dependent Density-Functional Theory in Massively Parallel Computer Architectures: The Octopus Project

Author: Alberdi-Rodriguez Joseba
Andrade Xavier
Arruabarrena Agustin
Aspuru-Guzik Alan
Castro Alberto
Louie Steven G.
Marques Miguel A. L.
Muguerza Javier
Nogueira Fernando
Oliveira Micael J. T.
Rubio Angel
Strubbe David A.
Publication venue: 'IOP Publishing'
Publication date: 14/04/2014
Field of study

Octopus is a general-purpose density-functional theory (DFT) code, with a particular emphasis on the time-dependent version of DFT (TDDFT). In this paper we present the ongoing efforts to achieve the parallelization of octopus. We focus on the real-time variant of TDDFT, where the time-dependent Kohn–Sham equations are directly propagated in time. This approach has great potential for execution in massively parallel systems such as modern supercomputers with thousands of processors and graphics processing units (GPUs). For harvesting the potential of conventional supercomputers, the main strategy is a multi-level parallelization scheme that combines the inherent scalability of real-time TDDFT with a real-space grid domain-partitioning approach. A scalable Poisson solver is critical for the efficiency of this scheme. For GPUs, we show how using blocks of Kohn–Sham states provides the required level of data parallelism and that this strategy is also applicable for code optimization on standard processors. Our results show that real-time TDDFT, as implemented in octopus, can be the method of choice for studying the excited states of large molecular systems in modern parallel architectures.Chemistry and Chemical Biolog

Harvard University - DASH

Digital.CSIC