240 research outputs found

Janus: Statically-Driven and Profile-Guided Automatic Dynamic Binary Parallelisation

Author: Jones TM
Zhou R
Publication venue: CGO 2019 - Proceedings of the 2019 IEEE/ACM International Symposium on Code Generation and Optimization
Publication date: 01/01/2019

We present Janus, a framework that addresses the challenge of automatic binary parallelisation. Janus uses same-ISA dynamic binary modification to optimise application binaries, controlled by static analysis with judicious use of software speculation and runtime checks that ensure the safety of the optimisations. A static binary analyser first examines a binary executable to determine the loops that are amenable to parallelisation and the transformations required. These are encoded as a series of rewrite rules: the steps needed to convert a serial loop into parallel form. The Janus dynamic binary modifier reads both the original executable and the rewrite rules, and carries out the transformations at the per-basic-block level, just in time before execution. Lifting static analysis out of the runtime enables global and profile-guided views of the application; ambiguities from static binary analysis can in turn be addressed through a combination of dynamic runtime checks and speculation that guard against data dependence violations. This allows us to parallelise even those loops containing dynamically discovered code. We demonstrate Janus by parallelising a range of optimised SPEC CPU 2006 benchmarks, achieving an average speedup of 2.1× and 6.0× in the best case.

Funding: Arm Ltd; Engineering and Physical Sciences Research Council (EP/K026399/1); Engineering and Physical Sciences Research Council (EP/P020011/1)

Automatic Generators for a Family of Matrix Multiplication Routines with Apache TVM

Author: Alaejos Guillermo
Alonso-Jordá Pedro
Castelló Adrián
Igual Francisco D.
Martínez Héctor
Quintana-Ortí Enrique S.
Publication date: 31/10/2023

We explore the use of the Apache TVM open source framework to automatically generate a family of algorithms that follow the approach taken by popular linear algebra libraries, such as GotoBLAS2, BLIS and OpenBLAS, in order to obtain high-performance blocked formulations of the general matrix multiplication (GEMM). In addition, we fully automate the generation process by also leveraging the Apache TVM framework to derive a complete variety of processor-specific micro-kernels for GEMM. This is in contrast with the convention in high-performance libraries, which hand-encode a single micro-kernel per architecture using assembly code. Overall, the combination of our TVM-generated blocked algorithms and micro-kernels for GEMM 1) improves portability and maintainability and, more generally, streamlines the software life cycle; 2) provides high flexibility to easily tailor and optimize the solution to different data types, processor architectures, and matrix operand shapes, yielding performance on a par with (or even superior to, for specific matrix shapes) that of hand-tuned libraries; and 3) features a small memory footprint.

Comment: 35 pages, 22 figures. Submitted to ACM TOM

arXiv.org e-Print Archive

Acceleration of a Full-scale Industrial CFD Application with OP2

Author: Bertolli C
Betts A
Giles MB
Kelly PHJ
Mudalige GR
Radford D
Reguly IZ
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 25/06/2015

Spiral - Imperial College Digital Repository

Vectorizing unstructured mesh computations for many-core architectures

Author: Dagum
Dutykh
Giles
Giles
Giles
Kim
Lindtjorn
Mudalige
Poole
Publication venue: Wiley