Generalized Profile-Guided Iterator Recognition

Author: Franke Björn
Manilov Stan
Vasiladiotis Christos
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 24/02/2018

Crossref

Edinburgh Research Explorer

Towards a Compiler Analysis for Parallel Algorithmic Skeletons

Author: Cole Murray
Franke Björn
Manilov Stanislav
Vasiladiotis Christos
von Koch Tobias Edler
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 24/02/2018

Crossref

Edinburgh Research Explorer

Finding Missed Compiler Optimizations by Differential Testing

Author: Barany Gergö
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 24/02/2018

Randomized differential testing of compilers has had great success in finding compiler crashes and silent miscompilations. In this paper we investigate whether we can use similar techniques to improve the quality of the generated code: Can we compare the code generated by different compilers to find optimizations performed by one but missed by another? We have developed a set of tools for running such tests. We compile C code generated by standard random program generators and use a custom binary analysis tool to compare the output programs. Depending on the optimization of interest, the tool can be configured to compare features such as the number of total instructions, multiply or divide instructions, function calls, stack accesses, and more. A standard test case reduction tool produces minimal examples once an interesting difference has been found. We have used our tools to compare the code generated by GCC, Clang, and CompCert. We have found previously unreported missing arithmetic optimizations in all three compilers, as well as individual cases of unnecessary register spilling, missed opportunities for register coalescing, dead stores, redundant computations, and missing instruction selection patterns.
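As a rough illustration of the comparison step described in this abstract, the sketch below counts instruction mnemonics in two assembly listings and reports the features on which they differ. It is not the authors' tool: the function names `count_mnemonics` and `missed_optimization_candidates`, the feature table, and the simplified assembly parsing are all assumptions made for this example.

```python
from collections import Counter


def count_mnemonics(asm_text):
    """Count instruction mnemonics in a simplified assembly listing (assumed format)."""
    counts = Counter()
    for line in asm_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing '#' comments
        if not line or line.endswith(":") or line.startswith("."):
            continue  # skip blank lines, labels, and assembler directives
        counts[line.split()[0]] += 1
    return counts


def missed_optimization_candidates(asm_a, asm_b, features):
    """Return {feature: (count_a, count_b)} for every feature whose counts differ.

    `features` maps a hypothetical feature name to the mnemonics it covers.
    The total instruction count is always compared under the key "total".
    """
    a, b = count_mnemonics(asm_a), count_mnemonics(asm_b)
    report = {}
    for name, mnemonics in features.items():
        count_a = sum(a[m] for m in mnemonics)
        count_b = sum(b[m] for m in mnemonics)
        if count_a != count_b:
            report[name] = (count_a, count_b)
    total_a, total_b = sum(a.values()), sum(b.values())
    if total_a != total_b:
        report["total"] = (total_a, total_b)
    return report
```

In a setup like the one the abstract describes, the two listings would come from compiling the same randomly generated C program with two different compilers, and any non-empty report would be handed to a test case reducer.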

Crossref

INRIA a CCSD electronic archive server

A polyhedral compilation framework for loops with dynamic data-dependent bounds

Author: Cohen Albert
Kruse Michael
Zhao Jie
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 24/02/2018

We study the parallelizing compilation and loop nest optimization of an important class of programs where counted loops have a dynamic data-dependent upper bound. Such loops are amenable to a wider set of transformations than general while loops with inductively defined termination conditions: for example, the substitution of closed forms for induction variables remains applicable, removing the loop-carried data dependences induced by termination conditions. We propose an automatic compilation approach to parallelize and optimize dynamic counted loops. Our approach relies on affine relations only, as implemented in state-of-the-art polyhedral libraries. Revisiting a state-of-the-art framework to parallelize arbitrary while loops, we introduce additional control dependences on data-dependent predicates. Our method goes beyond the state of the art in fully automating the process, specializing the code generation algorithm to the case of dynamic counted loops and avoiding the introduction of spurious loop-carried dependences. We conduct experiments on representative irregular computations, from dynamic programming, computer vision and finite element methods to sparse matrix linear algebra. We validate that the method is applicable to general affine transformations for locality optimization, vectorization and parallelization.

Crossref

INRIA a CCSD electronic archive server

TC-CIM: Empowering Tensor Comprehensions for Computing-In-Memory

Author: Chelini Lorenzo
Cohen Albert
Corporaal Henk
Drebes Andi
Grosser Tobias
Vadivel Kanishkan
Vasilache Nicolas
Zinenko Oleksandr
Publication date: 01/01/2020

Memristor-based, non-von-Neumann architectures performing tensor operations directly in memory are a promising approach to address the ever-increasing demand for energy-efficient, high-throughput hardware accelerators for Machine Learning (ML) inference. A major challenge for the programmability and exploitation of such Computing-In-Memory (CIM) architectures consists in the efficient mapping of tensor operations from high-level ML frameworks to fixed-function hardware blocks implementing in-memory computations. We demonstrate the programmability of memristor-based accelerators with TC-CIM, a fully automatic, end-to-end compilation flow from Tensor Comprehensions, a mathematical notation for tensor operations, to fixed-function memristor-based hardware blocks. Operations suitable for acceleration are identified using Loop Tactics, a declarative framework to describe computational patterns in a polyhedral representation. We evaluate our compilation flow on a system-level simulator based on Gem5, incorporating crossbar arrays of memristive devices. Our results show that TC-CIM reliably recognizes tensor operations commonly used in ML workloads across multiple benchmarks in order to offload these operations to the accelerator.

Pure OAI Repository

Kmclib: Automated Inference and Verification of Session Types from OCaml Programs

Author: Imai Keigo
Lange Julien
Neykova Rumyana
Publication date: 24/12/2021

Theories and tools based on multiparty session types offer correctness guarantees for concurrent programs that communicate using message-passing. These guarantees usually come at the cost of an intrinsically top-down approach, which requires the communication behaviour of the entire program to be specified as a global type. This paper introduces kmclib: an OCaml library that supports the development of correct message-passing programs without having to write any types. The library utilises the meta-programming facilities of OCaml to automatically infer the session types of concurrent programs and verify their compatibility (k-MC [15]). Well-typed programs, written with kmclib, do not lead to communication errors and cannot get stuck.

Royal Holloway - Pure

Brunel University Research Archive

TC-CIM: Empowering Tensor Comprehensions for Computing-In-Memory

Author: Chelini Lorenzo
Cohen Albert
Corporaal Henk
Drebes Andi
Grosser Tobias
Vadivel Kanishkan
Vasilache Nicolas
Zinenko Oleksandr
Publication venue: HAL CCSD
Publication date: 01/01/2020

Memristor-based, non-von-Neumann architectures performing tensor operations directly in memory are a promising approach to address the ever-increasing demand for energy-efficient, high-throughput hardware accelerators for Machine Learning (ML) inference. A major challenge for the programmability and exploitation of such Computing-In-Memory (CIM) architectures consists in the efficient mapping of tensor operations from high-level ML frameworks to fixed-function hardware blocks implementing in-memory computations. We demonstrate the programmability of memristor-based accelerators with TC-CIM, a fully automatic, end-to-end compilation flow from Tensor Comprehensions, a mathematical notation for tensor operations, to fixed-function memristor-based hardware blocks. Operations suitable for acceleration are identified using Loop Tactics, a declarative framework to describe computational patterns in a polyhedral representation. We evaluate our compilation flow on a system-level simulator based on Gem5, incorporating crossbar arrays of memristive devices. Our results show that TC-CIM reliably recognizes tensor operations commonly used in ML workloads across multiple benchmarks in order to offload these operations to the accelerator.

Pure OAI Repository

INRIA a CCSD electronic archive server

HAL-Rennes 1

Value-dependent session design in a dependently typed language

Author: Brady Edwin Charles
de Muijnck-Hughes Jan
Vanderbauwhede Wim
Publication venue: 'Open Publishing Association'
Publication date: 02/04/2019

Session Types offer a typing discipline that allows protocol specifications to be used during type-checking, ensuring that implementations adhere to a given specification. When looking to realise global session types in a dependently typed language, care must be taken that values introduced in the description are used only by roles that know about the value. We present Sessions, a Resource Dependent Embedded Domain Specific Language (EDSL) for describing global session descriptions in the dependently typed language Idris. As we construct session descriptions, the values parameterising the EDSL's type keep track of the roles and messages they have encountered. We can use this knowledge to ensure that message values are only used by those who know the value. Sessions supports protocol descriptions that are computable, composable, higher-order, and value-dependent. We demonstrate Sessions' expressiveness by describing the TCP Handshake, a multi-modal server providing echo and basic arithmetic operations, and a Higher-Order protocol that supports an authentication interaction step.

arXiv.org e-Print Archive

University of Strathclyde Institutional Repository

Enlighten

University of St. Andrews - Pure

St Andrews Research Repository

Fast Nonblocking Persistence for Concurrent Data Structures

Author: Abdallah Shreif
Cai Wentao
Du Mingzhe
Maksimovski Vladimir
Sanna Rafaello
Scott Michael L.
Wen Haosen
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 35th International Symposium on Distributed Computing (DISC 2021)
Publication date: 01/01/2021

We present a fully lock-free variant of our recent Montage system for persistent data structures. The variant, nbMontage, adds persistence to almost any nonblocking concurrent structure without introducing significant overhead or blocking of any kind. Like its predecessor, nbMontage is buffered durably linearizable: it guarantees that the state recovered in the wake of a crash will represent a consistent prefix of pre-crash execution. Unlike its predecessor, nbMontage ensures wait-free progress of the persistence frontier, thereby bounding the number of recent updates that may be lost on a crash, and allowing a thread to force an update of the frontier (i.e., to perform a sync operation) without the risk of blocking. As an extra benefit, the helping mechanism employed by our wait-free sync significantly reduces its latency. Performance results for nonblocking queues, skip lists, trees, and hash tables rival custom data structures in the literature: they are dramatically faster than those achieved with prior general-purpose systems, and generally within 50% of equivalent non-persistent structures placed in DRAM.

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server