Search CORE

2,411 research outputs found

Relay: A New IR for Machine Learning Frameworks

Author: Abadi Martin
Chen Tianqi
Krizhevsky Alex
Rotem Nadav
Shankar Asim
Vasilache Nicolas
Wei Richard
Wiltschko Alex
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 25/09/2018
Field of study

Machine learning powers diverse services in industry including search, translation, recommendation systems, and security. The scale and importance of these models require that they be efficient, expressive, and portable across an array of heterogeneous hardware devices. These constraints are often at odds; in order to better accommodate them we propose a new high-level intermediate representation (IR) called Relay. Relay is being designed as a purely-functional, statically-typed language with the goal of balancing efficient compilation, expressiveness, and portability. We discuss the goals of Relay and highlight its important design constraints. Our prototype is part of the open source NNVM compiler framework, which powers Amazon's deep learning framework MxNet

arXiv.org e-Print Archive

Crossref

Link-time smart card code hardening

Author: De Bosschere Koen
De Keulenaer Ronald
De Sutter Bjorn
Maebe Jonas
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2016
Field of study

This paper presents a feasibility study to protect smart card software against fault-injection attacks by means of link-time code rewriting. This approach avoids the drawbacks of source code hardening, avoids the need for manual assembly writing, and is applicable in conjunction with closed third-party compilers. We implemented a range of cookbook code hardening recipes in a prototype link-time rewriter and evaluate their coverage and associated overhead to conclude that this approach is promising. We demonstrate that the overhead of using an automated link-time approach is not significantly higher than what can be obtained with compile-time hardening or with manual hardening of compiler-generated assembly code

Ghent University Academic Bibliography

Blocked All-Pairs Shortest Paths Algorithm on Intel Xeon Phi KNL Processor: A Case Study

Author: A Nakaya
C Rosales
DE Culler
G Venkataraman
J Reinders
MB Giles
RW Floyd
S Jalali
S Warshall
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 03/11/2018
Field of study

Manycores are consolidating in HPC community as a way of improving performance while keeping power efficiency. Knights Landing is the recently released second generation of Intel Xeon Phi architecture. While optimizing applications on CPUs, GPUs and first Xeon Phi's has been largely studied in the last years, the new features in Knights Landing processors require the revision of programming and optimization techniques for these devices. In this work, we selected the Floyd-Warshall algorithm as a representative case study of graph and memory-bound applications. Starting from the default serial version, we show how data, thread and compiler level optimizations help the parallel implementation to reach 338 GFLOPS.Comment: Computer Science - CACIC 2017. Springer Communications in Computer and Information Science, vol 79

arXiv.org e-Print Archive

Crossref

A Domain-Specific Language and Editor for Parallel Particle Methods

Author: Castrillon Jeronimo
Karol Sven
Nett Tobias
Sbalzarini Ivo F.
Publication venue
Publication date: 17/09/2017
Field of study

Domain-specific languages (DSLs) are of increasing importance in scientific high-performance computing to reduce development costs, raise the level of abstraction and, thus, ease scientific programming. However, designing and implementing DSLs is not an easy task, as it requires knowledge of the application domain and experience in language engineering and compilers. Consequently, many DSLs follow a weak approach using macros or text generators, which lack many of the features that make a DSL a comfortable for programmers. Some of these features---e.g., syntax highlighting, type inference, error reporting, and code completion---are easily provided by language workbenches, which combine language engineering techniques and tools in a common ecosystem. In this paper, we present the Parallel Particle-Mesh Environment (PPME), a DSL and development environment for numerical simulations based on particle methods and hybrid particle-mesh methods. PPME uses the meta programming system (MPS), a projectional language workbench. PPME is the successor of the Parallel Particle-Mesh Language (PPML), a Fortran-based DSL that used conventional implementation strategies. We analyze and compare both languages and demonstrate how the programmer's experience can be improved using static analyses and projectional editing. Furthermore, we present an explicit domain model for particle abstractions and the first formal type system for particle methods.Comment: Submitted to ACM Transactions on Mathematical Software on Dec. 25, 201

arXiv.org e-Print Archive

MPG.PuRe

SEER: Super-Optimization Explorer for HLS using E-graph Rewriting with MLIR

Author: Barbalho Rafael
Chelini Lorenzo
Cheng Jianyi
Coward Samuel
Drane Theo
Publication venue
Publication date: 15/08/2023
Field of study

High-level synthesis (HLS) is a process that automatically translates a software program in a high-level language into a low-level hardware description. However, the hardware designs produced by HLS tools still suffer from a significant performance gap compared to manual implementations. This is because the input HLS programs must still be written using hardware design principles. Existing techniques either leave the program source unchanged or perform a fixed sequence of source transformation passes, potentially missing opportunities to find the optimal design. We propose a super-optimization approach for HLS that automatically rewrites an arbitrary software program into efficient HLS code that can be used to generate an optimized hardware design. We developed a toolflow named SEER, based on the e-graph data structure, to efficiently explore equivalent implementations of a program at scale. SEER provides an extensible framework, orchestrating existing software compiler passes and hardware synthesis optimizers. Our work is the first attempt to exploit e-graph rewriting for large software compiler frameworks, such as MLIR. Across a set of open-source benchmarks, we show that SEER achieves up to 38x the performance within 1.4x the area of the original program. Via an Intel-provided case study, SEER demonstrates the potential to outperform manually optimized designs produced by hardware experts

arXiv.org e-Print Archive

Achieving High-Performance the Functional Way: A Functional Pearl on Expressing High-Performance Optimizations as Rewrite Strategies

Author: Boyle James M
Chakravarty Manuel M. T.
Chen Tianqi
Collins Alexander
Delahaye David
Hall Mary
Henriksen Troels
Jones Simon Peyton
Lattner Chris
McDonell Trevor L.
Ragan-Kelley Jonathan
Steuwer Michel
Steuwer Michel
Svensson Joel
Visser Eelco
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/08/2020
Field of study

Optimizing programs to run efficiently on modern parallel hardware is hard but crucial for many applications. The predominantly used imperative languages - like C or OpenCL - force the programmer to intertwine the code describing functionality and optimizations. This results in a portability nightmare that is particularly problematic given the accelerating trend towards specialized hardware devices to further increase efficiency. Many emerging DSLs used in performance demanding domains such as deep learning or high-performance image processing attempt to simplify or even fully automate the optimization process. Using a high-level - often functional - language, programmers focus on describing functionality in a declarative way. In some systems such as Halide or TVM, a separate schedule specifies how the program should be optimized. Unfortunately, these schedules are not written in well-defined programming languages. Instead, they are implemented as a set of ad-hoc predefined APIs that the compiler writers have exposed. In this functional pearl, we show how to employ functional programming techniques to solve this challenge with elegance. We present two functional languages that work together - each addressing a separate concern. RISE is a functional language for expressing computations using well known functional data-parallel patterns. ELEVATE is a functional language for describing optimization strategies. A high-level RISE program is transformed into a low-level form using optimization strategies written in ELEVATE . From the rewritten low-level program high-performance parallel code is automatically generated. In contrast to existing high-performance domain-specific systems with scheduling APIs, in our approach programmers are not restricted to a set of built-in operations and optimizations but freely define their own computational patterns in RISE and optimization strategies in ELEVATE in a composable and reusable way. We show how our holistic functional approach achieves competitive performance with the state-of-the-art imperative systems Halide and TVM

Crossref

Edinburgh Research Explorer

Enlighten

Reductie van het geheugengebruik van besturingssysteemkernen Memory Footprint Reduction for Operating System Kernels

Author: Chanet Dominique
Publication venue
Publication date: 01/01/2007
Field of study

In ingebedde systemen is er vaak maar een beperkte hoeveelheid geheugen beschikbaar. Daarom wordt er veel aandacht besteed aan het produceren van compacte programma's voor deze systemen, en zijn er allerhande technieken ontwikkeld die automatisch het geheugengebruik van programma's kunnen verkleinen. Tot nu toe richtten die technieken zich voornamelijk op de toepassingssoftware die op het systeem draait, en werd het besturingssysteem over het hoofd gezien. In dit proefschrift worden een aantal technieken beschreven die het mogelijk maken om op een geautomatiseerde manier het geheugengebruik van een besturingssysteemkern gevoelig te verkleinen. Daarbij wordt in eerste instantie gebruik gemaakt van compactietransformaties tijdens het linken. Als we de hardware en software waaruit het systeem samengesteld is kennen, is het mogelijk om nog verdere reducties te bekomen. Daartoe wordt de kern gespecialiseerd voor een bepaalde hardware-software combinatie. Overbodige functionaliteit wordt opgespoord en uit de kern verwijderd, terwijl de resterende functionaliteit wordt aangepast aan de specifieke gebruikspatronen die uit de hardware en software kunnen afgeleid worden. Als laatste worden technieken voorgesteld die het mogelijk maken om weinig of niet uitgevoerde code (bijvoorbeeld code voor het afhandelen van slechts zeldzaam optredende foutcondities) uit het geheugen te verwijderen. Deze code wordt dan enkel ingeladen op het moment dat ze effectief nodig is. Voor ons testsysteem kunnen we met de gecombineerde technieken het geheugengebruik van een Linux 2.4 kern met meer dan 48% verminderen

Ghent University Academic Bibliography