Search CORE

3,178 research outputs found

A HIERARCHICAL REGISTER OPTIMIZATION APPROACH

Author: A Hamdoun
M Fettach
O Sentieys
Publication venue
Publication date: 01/05/2020
Field of study

Abstract A hierarchical register allocation approach in high-level synthesis is presented. First, we accomplish the trivial register allocation and then we attempt to optimize the number of required registers. In this work, we extend conventional register allocation algorithms to handle behavioral descriptions containing conditional branches and loops. However, in our approach the register optimization will carried out with explicit consideration of interconnection cost. Results show that our approach is more efficient for data flow graphs that contain nested conditional blocks and loops

CiteSeerX

Clustered multithreading for speculative execution

Author: Marukatat Rangsipan
Publication venue: The University of Edinburgh
Publication date: 01/01/2003
Field of study

Edinburgh Research Archive

Link-time smart card code hardening

Author: De Bosschere Koen
De Keulenaer Ronald
De Sutter Bjorn
Maebe Jonas
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2016
Field of study

This paper presents a feasibility study to protect smart card software against fault-injection attacks by means of link-time code rewriting. This approach avoids the drawbacks of source code hardening, avoids the need for manual assembly writing, and is applicable in conjunction with closed third-party compilers. We implemented a range of cookbook code hardening recipes in a prototype link-time rewriter and evaluate their coverage and associated overhead to conclude that this approach is promising. We demonstrate that the overhead of using an automated link-time approach is not significantly higher than what can be obtained with compile-time hardening or with manual hardening of compiler-generated assembly code

Ghent University Academic Bibliography

Recommended from our members

Behavioral synthesis from VHDL using structured modeling

Author: Gajski Daniel D.
Lis Joseph S.
Publication venue: eScholarship, University of California
Publication date: 01/01/1991
Field of study

This dissertation describes work in behavioral synthesis involving the development of a VHDL Synthesis System VSS which accepts a VHDL behavioral input specification and performs technology independent synthesis to generate a circuit netlist of generic components. The VHDL language is used for input and output descriptions. An intermediate representation which incorporates signal typing and component attributes simplifies compilation and facilitates design optimization.A Structured Modeling methodology has been developed to suggest standard VHDL modeling practices for synthesis. Structured modeling provides recommendations for the use of available VHDL description styles so that optimal designs will be synthesized.A design composed of generic components is synthesized from the input description through a process of Graph Compilation, Graph Criticism, and Design Compilation. Experiments were performed to demonstrate the effects of different modeling styles on the quality of the design produced by VSS. Several alternative VHDL models were examined for each benchmark, illustrating the improvements in design quality achieved when Structured Modeling guidelines were followed

eScholarship - University of California

High level synthesis of memory architectures

Author: Fallside Hamish
Publication venue: The University of Edinburgh
Publication date: 01/01/1995
Field of study

Edinburgh Research Archive

pocl: A Performance-Portable OpenCL Implementation

Author: Berg Heikki
de La Lama Carlos Sánchez
Jääskeläinen Pekka
Raiskila Kalle
Schnetter Erik
Takala Jarmo
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2015
Field of study

OpenCL is a standard for parallel programming of heterogeneous systems. The benefits of a common programming standard are clear; multiple vendors can provide support for application descriptions written according to the standard, thus reducing the program porting effort. While the standard brings the obvious benefits of platform portability, the performance portability aspects are largely left to the programmer. The situation is made worse due to multiple proprietary vendor implementations with different characteristics, and, thus, required optimization strategies. In this paper, we propose an OpenCL implementation that is both portable and performance portable. At its core is a kernel compiler that can be used to exploit the data parallelism of OpenCL programs on multiple platforms with different parallel hardware styles. The kernel compiler is modularized to perform target-independent parallel region formation separately from the target-specific parallel mapping of the regions to enable support for various styles of fine-grained parallel resources such as subword SIMD extensions, SIMD datapaths and static multi-issue. Unlike previous similar techniques that work on the source level, the parallel region formation retains the information of the data parallelism using the LLVM IR and its metadata infrastructure. This data can be exploited by the later generic compiler passes for efficient parallelization. The proposed open source implementation of OpenCL is also platform portable, enabling OpenCL on a wide range of architectures, both already commercialized and on those that are still under research. The paper describes how the portability of the implementation is achieved. Our results show that most of the benchmarked applications when compiled using pocl were faster or close to as fast as the best proprietary OpenCL implementation for the platform at hand.Comment: This article was published in 2015; it is now openly accessible via arxi

arXiv.org e-Print Archive

Trepo - Institutional Repository of Tampere University

Design of multimedia processor based on metric computation

Author: Balasa
Berekovic
Jean Luc Philippe
Jean Philippe Diguet
Mohamed Abid
Nader Ben Amor
Suzuki
Wuytack
Yannick Le Moullec
Publication venue: 'Elsevier BV'
Publication date: 01/01/2005
Field of study

Media-processing applications, such as signal processing, 2D and 3D graphics rendering, and image compression, are the dominant workloads in many embedded systems today. The real-time constraints of those media applications have taxing demands on today's processor performances with low cost, low power and reduced design delay. To satisfy those challenges, a fast and efficient strategy consists in upgrading a low cost general purpose processor core. This approach is based on the personalization of a general RISC processor core according the target multimedia application requirements. Thus, if the extra cost is justified, the general purpose processor GPP core can be enforced with instruction level coprocessors, coarse grain dedicated hardware, ad hoc memories or new GPP cores. In this way the final design solution is tailored to the application requirements. The proposed approach is based on three main steps: the first one is the analysis of the targeted application using efficient metrics. The second step is the selection of the appropriate architecture template according to the first step results and recommendations. The third step is the architecture generation. This approach is experimented using various image and video algorithms showing its feasibility

arXiv.org e-Print Archive

Crossref

HAL-Université de Bretagne Occidentale

VBN

A formally verified compiler back-end

Author: A Dold
A Dold
A Hobor
A Pnueli
ACJ Fox
AJ Chlipala
AW Appel
AW Appel
AW Appel
BK Rosen
C Lindig
CW Barrett
D Cachera
D Lacey
D Leinenbach
D Leinenbach
E Eide
F Henderson
G Barthe
G Barthe
G Barthe
G Barthe
G Clemmensen
G Goos
G Klein
G Li
G Li
G Morrisett
G Morrisett
GA Kildall
GC Necula
GC Necula
GC Necula
GC Necula
GJ Chaitin
GP Huet
H-J Boehm
IBM Corporation
J Chen
J Guttman
J Knoop
J Knoop
J McCarthy
J-B Tristan
J-B Tristan
JO Blech
JR Ellis
JS Moore
JS Moore
L Beringer
L Chirica
L George
L Rideau
LD Zuck
M Huisman
M Müller-Olm
M Strecker
MA Dave
N Benton
P Letouzey
P Letouzey
PH Hartel
PW O’Hearn
Q Huang
R Milner
R Stärk
S Beyer
S Blazy
S Blazy
S Coupet-Grimal
S Gulwani
S Lerner
SL Peyton Jones
SS Muchnick
TC Hales
WM McKeeman
X Feng
X Leroy
X Leroy
X Leroy
X Leroy
X Rival
Xavier Leroy
Y Bertot
Y Bertot
Z Shao
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2008
Field of study

This article describes the development and formal verification (proof of semantic preservation) of a compiler back-end from Cminor (a simple imperative intermediate language) to PowerPC assembly code, using the Coq proof assistant both for programming the compiler and for proving its correctness. Such a verified compiler is useful in the context of formal methods applied to the certification of critical software: the verification of the compiler guarantees that the safety properties proved on the source code hold for the executable compiled code as well

arXiv.org e-Print Archive

CiteSeerX

Crossref

INRIA a CCSD electronic archive server

Simple and Effective Type Check Removal through Lazy Basic Block Versioning

Author: Chevalier-Boisvert Maxime
Feeley Marc
Publication venue
Publication date: 01/01/2015
Field of study

Dynamically typed programming languages such as JavaScript and Python defer type checking to run time. In order to maximize performance, dynamic language VM implementations must attempt to eliminate redundant dynamic type checks. However, type inference analyses are often costly and involve tradeoffs between compilation time and resulting precision. This has lead to the creation of increasingly complex multi-tiered VM architectures. This paper introduces lazy basic block versioning, a simple JIT compilation technique which effectively removes redundant type checks from critical code paths. This novel approach lazily generates type-specialized versions of basic blocks on-the-fly while propagating context-dependent type information. This does not require the use of costly program analyses, is not restricted by the precision limitations of traditional type analyses and avoids the implementation complexity of speculative optimization techniques. We have implemented intraprocedural lazy basic block versioning in a JavaScript JIT compiler. This approach is compared with a classical flow-based type analysis. Lazy basic block versioning performs as well or better on all benchmarks. On average, 71% of type tests are eliminated, yielding speedups of up to 50%. We also show that our implementation generates more efficient machine code than TraceMonkey, a tracing JIT compiler for JavaScript, on several benchmarks. The combination of implementation simplicity, low algorithmic complexity and good run time performance makes basic block versioning attractive for baseline JIT compilers

arXiv.org e-Print Archive

CiteSeerX

Dagstuhl Research Online Publication Server

GCC-Plugin for Automated Accelerator Generation and Integration on Hybrid FPGA-SoCs

Author: Castrillon Jeronimo
Hempel Gerald
Hochberger Christian
Vogt Markus
Publication venue
Publication date: 01/09/2015
Field of study

In recent years, architectures combining a reconfigurable fabric and a general purpose processor on a single chip became increasingly popular. Such hybrid architectures allow extending embedded software with application specific hardware accelerators to improve performance and/or energy efficiency. Aiding system designers and programmers at handling the complexity of the required process of hardware/software (HW/SW) partitioning is an important issue. Current methods are often restricted, either to bare-metal systems, to subsets of mainstream programming languages, or require special coding guidelines, e.g., via annotations. These restrictions still represent a high entry barrier for the wider community of programmers that new hybrid architectures are intended for. In this paper we revisit HW/SW partitioning and present a seamless programming flow for unrestricted, legacy C code. It consists of a retargetable GCC plugin that automatically identifies code sections for hardware acceleration and generates code accordingly. The proposed workflow was evaluated on the Xilinx Zynq platform using unmodified code from an embedded benchmark suite.Comment: Presented at Second International Workshop on FPGAs for Software Programmers (FSP 2015) (arXiv:1508.06320

arXiv.org e-Print Archive

TUbiblio