Search CORE

750 research outputs found

Recommended from our members

Percolation scheduling with resource constraints

Author: Ebciogiu Kemal
Nicolau Alexandru
Publication venue: eScholarship, University of California
Publication date: 01/01/1989
Field of study

This paper presents a new approach to resource-constrained compiler extraction of fine-grain parallelism, targeted towards VLIW supercomputers, and in particular, the IBM VLIW (Very Large Instruction Word) processor. The algorithms described integrate resource limitations into Percolation Scheduling—a global parallelization technique—to deal with resource constraints, without sacrificing the generality and completeness of Percolation Scheduling in the process. This is in sharp contrast with previous approaches which either applied only to conditional-free code, or drastically limited the parallelization process by imposing relatively local heuristic resource constraints early in the scheduling process

eScholarship - University of California

Creating portable and efficient packet processing applications

Author: A Korobeynikov
AV Aho
B Wun
CW Fraser
D Bernstein
EA Lee
EJ Johnson
Fulvio Risso
G Memik
J Carlstrom
J Wagner
JA Fisher
JL Hennessy
L Ciminiera
L George
M Baldi
M Baldi
MK Chen
N Shah
Olivier Morandi
P Briggs
Paolo Veglia
Pierluigi Rolando
R Cytron
R Ennals
R Morris
Silvio Valenti
SS Muchnick
T Lindholm
Z Budimlic
Publication venue: Springer
Publication date: 01/01/2011
Field of study

Crossref

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

PORTO Publications Open Repository TOrino

Design Space Exploration for Sobel Application using OpenIMPACT( Opensource Retargetable Compilation for VLIW Architecture)

Author: Brost Vincent
Saptono Debyo
Yang Fan
Publication venue
Publication date
Field of study

Retargetable compilation infrastructure bring to growth of application-specific programmable systems which directly supporting the different target architectures and design space exploration (DSE) for the instruction set architecture and microarchitecture of the processor under development. There are three categories in this technology costumized„ semiretargetable and retargetable compiler. In DSE retargetable compilation methodology , permit to determine the optimal combination of hardwired components for example IALU, FALU ,Memory,Branch and programmable elements to get better performance that be measured by cycle count/total execution. DSP TI Processor Model as target architecture implemented, we have simulated for Sobel Application on VLIW architecture for observing optimal hardwired component needed in embedded system. With Optimization facility in compiler , result of simulation at variant model defined on system, giving information of Superblock and Hyperblock types can generate code that be executed processor better than Classical type. Model unroll looping in Optimization improved performance simulation until 50% unless in Classical type

Gunadarma University Repository

pocl: A Performance-Portable OpenCL Implementation

Author: Berg Heikki
de La Lama Carlos Sánchez
Jääskeläinen Pekka
Raiskila Kalle
Schnetter Erik
Takala Jarmo
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2015
Field of study

OpenCL is a standard for parallel programming of heterogeneous systems. The benefits of a common programming standard are clear; multiple vendors can provide support for application descriptions written according to the standard, thus reducing the program porting effort. While the standard brings the obvious benefits of platform portability, the performance portability aspects are largely left to the programmer. The situation is made worse due to multiple proprietary vendor implementations with different characteristics, and, thus, required optimization strategies. In this paper, we propose an OpenCL implementation that is both portable and performance portable. At its core is a kernel compiler that can be used to exploit the data parallelism of OpenCL programs on multiple platforms with different parallel hardware styles. The kernel compiler is modularized to perform target-independent parallel region formation separately from the target-specific parallel mapping of the regions to enable support for various styles of fine-grained parallel resources such as subword SIMD extensions, SIMD datapaths and static multi-issue. Unlike previous similar techniques that work on the source level, the parallel region formation retains the information of the data parallelism using the LLVM IR and its metadata infrastructure. This data can be exploited by the later generic compiler passes for efficient parallelization. The proposed open source implementation of OpenCL is also platform portable, enabling OpenCL on a wide range of architectures, both already commercialized and on those that are still under research. The paper describes how the portability of the implementation is achieved. Our results show that most of the benchmarked applications when compiled using pocl were faster or close to as fast as the best proprietary OpenCL implementation for the platform at hand.Comment: This article was published in 2015; it is now openly accessible via arxi

arXiv.org e-Print Archive

Trepo - Institutional Repository of Tampere University

Survey on Combinatorial Register Allocation and Instruction Scheduling

Author: Lozano Roberto Castañeda
Schulte Christian
Publication venue
Publication date: 01/01/2018
Field of study

Register allocation (mapping variables to processor registers or memory) and instruction scheduling (reordering instructions to increase instruction-level parallelism) are essential tasks for generating efficient assembly code in a compiler. In the last three decades, combinatorial optimization has emerged as an alternative to traditional, heuristic algorithms for these two tasks. Combinatorial optimization approaches can deliver optimal solutions according to a model, can precisely capture trade-offs between conflicting decisions, and are more flexible at the expense of increased compilation time. This paper provides an exhaustive literature review and a classification of combinatorial optimization approaches to register allocation and instruction scheduling, with a focus on the techniques that are most applied in this context: integer programming, constraint programming, partitioned Boolean quadratic programming, and enumeration. Researchers in compilers and combinatorial optimization can benefit from identifying developments, trends, and challenges in the area; compiler practitioners may discern opportunities and grasp the potential benefit of applying combinatorial optimization

arXiv.org e-Print Archive

Publikationer från KTH

Digitala Vetenskapliga Arkivet - Academic Archive On-line

IMPLEMENTATION OF RSA KEY GENERATION BASED ON RNS USING VERILOG

Author: M VISHAK
SHANKARAIAH N.
Publication venue: Institute for Project Management Pvt. Ltd
Publication date: 30/07/2020
Field of study

RSA key generation is of great concern for implementation of RSA cryptosystem on embedded system due to its long processing latency. In this paper, a novel architecture is presented to provide high processing speed to RSA key generation for embedded platform with limited processing capacity. In order to exploit more data level parallelism, Residue Number System (RNS) is introduced to accelerate RSA key pair generation, in which these independent elements can be processed simultaneously. A cipher processor based on Transport Triggered Architecture (TTA) is proposed to realize the parallelism at the architecture level. In the meantime, division is avoided in the proposed architecture, which reduces the expense of hardware implementation remarkably. The proposed design is implemented by Verilog HDL and verified in matlab. A rate of 3 pairs per second can be achieved for 1024-bit RSA key generation at the frequency of 100 MHz

Interscience Research Network