Chaotic Compilation for Encrypted Computing: Obfuscation but Not in Name
An 'obfuscation' for encrypted computing is quantified exactly here, leading to an argument that security against polynomial-time attacks has been achieved for user data via the deliberately 'chaotic' compilation required for security properties in that environment. Encrypted computing is the emerging science and technology of processors that take encrypted inputs to encrypted outputs via encrypted intermediate values (at nearly conventional speeds). The aim is to make user data in general-purpose computing secure against the operator and operating system as potential adversaries. A stumbling block has always been that memory addresses are data, and good encryption means an encrypted address varies randomly, which makes hitting any intended target in memory problematic without address decryption; yet decryption anywhere on the memory path would open up many easily exploitable vulnerabilities. This paper 'solves (chaotic) compilation' for processors without address decryption, covering all of ANSI C while satisfying the required security properties and opening up the field to the standard software tool-chain and infrastructure. That produces the argument referred to above, which may also hold without encryption.

Comment: 31 pages. Version update adds "Chaotic" to the title and throughout the paper, and recasts the abstract, introduction and other sections of the text for better access by cryptologists. To the same end it introduces the polynomial-time defense argument explicitly in the final section, having now set that denouement out in the abstract and introduction.
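The 'chaotic' compilation referred to above can be pictured, very roughly, as the compiler displacing every runtime value by a per-location offset chosen at random at compile time, with the generated arithmetic compensating, so that the values an observer sees vary arbitrarily from one compilation to the next. The C sketch below illustrates only that general idea; the offsets, names and encoding are hypothetical and not taken from the paper.

/* Minimal illustrative sketch (not the paper's compiler): every value x is
 * carried at run time as x + delta for a per-location offset delta picked
 * randomly at compile time; the compiled code folds in a compensating
 * constant so the displaced result still decodes correctly. */
#include <stdio.h>

static const int DELTA_A = 1234567;   /* hypothetical offset for a      */
static const int DELTA_B = -7654321;  /* hypothetical offset for b      */
static const int DELTA_S = 42;        /* hypothetical offset for a + b  */

/* Source-level computation: s = a + b. */
static int add_plain(int a, int b) { return a + b; }

/* "Chaotically" compiled form: operands arrive displaced by their offsets
 * and the result leaves displaced by its own; the folded constant makes
 * (a + DELTA_A) + (b + DELTA_B) come out as (a + b) + DELTA_S. */
static int add_obfuscated(int a_enc, int b_enc)
{
    return a_enc + b_enc + (DELTA_S - DELTA_A - DELTA_B);
}

int main(void)
{
    int a = 3, b = 4;
    int s_enc = add_obfuscated(a + DELTA_A, b + DELTA_B);
    printf("plain: %d  decoded: %d\n", add_plain(a, b), s_enc - DELTA_S);
    return 0;
}

With a different offset per compilation, repeated observation of the same program point gives no stable target; this is only an intuition-level gloss on the defense argument the abstract mentions.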
Coarse-grained reconfigurable array architectures
Coarse-Grained Reconfigurable Array (CGRA) architectures accelerate the same inner loops that benefit from the high ILP support in VLIW architectures. By executing non-loop code on other cores, however, CGRAs can focus on such loops to execute them more efficiently. This chapter discusses the basic principles of CGRAs, and the wide range of design options available to a CGRA designer, covering a large number of existing CGRA designs. The impact of different options on flexibility, performance, and power-efficiency is discussed, as well as the need for compiler support. The ADRES CGRA design template is studied in more detail as a use case to illustrate the need for design space exploration, for compiler support and for the manual fine-tuning of source code.
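As a concrete picture of the loops in question, the fragment below is a typical high-ILP inner loop (a FIR-style multiply-accumulate) of the kind a CGRA would software-pipeline across its array while the surrounding control code runs on a companion core; it is an illustrative example, not code from the chapter.

/* Illustrative kernel: the inner loop's loads, multiply and add can be
 * spread over a CGRA's functional units and software-pipelined, while the
 * outer, non-loop code stays on an ordinary processor core. */
void fir(const int *x, const int *h, int *y, int n, int taps)
{
    for (int i = 0; i + taps <= n; i++) {       /* control code: host core    */
        int acc = 0;
        for (int t = 0; t < taps; t++)          /* inner loop: mapped to CGRA */
            acc += x[i + t] * h[t];
        y[i] = acc;
    }
}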
Distributed data cache designs for clustered VLIW processors
Wire delays are a major concern for current and forthcoming processors. One approach to deal with this problem is to divide the processor into semi-independent units referred to as clusters. A cluster usually consists of a local register file and a subset of the functional units, while the L1 data cache typically remains centralized in what we call partially distributed architectures. However, as technology evolves, the relative latency of such a centralized cache will increase, leading to an important impact on performance. In this paper, we propose partitioning the L1 data cache among clusters for clustered VLIW processors. We refer to this kind of design as fully distributed processors. In particular, we propose and evaluate three different configurations: a snoop-based cache coherence scheme, a word-interleaved cache, and flexible L0-buffers managed by the compiler. For each alternative, instruction scheduling techniques targeted to cyclic code are developed. Results for the MediaBench suite show that the performance of such fully distributed architectures is always better than the performance of a partially distributed one with the same amount of resources. In addition, the key aspects of each fully distributed configuration are explored.
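Of the three configurations, the word-interleaved cache is the simplest to sketch: consecutive words are statically assigned to the cache banks of different clusters, so the compiler can often tell at schedule time which cluster a memory access belongs to. The fragment below illustrates only that address-to-cluster mapping; the cluster count and word size are hypothetical parameters, not values from the paper.

/* Illustrative word-interleaved mapping: word w of the address space lives
 * in the L1 bank of cluster (w mod NUM_CLUSTERS), so an access whose address
 * is known modulo the interleaving factor can be scheduled on that cluster. */
#include <stdint.h>

#define NUM_CLUSTERS 4    /* hypothetical number of clusters */
#define WORD_BYTES   4    /* hypothetical word size in bytes */

static inline unsigned home_cluster(uintptr_t addr)
{
    return (unsigned)((addr / WORD_BYTES) % NUM_CLUSTERS);
}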
Strategies for protecting intellectual property when using CUDA applications on graphics processing units
Recent advances in the massively parallel computational abilities of graphics processing units (GPUs) have increased their use for general-purpose computation, as companies look to take advantage of big data processing techniques. This has given rise to the potential for malicious software targeting GPUs, which is of interest to forensic investigators examining the operation of software. The ability to carry out reverse-engineering of software is of great importance within the security and forensics fields, particularly when investigating malicious software or carrying out forensic analysis following a successful security breach. Due to the complexity of the Nvidia CUDA (Compute Unified Device Architecture) framework, it is not clear how best to approach the reverse engineering of a piece of CUDA software. We carry out a review of the different binary output formats which may be encountered from the CUDA compiler, and their implications on reverse engineering. We then demonstrate the process of carrying out disassembly of an example CUDA application, to establish the various techniques available to forensic investigators carrying out black-box disassembly and reverse engineering of CUDA binaries. We show that the Nvidia compiler, using default settings, leaks useful information. Finally, we demonstrate techniques to better protect intellectual property in CUDA algorithm implementations from reverse engineering.
Optimizing the SICStus Prolog virtual machine instruction set
The Swedish Institute of Computer Science (SICS) is the vendor of SICStus Prolog.
To decrease execution time and reduce space requirements, variants of SICStus
Prolog's virtual instruction set were investigated. Semi-automatic ways of finding
candidate sets of instructions to combine or specialize were developed and used.
Several virtual machines were implemented and the relationship between improvements by combinations and by specializations was investigated. The benefit of specializations and combinations of instructions to the performance of the emulator is on average of the order of 10%. The code size reduction is 15%.
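The two transformations can be illustrated with an ordinary switch-dispatched emulator: combining merges a frequent pair of adjacent instructions into one opcode, saving a dispatch, while specializing bakes a very common operand into its own opcode, saving an operand fetch. The sketch below is a generic illustration of these ideas, not SICStus Prolog's actual instruction set.

/* Generic bytecode emulator sketch showing instruction combination and
 * specialization; the opcodes and their semantics are illustrative only. */
enum op {
    OP_PUSH_CONST,        /* push the next code word                    */
    OP_ADD,               /* pop two values, push their sum             */
    OP_PUSH_CONST_ADD,    /* combined: PUSH_CONST followed by ADD       */
    OP_PUSH_ZERO,         /* specialized: PUSH_CONST with operand 0     */
    OP_HALT
};

long run(const long *code)
{
    long stack[64];
    long *sp = stack;                        /* one past the top of stack */
    for (;;) {
        switch (*code++) {
        case OP_PUSH_CONST:     *sp++ = *code++;       break;
        case OP_ADD:            sp--; sp[-1] += sp[0]; break;
        case OP_PUSH_CONST_ADD: sp[-1] += *code++;     break; /* one dispatch, not two   */
        case OP_PUSH_ZERO:      *sp++ = 0;             break; /* no operand fetch needed */
        case OP_HALT:           return sp[-1];
        }
    }
}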
On the Implementation of GNU Prolog
GNU Prolog is a general-purpose implementation of the Prolog language, which distinguishes itself from most other systems by being, above all else, a native-code compiler that produces standalone executables that do not rely on any byte-code emulator or meta-interpreter. Other aspects that stand out
include the explicit organization of the Prolog system as a multipass compiler,
where intermediate representations are materialized, in Unix compiler
tradition. GNU Prolog also includes an extensible and high-performance finite
domain constraint solver, integrated with the Prolog language but implemented
using independent lower-level mechanisms. This article discusses the main
issues involved in designing and implementing GNU Prolog: requirements, system
organization, performance and portability issues as well as its position with
respect to other Prolog system implementations and the ISO standardization
initiative.

Comment: 30 pages, 3 figures, to appear in Theory and Practice of Logic Programming (TPLP). Keywords: Prolog, logic programming system, GNU, ISO, WAM, native code compilation, finite domain constraints.