Search CORE

2,946 research outputs found

Thread-spawning schemes for speculative multithreading

Author: González Colás Antonio María
Marcuello Pascual Pedro
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2002
Field of study

Speculative multithreading has been recently proposed to boost performance by means of exploiting thread-level parallelism in applications difficult to parallelize. The performance of these processors heavily depends on the partitioning policy used to split the program into threads. Previous work uses heuristics to spawn speculative threads based on easily-detectable program constructs such as loops or subroutines. In this work we propose a profile-based mechanism to divide programs into threads by searching for those parts of the code that have certain features that could benefit from potential thread-level parallelism. Our profile-based spawning scheme is evaluated on a Clustered Speculative Multithreaded Processor and results show large performance benefits. When the proposed spawning scheme is compared with traditional heuristics, we outperform them by almost 20%. When a realistic value predictor and a 8-cycle thread initialization penalty is considered, the performance difference between them is maintained. The speed-up over a single thread execution is higher than 5x for a 16-thread-unit processor and close to 2x for a 4-thread-unit processor.Peer ReviewedPostprint (published version

UPCommons. Portal del coneixement obert de la UPC

Probabilistic data flow analysis: a linear equational approach

Author: Di Pierro Alessandra
Wiklicky Herbert
Publication venue: 'Open Publishing Association'
Publication date: 01/01/2013
Field of study

Speculative optimisation relies on the estimation of the probabilities that certain properties of the control flow are fulfilled. Concrete or estimated branch probabilities can be used for searching and constructing advantageous speculative and bookkeeping transformations. We present a probabilistic extension of the classical equational approach to data-flow analysis that can be used to this purpose. More precisely, we show how the probabilistic information introduced in a control flow graph by branch prediction can be used to extract a system of linear equations from a program and present a method for calculating correct (numerical) solutions.Comment: In Proceedings GandALF 2013, arXiv:1307.416

arXiv.org e-Print Archive

Crossref

Directory of Open Access Journals

Catalogo dei prodotti della ricerca

Spiral - Imperial College Digital Repository

Open Access Repository

A Survey on Compiler Autotuning using Machine Learning

Author: Ashouri Amir H.
Cavazos John
Killian William
Palermo Gianluca
Silvano Cristina
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 03/09/2018
Field of study

Since the mid-1990s, researchers have been trying to use machine-learning based approaches to solve a number of different compiler optimization problems. These techniques primarily enhance the quality of the obtained results and, more importantly, make it feasible to tackle two main compiler optimization problems: optimization selection (choosing which optimizations to apply) and phase-ordering (choosing the order of applying optimizations). The compiler optimization space continues to grow due to the advancement of applications, increasing number of compiler optimizations, and new target architectures. Generic optimization passes in compilers cannot fully leverage newly introduced optimizations and, therefore, cannot keep up with the pace of increasing options. This survey summarizes and classifies the recent advances in using machine learning for the compiler optimization field, particularly on the two major problems of (1) selecting the best optimizations and (2) the phase-ordering of optimizations. The survey highlights the approaches taken so far, the obtained results, the fine-grain classification among different approaches and finally, the influential papers of the field.Comment: version 5.0 (updated on September 2018)- Preprint Version For our Accepted Journal @ ACM CSUR 2018 (42 pages) - This survey will be updated quarterly here (Send me your new published papers to be added in the subsequent version) History: Received November 2016; Revised August 2017; Revised February 2018; Accepted March 2018

arXiv.org e-Print Archive

Archivio istituzionale della ricerca - Politecnico di Milano

Recommended from our members

An efficient global resource constrained technique for exploiting instruction level parallelism

Author: Nicolau Alexandru
Novack Steven
Publication venue: eScholarship, University of California
Publication date: 21/01/1992
Field of study

A new Global Resource-constrained Percolation (GRiP) scheduling technique is presented for exploiting instruction level parallelism. Other techniques that have been proposed either have been prohibitively expensive in terms of computation or have limited parallelism. The GRiP technique has been implemented and simulation results are presented

eScholarship - University of California

Memory and Parallelism Analysis Using a Platform-Independent Approach

Author: Awan Ahsan Javed
Corda Stefano
Corporaal Henk
Jordans Roel
Singh Gagandeep
Stuijk Sander
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2019
Field of study

Emerging computing architectures such as near-memory computing (NMC) promise improved performance for applications by reducing the data movement between CPU and memory. However, detecting such applications is not a trivial task. In this ongoing work, we extend the state-of-the-art platform-independent software analysis tool with NMC related metrics such as memory entropy, spatial locality, data-level, and basic-block-level parallelism. These metrics help to identify the applications more suitable for NMC architectures.Comment: 22nd ACM International Workshop on Software and Compilers for Embedded Systems (SCOPES '19), May 201

arXiv.org e-Print Archive

Crossref

Repository TU/e

Pure OAI Repository

Chaotic Compilation for Encrypted Computing: Obfuscation but Not in Name

Author: Breuer Peter T.
Publication venue
Publication date: 29/04/2019
Field of study

An `obfuscation' for encrypted computing is quantified exactly here, leading to an argument that security against polynomial-time attacks has been achieved for user data via the deliberately `chaotic' compilation required for security properties in that environment. Encrypted computing is the emerging science and technology of processors that take encrypted inputs to encrypted outputs via encrypted intermediate values (at nearly conventional speeds). The aim is to make user data in general-purpose computing secure against the operator and operating system as potential adversaries. A stumbling block has always been that memory addresses are data and good encryption means the encrypted value varies randomly, and that makes hitting any target in memory problematic without address decryption, yet decryption anywhere on the memory path would open up many easily exploitable vulnerabilities. This paper `solves (chaotic) compilation' for processors without address decryption, covering all of ANSI C while satisfying the required security properties and opening up the field for the standard software tool-chain and infrastructure. That produces the argument referred to above, which may also hold without encryption.Comment: 31 pages. Version update adds "Chaotic" in title and throughout paper, and recasts abstract and Intro and other sections of the text for better access by cryptologists. To the same end it introduces the polynomial time defense argument explicitly in the final section, having now set that denouement out in the abstract and intr

arXiv.org e-Print Archive

Cryptology ePrint Archive

Energy Efficient Execution of POMDP Policies

Author: Grześ Marek
Hoey Jesse
Poupart Pascal
Yang Xiao
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 18/12/2014
Field of study

Recent advances in planning techniques for partially observable Markov decision processes have focused on online search techniques and offline point-based value iteration. While these techniques allow practitioners to obtain policies for fairly large problems, they assume that a non-negligible amount of computation can be done between each decision point. In contrast, the recent proliferation of mobile and embedded devices has lead to a surge of applications that could benefit from state of the art planning techniques if they can operate under severe constraints on computational resources. To that effect, we describe two techniques to compile policies into controllers that can be executed by a mere table lookup at each decision point. The first approach compiles policies induced by a set of alpha vectors (such as those obtained by point-based techniques) into approximately equivalent controllers, while the second approach performs a simulation to compile arbitrary policies into approximately equivalent controllers. We also describe an approach to compress controllers by removing redundant and dominated nodes, often yielding smaller and yet better controllers. Further compression and higher value can sometimes be obtained by considering stochastic controllers. The compilation and compression techniques are demonstrated on benchmark problems as well as a mobile application to help persons with Alzheimer's to way-find. The battery consumption of several POMDP policies is compared against finite-state controllers learned using methods introduced in this paper. Experiments performed on the Nexus 4 phone show that finite-state controllers are the least battery consuming POMDP policies

Crossref

Kent Academic Repository

Reliable Software for Unreliable Hardware - A Cross-Layer Approach

Author: Rehman Semeen
Publication venue: KIT-Bibliothek, Karlsruhe
Publication date: 01/01/2015
Field of study

A novel cross-layer reliability analysis, modeling, and optimization approach is proposed in this thesis that leverages multiple layers in the system design abstraction (i.e. hardware, compiler, system software, and application program) to exploit the available reliability enhancing potential at each system layer and to exchange this information across multiple system layers

KITopen