2,946 research outputs found
Thread-spawning schemes for speculative multithreading
Speculative multithreading has been recently proposed to boost performance by means of exploiting thread-level parallelism in applications difficult to parallelize. The performance of these processors heavily depends on the partitioning policy used to split the program into threads. Previous work uses heuristics to spawn speculative threads based on easily-detectable program constructs such as loops or subroutines. In this work we propose a profile-based mechanism to divide programs into threads by searching for those parts of the code that have certain features that could benefit from potential thread-level parallelism. Our profile-based spawning scheme is evaluated on a Clustered Speculative Multithreaded Processor and results show large performance benefits. When the proposed spawning scheme is compared with traditional heuristics, we outperform them by almost 20%. When a realistic value predictor and a 8-cycle thread initialization penalty is considered, the performance difference between them is maintained. The speed-up over a single thread execution is higher than 5x for a 16-thread-unit processor and close to 2x for a 4-thread-unit processor.Peer ReviewedPostprint (published version
Probabilistic data flow analysis: a linear equational approach
Speculative optimisation relies on the estimation of the probabilities that
certain properties of the control flow are fulfilled. Concrete or estimated
branch probabilities can be used for searching and constructing advantageous
speculative and bookkeeping transformations.
We present a probabilistic extension of the classical equational approach to
data-flow analysis that can be used to this purpose. More precisely, we show
how the probabilistic information introduced in a control flow graph by branch
prediction can be used to extract a system of linear equations from a program
and present a method for calculating correct (numerical) solutions.Comment: In Proceedings GandALF 2013, arXiv:1307.416
A Survey on Compiler Autotuning using Machine Learning
Since the mid-1990s, researchers have been trying to use machine-learning
based approaches to solve a number of different compiler optimization problems.
These techniques primarily enhance the quality of the obtained results and,
more importantly, make it feasible to tackle two main compiler optimization
problems: optimization selection (choosing which optimizations to apply) and
phase-ordering (choosing the order of applying optimizations). The compiler
optimization space continues to grow due to the advancement of applications,
increasing number of compiler optimizations, and new target architectures.
Generic optimization passes in compilers cannot fully leverage newly introduced
optimizations and, therefore, cannot keep up with the pace of increasing
options. This survey summarizes and classifies the recent advances in using
machine learning for the compiler optimization field, particularly on the two
major problems of (1) selecting the best optimizations and (2) the
phase-ordering of optimizations. The survey highlights the approaches taken so
far, the obtained results, the fine-grain classification among different
approaches and finally, the influential papers of the field.Comment: version 5.0 (updated on September 2018)- Preprint Version For our
Accepted Journal @ ACM CSUR 2018 (42 pages) - This survey will be updated
quarterly here (Send me your new published papers to be added in the
subsequent version) History: Received November 2016; Revised August 2017;
Revised February 2018; Accepted March 2018
Recommended from our members
An efficient global resource constrained technique for exploiting instruction level parallelism
A new Global Resource-constrained Percolation (GRiP) scheduling technique is presented for exploiting instruction level parallelism. Other techniques that have been proposed either have been prohibitively expensive in terms of computation or have limited parallelism. The GRiP technique has been implemented and simulation results are presented
Memory and Parallelism Analysis Using a Platform-Independent Approach
Emerging computing architectures such as near-memory computing (NMC) promise
improved performance for applications by reducing the data movement between CPU
and memory. However, detecting such applications is not a trivial task. In this
ongoing work, we extend the state-of-the-art platform-independent software
analysis tool with NMC related metrics such as memory entropy, spatial
locality, data-level, and basic-block-level parallelism. These metrics help to
identify the applications more suitable for NMC architectures.Comment: 22nd ACM International Workshop on Software and Compilers for
Embedded Systems (SCOPES '19), May 201
Chaotic Compilation for Encrypted Computing: Obfuscation but Not in Name
An `obfuscation' for encrypted computing is quantified exactly here, leading
to an argument that security against polynomial-time attacks has been achieved
for user data via the deliberately `chaotic' compilation required for security
properties in that environment. Encrypted computing is the emerging science and
technology of processors that take encrypted inputs to encrypted outputs via
encrypted intermediate values (at nearly conventional speeds). The aim is to
make user data in general-purpose computing secure against the operator and
operating system as potential adversaries. A stumbling block has always been
that memory addresses are data and good encryption means the encrypted value
varies randomly, and that makes hitting any target in memory problematic
without address decryption, yet decryption anywhere on the memory path would
open up many easily exploitable vulnerabilities. This paper `solves (chaotic)
compilation' for processors without address decryption, covering all of ANSI C
while satisfying the required security properties and opening up the field for
the standard software tool-chain and infrastructure. That produces the argument
referred to above, which may also hold without encryption.Comment: 31 pages. Version update adds "Chaotic" in title and throughout
paper, and recasts abstract and Intro and other sections of the text for
better access by cryptologists. To the same end it introduces the polynomial
time defense argument explicitly in the final section, having now set that
denouement out in the abstract and intr
Energy Efficient Execution of POMDP Policies
Recent advances in planning techniques for partially observable Markov decision processes have focused on online search techniques and offline point-based value iteration. While these techniques allow practitioners to obtain policies for fairly large problems, they assume that a non-negligible amount of computation can be done between each decision point. In contrast, the recent proliferation of mobile and embedded devices has lead to a surge of applications that could benefit from state of the art planning techniques if they can operate under severe constraints on computational resources. To that effect, we describe two techniques to compile policies into controllers that can be executed by a mere table lookup at each decision point. The first approach compiles policies induced by a set of alpha vectors (such as those obtained by point-based techniques) into approximately equivalent controllers, while the second approach performs a simulation to compile arbitrary policies into approximately equivalent controllers. We also describe an approach to compress controllers by removing redundant and dominated nodes, often yielding smaller and yet better controllers. Further compression and higher value can sometimes be obtained by considering stochastic controllers. The compilation and compression techniques are demonstrated on benchmark problems as well as a mobile application to help persons with Alzheimer's to way-find. The battery consumption of several POMDP policies is compared against finite-state controllers learned using methods introduced in this paper. Experiments performed on the Nexus 4 phone show that finite-state controllers are the least battery consuming POMDP policies
Reliable Software for Unreliable Hardware - A Cross-Layer Approach
A novel cross-layer reliability analysis, modeling, and optimization approach is proposed in this thesis that leverages multiple layers in the system design abstraction (i.e. hardware, compiler, system software, and application program) to exploit the available reliability enhancing potential at each system layer and to exchange this information across multiple system layers
- …