14 research outputs found
Optimizing indirect branch prediction accuracy in virtual machine interpreters
Interpreters designed for efficiency execute a huge number of indirect branches and can spend more than half of their execution time in indirect branch mispredictions. Branch target buffers (BTBs) are the most widely available form of indirect branch prediction; however, their prediction accuracy for existing interpreters is only 2%–50%. In this paper we investigate two methods for improving the prediction accuracy of BTBs for interpreters: replicating virtual machine (VM) instructions and combining sequences of VM instructions into superinstructions. We investigate static (interpreter build-time) and dynamic (interpreter run-time) variants of these techniques and compare them and several combinations of these techniques. These techniques can eliminate nearly all of the dispatch branch mispredictions and have other benefits, resulting in speedups by a factor of up to 3.17 over efficient threaded-code interpreters, and speedups by a factor of up to 1.3 over techniques relying on superinstructions alone.
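The replication technique this abstract describes can be sketched in a toy threaded-code interpreter. The sketch below uses the GNU C computed-goto extension; the opcode names, the two-replica scheme, and the toy program are illustrative assumptions, not the paper's implementation:

```c
#include <assert.h>

/* Minimal threaded-code interpreter sketch (GNU C computed gotos)
   illustrating static replication: the ADD body exists twice, and the
   "translator" alternates between the copies, so each ADD occurrence
   in the program dispatches through a different indirect branch and
   therefore gets its own branch-target-buffer slot. */
long run(void) {
    long stack[8], *sp = stack;
    void *prog[16];
    void **ip = prog;
    void *add_replica[2] = { &&op_add_a, &&op_add_b };
    int flip = 0;

    /* translate: push 2; push 3; add; push 4; add; halt */
    prog[0] = &&op_push; prog[1] = (void *)2L;
    prog[2] = &&op_push; prog[3] = (void *)3L;
    prog[4] = add_replica[flip++ & 1];
    prog[5] = &&op_push; prog[6] = (void *)4L;
    prog[7] = add_replica[flip++ & 1];
    prog[8] = &&op_halt;

    goto **ip++;                 /* first dispatch */
op_push:
    *sp++ = (long)*ip++;         /* inline operand follows the opcode */
    goto **ip++;
op_add_a:                        /* replica A of ADD */
    sp[-2] += sp[-1]; --sp;
    goto **ip++;
op_add_b:                        /* replica B: identical semantics,
                                    distinct indirect branch */
    sp[-2] += sp[-1]; --sp;
    goto **ip++;
op_halt:
    return sp[-1];
}
```

With a single shared ADD body, one indirect branch would have to predict two different successors; with the replicas, each dispatch site sees a more stable target.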
Virtual Machine Support for Many-Core Architectures: Decoupling Abstract from Concrete Concurrency Models
The upcoming many-core architectures require software developers to exploit
concurrency to utilize available computational power. Today's high-level
language virtual machines (VMs), which are a cornerstone of software
development, do not provide sufficient abstraction for concurrency concepts. We
analyze concrete and abstract concurrency models and identify the challenges
they impose for VMs. To provide sufficient concurrency support in VMs, we
propose to integrate concurrency operations into VM instruction sets.
Since there will always be VMs optimized for special purposes, our goal is to
develop a methodology to design instruction sets with concurrency support.
Therefore, we also propose a list of trade-offs that have to be investigated to
guide the design of such instruction sets.
As a first experiment, we implemented one instruction set extension for
shared memory and one for non-shared memory concurrency. From our experimental
results, we derived a list of requirements for a fully fledged experimental
environment for further research.
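The idea of integrating a concurrency operation directly into a VM instruction set can be sketched as follows. This is a hypothetical illustration, not the authors' instruction set: a toy bytecode gains an OP_ATOMIC_INC instruction for shared-memory concurrency, whose implementation maps straight onto one atomic read-modify-write:

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical sketch: a concurrency operation exposed as a VM
   instruction rather than a library call. Opcode names and the
   dispatch loop are illustrative assumptions. */
enum { OP_ATOMIC_INC, OP_HALT };

typedef struct {
    const int *code;        /* bytecode program */
    atomic_long *shared;    /* shared-memory cell */
} vm_thread_t;

static void *vm_run(void *arg) {
    vm_thread_t *vm = arg;
    for (const int *pc = vm->code; ; pc++) {
        switch (*pc) {
        case OP_ATOMIC_INC:            /* VM-level concurrency op */
            atomic_fetch_add(vm->shared, 1);
            break;
        case OP_HALT:
            return NULL;
        }
    }
}

/* Run the same program on two VM threads sharing one counter. */
long run_demo(void) {
    enum { N = 1000 };
    static int code[N + 1];
    for (int i = 0; i < N; i++) code[i] = OP_ATOMIC_INC;
    code[N] = OP_HALT;

    atomic_long counter = 0;
    vm_thread_t a = { code, &counter }, b = { code, &counter };
    pthread_t t1, t2;
    pthread_create(&t1, NULL, vm_run, &a);
    pthread_create(&t2, NULL, vm_run, &b);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return counter;          /* 2 * N iff the instruction is atomic */
}
```

Making the operation an instruction rather than a runtime call is what lets the VM reason about, and specialize, its concurrency semantics.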
Repositioning Tiered HotSpot Execution Performance Relative to the Interpreter
Although the advantages of just-in-time compilation over traditional
interpretive execution are widely recognised, little current research has
investigated and repositioned the performance differences between these two
execution models relative to contemporary workloads. Specifically,
there is a need to examine the performance differences between Java Runtime
Environment (JRE) Java Virtual Machine (JVM) tiered execution and JRE JVM
interpretive execution relative to modern multicore architectures and modern
concurrent and parallel benchmark workloads. This article aims to fill this
research gap by presenting the results of a study that compares the performance
of these two execution models under load from the Renaissance Benchmark Suite.
This research is relevant to anyone interested in understanding the performance
differences between just-in-time compiled code and interpretive execution. It
provides a contemporary assessment of the interpretive JVM core, the entry and
starting point for bytecode execution, relative to just-in-time tiered
execution. The study considers factors such as the JRE version, the GNU GCC
version used in the JRE build toolchain, and the garbage collector algorithm
specified at runtime, and their impact on the performance difference envelope
between interpretive and tiered execution. Our findings indicate that tiered
execution is considerably more efficient than interpretive execution and that
the performance gap has widened: tiered execution now runs 4 to 37 times more
efficiently, and approximately 15 times more efficiently on average.
Additionally, the performance differences between
interpretive and tiered execution are influenced by workload category, with
narrower performance differences observed for web-based workloads and more
significant differences for functional and Scala-type workloads.
The Effect of Instruction Padding on SFI Overhead
Software-based fault isolation (SFI) is a technique to isolate a potentially
faulty or malicious software module from the rest of a system using
instruction-level rewriting. SFI implementations on CISC architectures,
including Google Native Client, use instruction padding to enforce an address
layout invariant and restrict control flow. However, this padding decreases code
density and imposes runtime overhead. We analyze this overhead and show that
it can be reduced by allowing some execution of overlapping instructions, as
long as those overlapping instructions are still safe according to the original
per-instruction policy. We implemented this change for both 32-bit and 64-bit
x86 versions of Native Client, and analyzed why the performance benefit is
higher on 32-bit. The optimization leads to a consistent decrease in the number
of instructions executed and savings averaging 8.6% in execution time (over
compatible benchmarks from SPECint2006) for x86-32. We describe how to modify
the validation algorithm to check the more permissive policy, and extend a
machine-checked Coq proof to confirm that the system's security is preserved. Comment: NDSS Workshop on Binary Analysis Research, February 201
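Where the padding cost described above comes from can be sketched with a toy layout pass. The 32-byte bundle is from the Native Client design; the function itself is an illustrative assumption, not NaCl's assembler:

```c
#include <assert.h>
#include <stddef.h>

/* Toy layout pass for NaCl-style SFI on a CISC architecture: no
   instruction may straddle a 32-byte bundle boundary, so NOP padding
   is inserted before any instruction that would. The returned size
   minus the raw instruction bytes is the code-density cost. */
#define BUNDLE 32

/* Given variable x86 instruction lengths, return total code size
   after bundle-aligning padding is inserted. */
size_t sfi_layout(const unsigned char *len, size_t n) {
    size_t addr = 0;
    for (size_t i = 0; i < n; i++) {
        size_t off = addr % BUNDLE;
        if (off + len[i] > BUNDLE)      /* would cross a boundary */
            addr += BUNDLE - off;       /* pad with NOPs to next bundle */
        addr += len[i];
    }
    return addr;
}
```

For example, three 12-byte instructions occupy 36 raw bytes, but the third would start at offset 24 and cross the boundary, so 8 NOP bytes are inserted and the padded size is 44. Relaxing the policy, as the paper does, means tolerating some boundary-crossing (overlapping) instructions when they are individually safe, recovering part of this padding.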
Optimizing Indirect Branch Prediction Accuracy in Virtual Machine Interpreters
Interpreters designed for efficiency execute a huge number of indirect branches and can spend more
than half of the execution time in indirect branch mispredictions. Branch target buffers (BTBs) are
the most widely available form of indirect branch prediction; however, their prediction accuracy for
existing interpreters is only 2%–50%. In this paper we investigate two methods for improving the
prediction accuracy of BTBs for interpreters: replicating virtual machine (VM) instructions and
combining sequences of VM instructions into superinstructions. We investigate static (interpreter
build-time) and dynamic (interpreter run-time) variants of these techniques and compare them
and several combinations of these techniques. To show their generality, we have implemented
these optimizations in VMs for both Java and Forth. These techniques can eliminate nearly all of
the dispatch branch mispredictions, and have other benefits, resulting in speedups by a factor of
up to 4.55 over efficient threaded-code interpreters, and speedups by a factor of up to 1.34 over
techniques relying on dynamic superinstructions alone.
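The superinstruction technique can be sketched as a peephole pass over toy bytecode. Opcode names and the fused pair below are illustrative assumptions, not the paper's VM:

```c
#include <assert.h>
#include <stddef.h>

/* Sketch of static superinstructions: a peephole pass rewrites the
   common sequence PUSH n; ADD into a single PUSH_ADD n opcode,
   removing one dispatch (one mispredictable indirect branch) per
   combined pair. */
enum { OP_PUSH, OP_ADD, OP_PUSH_ADD, OP_HALT };

/* Combine PUSH;ADD pairs in place; returns the new program length. */
size_t combine(int *code, size_t n) {
    size_t w = 0;
    for (size_t r = 0; r < n; ) {
        if (code[r] == OP_PUSH) {
            if (r + 2 < n && code[r + 2] == OP_ADD) {
                code[w++] = OP_PUSH_ADD;   /* fused opcode */
                code[w++] = code[r + 1];   /* keep the operand */
                r += 3;
            } else {
                code[w++] = code[r++];     /* copy PUSH and operand */
                code[w++] = code[r++];
            }
        } else {
            code[w++] = code[r++];
        }
    }
    return w;
}

long run(const int *code) {
    long stack[16], *sp = stack;
    for (size_t pc = 0; ; ) {
        switch (code[pc++]) {
        case OP_PUSH:     *sp++ = code[pc++];            break;
        case OP_ADD:      sp[-2] += sp[-1]; --sp;        break;
        case OP_PUSH_ADD: sp[-1] += code[pc++];          break;
        case OP_HALT:     return sp[-1];
        }
    }
}
```

The fused body also removes a stack push/pop round trip, which is one of the "other benefits" the abstract alludes to beyond better prediction.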