Search CORE

565 research outputs found

Recommended from our members

Duplo: A framework for OCaml post-link optimisation

Author: Jones TM
Licker N
Publication venue: Proceedings of the ACM on Programming Languages
Publication date: 01/01/2020
Field of study

We present a novel framework, Duplo , for the low-level post-link optimisation of OCaml programs, achieving a speedup of 7% and a reduction of at least 15% of the code size of widely-used OCaml applications. Unlike existing post-link optimisers, which typically operate on target-specific machine code, our framework operates on a Low-Level Intermediate Representation (LLIR) capable of representing both the OCaml programs and any C dependencies they invoke through the foreign-function interface (FFI). LLIR is analysed, transformed and lowered to machine code by our post-link optimiser, LLIR-OPT. Most importantly, LLIR allows the optimiser to cross the OCaml-C language boundary, mitigating the overhead incurred by the FFI and enabling analyses and transformations in a previously unavailable context. The optimised IR is then lowered to amd64 machine code through the existing target-specific code generator of LLVM, modified to handle garbage collection just as effectively as the native OCaml backend. We equip our optimiser with a suite of SSA-based transformations and points-to analyses capable of capturing the semantics and representing the memory models of both languages, along with a cross-language inliner to embed C methods into OCaml callers. We evaluate the gains of our framework, which can be attributed to both our optimiser and the more sophisticated amd64 backend of LLVM, on a wide-range of widely-used OCaml applications, as well as an existing suite of micro- and macro-benchmarks used to track the performance of the OCaml compiler. EPSRC EP/P020011/1, Cambridge Trust

Apollo (Cambridge)

Distilling the Real Cost of Production Garbage Collectors

Author: Blackburn Stephen M.
Bond Michael D.
Cai Zixian
Maas Martin
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 05/05/2022
Field of study

Abridged abstract: despite the long history of garbage collection (GC) and its prevalence in modern programming languages, there is surprisingly little clarity about its true cost. Without understanding their cost, crucial tradeoffs made by garbage collectors (GCs) go unnoticed. This can lead to misguided design constraints and evaluation criteria used by GC researchers and users, hindering the development of high-performance, low-cost GCs. In this paper, we develop a methodology that allows us to empirically estimate the cost of GC for any given set of metrics. By distilling out the explicitly identifiable GC cost, we estimate the intrinsic application execution cost using different GCs. The minimum distilled cost forms a baseline. Subtracting this baseline from the total execution costs, we can then place an empirical lower bound on the absolute costs of different GCs. Using this methodology, we study five production GCs in OpenJDK 17, a high-performance Java runtime. We measure the cost of these collectors, and expose their respective key performance tradeoffs. We find that with a modestly sized heap, production GCs incur substantial overheads across a diverse suite of modern benchmarks, spending at least 7-82% more wall-clock time and 6-92% more CPU cycles relative to the baseline cost. We show that these costs can be masked by concurrency and generous provisioning of memory/compute. In addition, we find that newer low-pause GCs are significantly more expensive than older GCs, and, surprisingly, sometimes deliver worse application latency than stop-the-world GCs. Our findings reaffirm that GC is by no means a solved problem and that a low-cost, low-latency GC remains elusive. We recommend adopting the distillation methodology together with a wider range of cost metrics for future GC evaluations.Comment: Camera-ready versio

arXiv.org e-Print Archive

The Australian National University

Garbage collection and the case for high-level low-level programming

Author: Frampton Daniel
Publication venue
Publication date: 17/09/2018
Field of study

The Australian National University

Virtual Machine Support for Many-Core Architectures: Decoupling Abstract from Concrete Concurrency Models

Author: A. Peymandoust
Alastair R. Beresford
Andreas Gal Albert Noll
Bram Adams
Bratin Saha
Carl Hewitt
Charles Antony Richard Hoare
Charles R. Johns
Chen-Yong Cher
Colin Blundell
David Ungar
David Wentzlaff
Doug Lea
ECMA International
Edward A. Lee
freescale semiconductor
Georg Sorst
Gul Agha
Hans Schippers
Haris Volos
Intel Corporation
James Gosling
Jim Gray
John A. Trono
John S. Danaher
John Zigman
Jos'e M. Piquer
Kevin Casey
Kevin Williams
Larry Seiler
Lukasz Ziarek
M. Anton Ertl
Mark S. Miller
Maurice Herlihy
Michael Haupt
Michael R. Marty
Nir Shavit
Pascal Costanza
Philipp Haller
Rajesh K. Karmani
Robert D. Blumofe
Robert Virding
Simon Gay
Sriram Srinivasan
Stefan Marr
Stefan Marr
Stijn Timbermont
Theo D'Hondt
Thomas Kistler
Tom Van Cutsem
Uwe Kastens
Vijay A. Saraswat
Virendra J. Marathe
Wenzhang Zhu
Wolfgang De Meuter
Xu Wang
Yaoqing Gao
Publication venue: 'Open Publishing Association'
Publication date: 01/02/2010
Field of study

The upcoming many-core architectures require software developers to exploit concurrency to utilize available computational power. Today's high-level language virtual machines (VMs), which are a cornerstone of software development, do not provide sufficient abstraction for concurrency concepts. We analyze concrete and abstract concurrency models and identify the challenges they impose for VMs. To provide sufficient concurrency support in VMs, we propose to integrate concurrency operations into VM instruction sets. Since there will always be VMs optimized for special purposes, our goal is to develop a methodology to design instruction sets with concurrency support. Therefore, we also propose a list of trade-offs that have to be investigated to advise the design of such instruction sets. As a first experiment, we implemented one instruction set extension for shared memory and one for non-shared memory concurrency. From our experimental results, we derived a list of requirements for a full-grown experimental environment for further research

arXiv.org e-Print Archive

Crossref

Directory of Open Access Journals

Kent Academic Repository

VMKit: a Substrate for Virtual Machines

Author: Clément Charles
Folliot Bertil
Geoffray Nicolas
Muller Gilles
Thomas Gaël
Publication venue: HAL CCSD
Publication date: 01/01/2009
Field of study

Developing and optimizing a virtual machine (VM) is a tedious task that requires many years of development. Although VMs share some common principles, such as a Just In Time Compiler or a Garbage Collector, this opportunity for sharing hash not been yet exploited in implementing VMs. This paper describes and evaluates VMKit, a first attempt to build a common substrate that eases the development of high-level VMs. VMKit has been successfully used to build three VMs: a Java Virtual Machine, a Common Language Runtime and a lisp-like language with type inference uvml. Our precise contribution is an extensive study of the lessons learnt in implementing such common infrastructure from a performance and an ease of development standpoint. Our performance study shows that VMKit does not degrade performance on CPU-intensive applications, but still requires engineering efforts to compete with other VMs on memory intensive applications. Our high level VMs are only 20,000 lines of code, it took one of the author a month to develop a Common Language Runtime and implementing new ideas in the VMs was remarkably easy

INRIA a CCSD electronic archive server

Designing an adaptable heterogeneous abstract machine by means of reflection

Author: Borning
Bracha
Bull
Diego Diez
Diehl
Francisco Ortin
Goldberg
Golm
Jones
Joy
Lindholm
Neward
Nori
Ortin
Ortin
Parnas
Roddick
Ungar
Ungar
Publication venue: 'Elsevier BV'
Publication date
Field of study

Crossref

Predictability of just in time compilation

Author: Bouakaz Adnan
Publication venue: HAL CCSD
Publication date: 23/06/2010
Field of study

The productivity of embedded software development is limited by the high fragmentation of hardware platforms. To alleviate this problem, virtualization has become an important tool in computer science; and virtual machines are used in a number of subdisciplines ranging from operating systems to processor architecture. The processor virtualization can be used to address the portability problem. While the traditional compilation flow consists of compiling program source code into binary objects that can natively executed on a given processor, processor virtualization splits that flow in two parts: the first part consists of compiling the program source code into processor-independent bytecode representation; the second part provides an execution platform that can run this bytecode in a given processor. The second part is done by a virtual machine interpreting the bytecode or by just-in-time (JIT) compiling the bytecodes of a method at run-time in order to improve the execution performance. Many applications feature real-time system requirements. The success of real-time systems relies upon their capability of producing functionally correct results within dened timing constraints. To validate these constraints, most scheduling algorithms assume that the worstcase execution time (WCET) estimation of each task is already known. The WCET of a task is the longest time it takes when it is considered in isolation. Sophisticated techniques are used in static WCET estimation (e.g. to model caches) to achieve both safe and tight estimation. Our work aims at recombining the two domains, i.e. using the JIT compilation in realtime systems. This is an ambitious goal which requires introducing the deterministic in many non-deterministic features, e.g. bound the compilation time and the overhead caused by the dynamic management of the compiled code cache, etc. Due to the limited time of the internship, this report represents a rst attempt to such combination. To obtain the WCET of a program, we have to add the compilation time to the execution time because the two phases are now mixed. Therefore, one needs to know statically how many times in the worst case a function will be compiled. It may be seemed a simple job, but if we consider a resource constraint as the limited memory size and the advanced techniques used in JIT compilation, things will be nasty. We suppose that a function is compiled at the rst time it is used, and its compiled code is cached in limited size software cache. Our objective is to find an appropriate structure cache and replacement policy which reduce the overhead of compilation in the worst case

HAL-Rennes 1

Resource accounting and reservation in Java Virtual Machine

Author: Inti Gonzalez-Herrera
Johann Bourcier
Olivier Barais
Publication venue
Publication date: 24/04/2020
Field of study

The Java platform The Java programming language was designed to developed small application for embedded devices, but it was a long time ago. Today, Java application are running in many platforms ranging from smartphones to enterprise servers. Modern pervasive middleware is typically implemented using Java because of its safety, flexibility, and mature development environment. However, the Java virtual machine specification has not had a major revised since 1999 Several researches had addressed these important issues. As result of these efforts some Java specification requests (JSR) have emerged. We consider there are seven JSRs relate to monitoring and to resource accounting and reservation: three for the Java Management eXtension API (JSRs 3, 160, 255), two for Metric Instrumentation (JSRs 138, 174) and two for resource-consumption management (JSRs 121, 284). The Java Management extension API only addresses monitoring and management: it does not define specific resource accounting or reservation strategies. JSRs 138 and 174 define monitors for the Java Virtual Machine. They are coarse grained, monitoring the number of running threads, the memory used, the number of garbage collections and so on. They monitor the entire virtual machine, not specific component so, they are useless to most middleware. Based on the Multitasking Virtual Machin

CiteSeerX