933 research outputs found

Building Efficient Query Engines in a High-Level Language

Authors: Klonatos Yannis; Koch Christoph; Shaikhha Amir
Publication date: 16/12/2016

Abstraction without regret refers to the vision of using high-level programming languages for systems development without experiencing a negative impact on performance. A database system designed according to this vision offers both increased productivity and high performance, instead of sacrificing the former for the latter as is the case with existing, monolithic implementations that are hard to maintain and extend. In this article, we realize this vision in the domain of analytical query processing. We present LegoBase, a query engine written in the high-level language Scala. The key technique to regain efficiency is to apply generative programming: LegoBase performs source-to-source compilation and optimizes the entire query engine by converting the high-level Scala code to specialized, low-level C code. We show how generative programming makes it easy to implement a wide spectrum of optimizations, such as introducing data partitioning or switching from a row to a column data layout, which are difficult to achieve with existing low-level query compilers that handle only queries. We demonstrate that sufficiently powerful abstractions are essential for dealing with the complexity of the optimization effort, shielding developers from compiler internals and decoupling individual optimizations from each other. We evaluate our approach with the TPC-H benchmark and show that: (a) with all optimizations enabled, LegoBase significantly outperforms a commercial database and an existing query compiler; (b) programmers need to provide just a few hundred lines of high-level code to implement the optimizations, instead of the complicated low-level code required by existing query compilation approaches; (c) the compilation overhead is low compared to the overall execution time, making our approach usable in practice for compiling query engines.

arXiv.org e-Print Archive

Infoscience - École polytechnique fédérale de Lausanne

Eliminating stack overflow by abstract interpretation

Authors: Regehr John; Reid Alastair
Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/01/2005

An important correctness criterion for software running on embedded microcontrollers is stack safety: a guarantee that the call stack does not overflow. Our first contribution is a method for statically guaranteeing stack safety of interrupt-driven embedded software using an approach based on context-sensitive dataflow analysis of object code. We have implemented a prototype stack analysis tool that targets software for Atmel AVR microcontrollers and tested it on embedded applications compiled from up to 30,000 lines of C. We experimentally validate the accuracy of the tool, which runs in under 10 seconds on the largest programs that we tested. The second contribution of this paper is the development of two novel ways to reduce the stack memory requirements of embedded software.

The University of Utah: J. Willard Marriott Digital Library

Duplo: A framework for OCaml post-link optimisation

Authors: Jones TM; Licker N
Publication venue: Proceedings of the ACM on Programming Languages
Publication date: 01/01/2020

We present a novel framework, Duplo, for the low-level post-link optimisation of OCaml programs, achieving a speedup of 7% and a reduction of at least 15% of the code size of widely used OCaml applications. Unlike existing post-link optimisers, which typically operate on target-specific machine code, our framework operates on a Low-Level Intermediate Representation (LLIR) capable of representing both the OCaml programs and any C dependencies they invoke through the foreign-function interface (FFI). LLIR is analysed, transformed and lowered to machine code by our post-link optimiser, LLIR-OPT. Most importantly, LLIR allows the optimiser to cross the OCaml-C language boundary, mitigating the overhead incurred by the FFI and enabling analyses and transformations in a previously unavailable context. The optimised IR is then lowered to amd64 machine code through the existing target-specific code generator of LLVM, modified to handle garbage collection just as effectively as the native OCaml backend. We equip our optimiser with a suite of SSA-based transformations and points-to analyses capable of capturing the semantics and representing the memory models of both languages, along with a cross-language inliner to embed C methods into OCaml callers. We evaluate the gains of our framework, which can be attributed to both our optimiser and the more sophisticated amd64 backend of LLVM, on a wide range of widely used OCaml applications, as well as an existing suite of micro- and macro-benchmarks used to track the performance of the OCaml compiler.

Funding: EPSRC EP/P020011/1, Cambridge Trust

Apollo (Cambridge)

Automated Verification of Practical Garbage Collectors

Authors: Hawblitzel Chris; Petrank Erez
Publication venue: Logical Methods in Computer Science e.V.
Publication date: 01/01/2009

Garbage collectors are notoriously hard to verify, due to their low-level interaction with the underlying system and the general difficulty in reasoning about reachability in graphs. Several papers have presented verified collectors, but either the proofs were hand-written or the collectors were too simplistic to use on practical applications. In this work, we present two mechanically verified garbage collectors, both practical enough to use for real-world C# benchmarks. The collectors and their associated allocators consist of x86 assembly language instructions and macro instructions, annotated with preconditions, postconditions, invariants, and assertions. We used the Boogie verification generator and the Z3 automated theorem prover to verify this assembly language code mechanically. We provide measurements comparing the performance of the verified collectors with that of the standard Bartok collectors on off-the-shelf C# benchmarks, demonstrating their competitiveness.

arXiv.org e-Print Archive

CiteSeerX

Dynamic memory management exploration in many-accelerator systems using Vivado HLS

Author: Davourli Angeliki (Δαβουρλή Αγγελική)
Publication date: 29/06/2016

DSpace at NTUA

EbbRT: a framework for building per-application library operating systems

Authors: Appavoo Jonathan; Cadden James; Dong Han; Krieger Orran; Schatzberg Dan
Publication venue: Computer Science Department, Boston University
Publication date: 23/02/2016

Efficient use of high-speed hardware requires that operating system components be customized to the application workload. Our general-purpose operating systems are ill-suited for this task. We present EbbRT, a framework for constructing per-application library operating systems for cloud applications. The primary objective of EbbRT is to enable high performance in a tractable and maintainable fashion. This paper describes the design and implementation of EbbRT, and evaluates its ability to improve the performance of common cloud applications. The evaluation of the EbbRT prototype demonstrates that memcached, run within a VM, can outperform memcached run on an unvirtualized Linux. The prototype evaluation also demonstrates a 14% performance improvement on a V8 JavaScript engine benchmark, and a node.js webserver that achieves a 50% reduction in 99th percentile latency compared to the same webserver run on Linux.

Boston University Institutional Repository (OpenBU)

Constant-time cost evaluation for behavioral partitioning

Authors: Gajski Daniel D.; Narayan Sanjiv; Vahid Frank
Publication venue: eScholarship, University of California
Publication date: 19/03/1992

Given a system behavioral specification, partitioning can be used to distribute among chips the processes, procedures, and storage elements that comprise the specification. We introduce a technique for constant-time recomputation of pin, area, and execution-time estimates for a behavioral partitioning move. The technique permits fast, accurate estimations of a large number of partitionings, thus enabling better results than approaches which attain tractable computation time by using gross estimates or less thorough partitioning algorithms. The key to our technique is the isolation and extraction, before partitioning, of the basic design attributes needed for estimation, and the updating of this information in constant time for each move. The estimation models are almost as detailed as those presented in previous estimation approaches not intended for constant-time update. The results we provide indicate the speed and practicality of our estimation approach in conjunction with sophisticated partitioning algorithms.

eScholarship - University of California

RL4ReAl: Reinforcement Learning for Register Allocation

Authors: Aggarwal Rohit; Cohen Albert; Jain Siddharth; Upadrasta Ramakrishna; VenkataKeerthy S.
Publication date: 05/04/2022

We propose a novel solution for the Register Allocation problem, leveraging multi-agent hierarchical Reinforcement Learning. We formalize the constraints that precisely define the problem for a given instruction-set architecture, while ensuring that the generated code preserves semantic correctness. We also develop a gRPC-based framework providing a modular and efficient compiler interface for training and inference. Experimental results match or outperform the LLVM register allocators, targeting Intel x86 and ARM AArch64.

arXiv.org e-Print Archive

Efficient control flow quantification

Authors: Christoph Bockisch; Filman R. E.; Laddad Ramnivas; Matthew Arnold; Michael Haupt; Mira Mezini; Sebastian Kanthak
Publication venue: Association for Computing Machinery (ACM)

Crossref