602 research outputs found

    A Formal Foundation for Trace-Based JIT Compilers

    Get PDF
    Trace-based JIT compilers identify frequently executed program paths at run-time and subsequently record, compile and optimize their execution. In order to improve the performance of the generated machine instructions, JIT compilers heavily rely on dynamic analysis of the code. Existing work treats the components of a JIT compiler as a monolithic whole, tied to particular execution semantics. We propose a formal framework that facilitates the design and implementation of a tracing JIT compiler and its accompanying dynamic analyses by decoupling the tracing, optimization, and interpretation processes. This results in a framework that is more configurable and extensible than existing formal tracing models. We formalize the tracer and interpreter as two abstract state machines that communicate through a minimal, well-defined interface. Developing a tracing JIT compiler becomes possible for arbitrary interpreters that implement this interface. The abstract machines also provide the necessary hooks to plug in custom analyses and optimizations

    An Abstract Interpretation-based Model of Tracing Just-In-Time Compilation

    Get PDF
    Tracing just-in-time compilation is a popular compilation technique for the efficient implementation of dynamic languages, which is commonly used for JavaScript, Python and PHP. We provide a formal model of tracing JIT compilation of programs using abstract interpretation. Hot path detection corresponds to an abstraction of the trace semantics of the program. The optimization phase corresponds to a transform of the original program that preserves its trace semantics up to an observation modeled by some abstraction. We provide a generic framework to express dynamic optimizations and prove them correct. We instantiate it to prove the correctness of dynamic type specialization and constant variable folding. We show that our framework is more general than the model of tracing compilation introduced by Guo and Palsberg [2011] based on operational bisimulations.Comment: To appear in ACM Transactions on Programming Languages and System

    On-stack replacement, distilled

    Get PDF
    On-stack replacement (OSR) is essential technology for adaptive optimization, allowing changes to code actively executing in a managed runtime. The engineering aspects of OSR are well-known among VM architects, with several implementations available to date. However, OSR is yet to be explored as a general means to transfer execution between related program versions, which can pave the road to unprecedented applications that stretch beyond VMs. We aim at filling this gap with a constructive and provably correct OSR framework, allowing a class of general-purpose transformation functions to yield a special-purpose replacement. We describe and evaluate an implementation of our technique in LLVM. As a novel application of OSR, we present a feasibility study on debugging of optimized code, showing how our techniques can be used to fix variables holding incorrect values at breakpoints due to optimizations

    Speculative Staging for Interpreter Optimization

    Full text link
    Interpreters have a bad reputation for having lower performance than just-in-time compilers. We present a new way of building high performance interpreters that is particularly effective for executing dynamically typed programming languages. The key idea is to combine speculative staging of optimized interpreter instructions with a novel technique of incrementally and iteratively concerting them at run-time. This paper introduces the concepts behind deriving optimized instructions from existing interpreter instructions---incrementally peeling off layers of complexity. When compiling the interpreter, these optimized derivatives will be compiled along with the original interpreter instructions. Therefore, our technique is portable by construction since it leverages the existing compiler's backend. At run-time we use instruction substitution from the interpreter's original and expensive instructions to optimized instruction derivatives to speed up execution. Our technique unites high performance with the simplicity and portability of interpreters---we report that our optimization makes the CPython interpreter up to more than four times faster, where our interpreter closes the gap between and sometimes even outperforms PyPy's just-in-time compiler.Comment: 16 pages, 4 figures, 3 tables. Uses CPython 3.2.3 and PyPy 1.

    Micro Virtual Machines: A Solid Foundation for Managed Language Implementation

    Get PDF
    Today new programming languages proliferate, but many of them suffer from poor performance and inscrutable semantics. We assert that the root of many of the performance and semantic problems of today's languages is that language implementation is extremely difficult. This thesis addresses the fundamental challenges of efficiently developing high-level managed languages. Modern high-level languages provide abstractions over execution, memory management and concurrency. It requires enormous intellectual capability and engineering effort to properly manage these concerns. Lacking such resources, developers usually choose naive implementation approaches in the early stages of language design, a strategy which too often has long-term consequences, hindering the future development of the language. Existing language development platforms have failed to provide the right level of abstraction, and forced implementers to reinvent low-level mechanisms in order to obtain performance. My thesis is that the introduction of micro virtual machines will allow the development of higher-quality, high-performance managed languages. The first contribution of this thesis is the design of Mu, with the specification of Mu as the main outcome. Mu is the first micro virtual machine, a robust, performant, and light-weight abstraction over just three concerns: execution, concurrency and garbage collection. Such a foundation attacks three of the most fundamental and challenging issues that face existing language designs and implementations, leaving the language implementers free to focus on the higher levels of their language design. The second contribution is an in-depth analysis of on-stack replacement and its efficient implementation. This low-level mechanism underpins run-time feedback-directed optimisation, which is key to the efficient implementation of dynamic languages. The third contribution is demonstrating the viability of Mu through RPython, a real-world non-trivial language implementation. We also did some preliminary research of GHC as a Mu client. We have created the Mu specification and its reference implementation, both of which are open-source. We show that that Mu's on-stack replacement API can gracefully support dynamic languages such as JavaScript, and it is implementable on concrete hardware. Our RPython client has been able to translate and execute non-trivial RPython programs, and can run the RPySOM interpreter and the core of the PyPy interpreter. With micro virtual machines providing a low-level substrate, language developers now have the option to build their next language on a micro virtual machine. We believe that the quality of programming languages will be improved as a result

    Efficient Dispatch of Multi-object Polymorphic Call Sites in Contextual Role-Oriented Programming Languages

    Get PDF
    Adaptive software becomes more and more important as computing is increasingly context-dependent. Runtime adaptability can be achieved by dynamically selecting and applying context-specific code. Role-oriented programming has been proposed as a paradigm to enable runtime adaptive software by design. Roles change the objects’ behavior at runtime, thus adapting the software to a given context. The cost of adaptivity is however a high runtime overhead stemming from executing compositions of behavior-modifying code. It has been shown that the overhead can be reduced by optimizing dispatch plans at runtime when contexts do not change, but no method exists to reduce the overhead in cases with high context variability. This paper presents a novel approach to implement polymorphic role dispatch, taking advantage of run-time information to effectively guard abstractions and enable reuse even in the presence of variable contexts. The concept of polymorphic inline caches is extended to role invocations. We evaluate the implementation with a benchmark for role-oriented programming languages achieving a geometric mean speedup of 4.0× (3.8× up to 4.5×) with static contexts, and close to no overhead in the case of varying contexts over the current implementation of contextual roles in Object Team

    Reusable semantics for implementation of Python optimizing compilers

    Full text link
    Le langage de programmation Python est aujourd'hui parmi les plus populaires au monde grâce à son accessibilité ainsi que l'existence d'un grand nombre de librairies standards. Paradoxalement, Python est également reconnu pour ses performances médiocres lors de l'exécution de nombreuses tâches. Ainsi, l'écriture d’implémentations efficaces du langage est nécessaire. Elle est toutefois freinée par la sémantique complexe de Python, ainsi que par l’absence de sémantique formelle officielle. Pour régler ce problème, nous présentons une sémantique formelle pour Python axée sur l’implémentation de compilateurs optimisants. Cette sémantique est écrite de manière à pouvoir être intégrée et analysée aisément par des compilateurs déjà existants. Nous introduisons également semPy, un évaluateur partiel de notre sémantique formelle. Celui-ci permet d'identifier et de retirer automatiquement certaines opérations redondantes dans la sémantique de Python. Ce faisant, semPy génère une sémantique naturellement plus performante lorsqu'exécutée. Nous terminons en présentant Zipi, un compilateur optimisant pour le langage Python développé avec l'assistance de semPy. Sur certaines tâches, Zipi offre des performances compétitionnant avec celle de PyPy, un compilateur Python reconnu pour ses bonnes performances. Ces résultats ouvrent la porte à des optimisations basées sur une évaluation partielle générant une implémentation spécialisée pour les cas d'usage fréquent du langage.Python is among the most popular programming language in the world due to its accessibility and extensive standard library. Paradoxically, Python is also known for its poor performance on many tasks. Hence, more efficient implementations of the language are required. The development of such optimized implementations is nevertheless hampered by the complex semantics of Python and the lack of an official formal semantics. We address this issue by presenting a formal semantics for Python focussed on the development of optimizing compilers. This semantics is written as to be easily reusable by existing compilers. We also introduce semPy, a partial evaluator of our formal semantics. This tool allows to automatically target and remove redundant operations from the semantics of Python. As such, semPy generates a semantics which naturally executes more efficiently. Finally, we present Zipi, a Python optimizing compiler developped with the aid of semPy. On some tasks, Zipi displays performance competing with those of PyPy, a Python compiler known for its good performance. These results open the door to optimizations based on a partial evaluation technique which generates specialized implementations for frequent use cases

    MicroWalk: A Framework for Finding Side Channels in Binaries

    Full text link
    Microarchitectural side channels expose unprotected software to information leakage attacks where a software adversary is able to track runtime behavior of a benign process and steal secrets such as cryptographic keys. As suggested by incremental software patches for the RSA algorithm against variants of side-channel attacks within different versions of cryptographic libraries, protecting security-critical algorithms against side channels is an intricate task. Software protections avoid leakages by operating in constant time with a uniform resource usage pattern independent of the processed secret. In this respect, automated testing and verification of software binaries for leakage-free behavior is of importance, particularly when the source code is not available. In this work, we propose a novel technique based on Dynamic Binary Instrumentation and Mutual Information Analysis to efficiently locate and quantify memory based and control-flow based microarchitectural leakages. We develop a software framework named \tool~for side-channel analysis of binaries which can be extended to support new classes of leakage. For the first time, by utilizing \tool, we perform rigorous leakage analysis of two widely-used closed-source cryptographic libraries: \emph{Intel IPP} and \emph{Microsoft CNG}. We analyze 1515 different cryptographic implementations consisting of 112112 million instructions in about 105105 minutes of CPU time. By locating previously unknown leakages in hardened implementations, our results suggest that \tool~can efficiently find microarchitectural leakages in software binaries

    A Flexible Framework for Studying Trace-Based Just-In-Time Compilation

    Get PDF
    Just-in-time compilation has proven an effective, though effort-intensive, choice for realizing performant language runtimes. Recently introduced JIT compilation frameworks advocate applying meta-compilation techniques such as partial evaluation or meta-tracing on simple interpreters to reduce the implementation effort. However, such frameworks are few and far between. Designed and highly optimized for performance, they are difficult to experiment with. We therefore present STRAF, a minimalistic yet flexible Scala framework for studying trace-based JIT compilation. STRAF is sufficiently general to support a diverse set of language interpreters, but also sufficiently extensible to enable experiments with trace recording and optimization. We demonstrate the former by plugging two different interpreters into STRAF. We demonstrate the latter by extending STRAF with e.g., constant folding and type-specialization optimizations, which are commonly found in dedicated trace-based JIT compilers. The evaluation shows that STRAF is suitable for prototyping new techniques and formalisms in the domain of trace-based JIT compilation
    • …
    corecore