458 research outputs found

    Optimizing JavaScript Engines for Modern-day Workloads

    Get PDF
    In modern times, we have seen tremendous increase in popularity and usage of web-based applications. Applications such as presentation softwareand word processors, which were traditionally considered desktop applications are being ported to the web by compiling them to JavaScript. Since JavaScript is the de facto language of the web, JavaScript engine performance significantly affects the overall web application experience. JavaScript, initially intended solely as a client-side scripting language for web browsers, is now being used to implement server-side web applications (node.js) that traditionally have been written in languages like Java. Web application developers expect "C"-like performance out of their applications. Thus, there is a need to reevaluate the optimization strategies implemented in the modern day engines.Thesis statement: I propose that by using run-time and ahead-of-time profiling and type specialization techniques it is possible to improve the performance of JavaScript engines to cater to the needs of modern-day workloads.In this dissertation, we present an improved synergistic type specialization strategy for optimized JavaScript code execution, implemented on top of a research JavaScript engine called MuscalietJS. Our technique combines type feedback and type inference to reinforce and augment each other in a unique way. We then present a novel deoptimization strategy that enables type specialized code generation on top of typed, stack-based virtual machines like CLR. We also describe a server-side offline profiling technique to collect profile information for web application which helps client JavaScript engines (running in the browser) avoid deoptimizations and improve performance of the applications. Finally, we describe a technique to improve the performance of server-side JavaScript code by making use of intelligent profile caching and two new type stability heuristics

    Expression Acceleration: Seamless Parallelization of Typed High-Level Languages

    Full text link
    Efficient parallelization of algorithms on general-purpose GPUs is today essential in many areas. However, it is a non-trivial task for software engineers to utilize GPUs to improve the performance of high-level programs in general. Although many domain-specific approaches are available for GPU acceleration, it is difficult to accelerate existing high-level programs without rewriting parts of the programs using low-level GPU code. In this paper, we propose a different approach, where expressions are marked for acceleration, and the compiler automatically infers which code needs to be accelerated. We call this approach expression acceleration. We design a compiler pipeline for the approach and show how to handle several challenges, including expression extraction, well-formedness, and compiling using multiple backends. The approach is designed and implemented within a statically-typed functional intermediate language and evaluated using three distinct non-trivial case studies

    On the fly type specialization without type analysis

    Full text link
    Les langages de programmation typés dynamiquement tels que JavaScript et Python repoussent la vérification de typage jusqu’au moment de l’exécution. Afin d’optimiser la performance de ces langages, les implémentations de machines virtuelles pour langages dynamiques doivent tenter d’éliminer les tests de typage dynamiques redondants. Cela se fait habituellement en utilisant une analyse d’inférence de types. Cependant, les analyses de ce genre sont souvent coûteuses et impliquent des compromis entre le temps de compilation et la précision des résultats obtenus. Ceci a conduit à la conception d’architectures de VM de plus en plus complexes. Nous proposons le versionnement paresseux de blocs de base, une technique de compilation à la volée simple qui élimine efficacement les tests de typage dynamiques redondants sur les chemins d’exécution critiques. Cette nouvelle approche génère paresseusement des versions spécialisées des blocs de base tout en propageant de l’information de typage contextualisée. Notre technique ne nécessite pas l’utilisation d’analyses de programme coûteuses, n’est pas contrainte par les limitations de précision des analyses d’inférence de types traditionnelles et évite la complexité des techniques d’optimisation spéculatives. Trois extensions sont apportées au versionnement de blocs de base afin de lui donner des capacités d’optimisation interprocédurale. Une première extension lui donne la possibilité de joindre des informations de typage aux propriétés des objets et aux variables globales. Puis, la spécialisation de points d’entrée lui permet de passer de l’information de typage des fonctions appellantes aux fonctions appellées. Finalement, la spécialisation des continuations d’appels permet de transmettre le type des valeurs de retour des fonctions appellées aux appellants sans coût dynamique. Nous démontrons empiriquement que ces extensions permettent au versionnement de blocs de base d’éliminer plus de tests de typage dynamiques que toute analyse d’inférence de typage statique.Dynamically typed programming languages such as JavaScript and Python defer type checking to run time. In order to maximize performance, dynamic language virtual machine implementations must attempt to eliminate redundant dynamic type checks. This is typically done using type inference analysis. However, type inference analyses are often costly and involve tradeoffs between compilation time and resulting precision. This has lead to the creation of increasingly complex multi-tiered VM architectures. We introduce lazy basic block versioning, a simple just-in-time compilation technique which effectively removes redundant type checks from critical code paths. This novel approach lazily generates type-specialized versions of basic blocks on the fly while propagating context-dependent type information. This does not require the use of costly program analyses, is not restricted by the precision limitations of traditional type analyses and avoids the implementation complexity of speculative optimization techniques. Three extensions are made to the basic block versioning technique in order to give it interprocedural optimization capabilities. Typed object shapes give it the ability to attach type information to object properties and global variables. Entry point specialization allows it to pass type information from callers to callees, and call continuation specialization makes it possible to pass return value type information back to callers without dynamic overhead. We empirically demonstrate that these extensions enable basic block versioning to exceed the capabilities of static whole-program type analyses

    Preemptive type checking in dynamically typed programs

    No full text
    With the rise of languages such as JavaScript, dynamically typed languages have gained a strong foothold in the programming language landscape. These languages are very well suited for rapid prototyping and for use with agile programming methodologies. However, programmers would benefit from the ability to detect type errors in their code early, without imposing unnecessary restrictions on their programs.Here we describe a new type inference system that identifies potential type errors through a flow-sensitive static analysis. This analysis is invoked at a very late stage, after the compilation to bytecode and initialisation of the program. It computes for every expression the variable’s present (from the values that it has last been assigned) and future (with which it is used in the further program execution) types, respectively. Using this information, our mechanism inserts type checks at strategic points in the original program. We prove that these checks, inserted as early as possible, preempt type errors earlier than existing type systems. We further show that these checks do not change the semantics of programs that do not raise type errors.Preemptive type checking can be added to existing languages without the need to modify the existing runtime environment. We show this with an implementation for the Python language and demonstrate its effectiveness on a number of benchmarks

    Opportunistic acceleration of array-centric Python computation in heterogeneous environments

    Get PDF
    Dynamic scripting languages, like Python, are growing in popularity and increasingly used by non-expert programmers. These languages provide high level abstractions such as safe memory management, dynamic type handling and array bounds checking. The reduction in boilerplate code enables the concise expression of computation compared to statically typed and compiled languages. This improves programmer productivity. Increasingly, scripting languages are used by domain experts to write numerically intensive code in a variety of domains (e.g. Economics, Zoology, Archaeology and Physics). These programs are often used not just for prototyping but also in deployment. However, such managed program execution comes with a significant performance penalty arising from the interpreter having to decode and dispatch based on dynamic type checking. Modern computer systems are increasingly equipped with accelerators such as GPUs. However, the massive speedups that can be achieved by GPU accelerators come at the cost of program complexity. Directly programming a GPU requires a deep understanding of the computational model of the underlying hardware architecture. While the complexity of such devices is abstracted by programming languages specialised for heterogeneous devices such as CUDA and OpenCL, these are dialects of the low-level C systems programming language used primarily by expert programmers. This thesis presents the design and implementation of ALPyNA, a loop parallelisation and GPU code generation framework. A novel staged parallelisation approach is used to aggressively parallelise each execution instance of a loop nest. Loop dependence relationships that cannot be inferred statically are deferred for runtime analysis. At runtime, these dependences are augmented with runtime information obtained by introspection and the loop nest is parallelised. Parallel GPU kernels are customised to the runtime dependence graph, JIT compiled and executed. A systematic analysis of the execution speed of loop nests is performed using 12 standard loop intensive benchmarks. The evaluation is performed on two CPU–GPU machines. One is a server grade machine while the other is a typical desktop. ALPyNA’s GPU kernels achieve orders of magnitude speedup over the baseline interpreter execution time (up to 16435x) and large speedups (up to 179.55x) over JIT compiled CPU code. The varied performance of JIT compiled GPU code motivates the need for a sophisticated cost model to select the device providing the best speedups at runtime for varying domain sizes. This thesis describes a novel lightweight analytical cost model to determine the fastest device to execute a loop nest at runtime. The ALPyNA Cost Model (ACM) adapts to runtime dependence analysis and is parameterised on the hardware characteristics of the underlying target CPU or GPU. The cost model also takes into account the relative rate at which the interpreter is able to supply the GPU with computational work. ACM is re-targetable to other accelerator devices and only requires minimal install time profiling

    Subheap-Augmented Garbage Collection

    Get PDF
    Automated memory management avoids the tedium and danger of manual techniques. However, as no programmer input is required, no widely available interface exists to permit principled control over sometimes unacceptable performance costs. This dissertation explores the idea that performance-oriented languages should give programmers greater control over where and when the garbage collector (GC) expends effort. We describe an interface and implementation to expose heap partitioning and collection decisions without compromising type safety. We show that our interface allows the programmer to encode a form of reference counting using Hayes\u27 notion of key objects. Preliminary experimental data suggests that our proposed mechanism can avoid high overheads suffered by tracing collectors in some scenarios, especially with tight heaps. However, for other applications, the costs of applying subheaps---in human effort and runtime overheads---remain daunting

    Micro Virtual Machines: A Solid Foundation for Managed Language Implementation

    Get PDF
    Today new programming languages proliferate, but many of them suffer from poor performance and inscrutable semantics. We assert that the root of many of the performance and semantic problems of today's languages is that language implementation is extremely difficult. This thesis addresses the fundamental challenges of efficiently developing high-level managed languages. Modern high-level languages provide abstractions over execution, memory management and concurrency. It requires enormous intellectual capability and engineering effort to properly manage these concerns. Lacking such resources, developers usually choose naive implementation approaches in the early stages of language design, a strategy which too often has long-term consequences, hindering the future development of the language. Existing language development platforms have failed to provide the right level of abstraction, and forced implementers to reinvent low-level mechanisms in order to obtain performance. My thesis is that the introduction of micro virtual machines will allow the development of higher-quality, high-performance managed languages. The first contribution of this thesis is the design of Mu, with the specification of Mu as the main outcome. Mu is the first micro virtual machine, a robust, performant, and light-weight abstraction over just three concerns: execution, concurrency and garbage collection. Such a foundation attacks three of the most fundamental and challenging issues that face existing language designs and implementations, leaving the language implementers free to focus on the higher levels of their language design. The second contribution is an in-depth analysis of on-stack replacement and its efficient implementation. This low-level mechanism underpins run-time feedback-directed optimisation, which is key to the efficient implementation of dynamic languages. The third contribution is demonstrating the viability of Mu through RPython, a real-world non-trivial language implementation. We also did some preliminary research of GHC as a Mu client. We have created the Mu specification and its reference implementation, both of which are open-source. We show that that Mu's on-stack replacement API can gracefully support dynamic languages such as JavaScript, and it is implementable on concrete hardware. Our RPython client has been able to translate and execute non-trivial RPython programs, and can run the RPySOM interpreter and the core of the PyPy interpreter. With micro virtual machines providing a low-level substrate, language developers now have the option to build their next language on a micro virtual machine. We believe that the quality of programming languages will be improved as a result

    Actionable Program Analyses for Improving Software Performance

    Get PDF
    Nowadays, we have greater expectations of software than ever before. This is followed by constant pressure to run the same program on smaller and cheaper machines. To meet this demand, the application’s performance has become the essential concern in software development. Unfortunately, many applications still suffer from performance issues: coding or design errors that lead to performance degradation. However, finding performance issues is a challenging task: there is limited knowledge on how performance issues are discovered and fixed in practice, and current performance profilers report only where resources are spent, but not where resources are wasted. The goal of this dissertation is to investigate actionable performance analyses that help developers optimize their software by applying relatively simple code changes. To understand causes and fixes of performance issues in real-world software, we first present an empirical study of 98 issues in popular JavaScript projects. The study illustrates the prevalence of simple and recurring optimization patterns that lead to significant performance improvements. Then, to help developers optimize their code, we propose two actionable performance analyses that suggest optimizations based on reordering opportunities and method inlining. In this work, we focus on optimizations with four key properties. First, the optimizations are effective, that is, the changes suggested by the analysis lead to statistically significant performance improvements. Second, the optimizations are exploitable, that is, they are easy to understand and apply. Third, the optimizations are recurring, that is, they are applicable across multiple projects. Fourth, the optimizations are out-of-reach for compilers, that is, compilers can not guarantee that a code transformation preserves the original semantics. To reliably detect optimization opportunities and measure their performance benefits, the code must be executed with sufficient test inputs. The last contribution complements state-of-the-art test generation techniques by proposing a novel automated approach for generating effective tests for higher-order functions. We implement our techniques in practical tools and evaluate their effectiveness on a set of popular software systems. The empirical evaluation demonstrates the potential of actionable analyses in improving software performance through relatively simple optimization opportunities

    Bridging the Gap between Machine and Language using First-Class Building Blocks

    Get PDF
    High-performance virtual machines (VMs) are increasingly reused for programming languages for which they were not initially designed. Unfortunately, VMs are usually tailored to specific languages, offer only a very limited interface to running applications, and are closed to extensions. As a consequence, extensions required to support new languages often entail the construction of custom VMs, thus impacting reuse, compatibility and performance. Short of building a custom VM, the language designer has to choose between the expressiveness and the performance of the language. In this dissertation we argue that the best way to open the VM is to eliminate it. We present Pinocchio, a natively compiled Smalltalk, in which we identify and reify three basic building blocks for object-oriented languages. First we define a protocol for message passing similar to calling conventions, independent of the actual message lookup mechanism. The lookup is provided by a self-supporting runtime library written in Smalltalk and compiled to native code. Since it unifies the meta- and base-level we obtain a metaobject protocol (MOP). Then we decouple the language-level manipulation of state from the machine-level implementation by extending the structural reflective model of the language with object layouts, layout scopes and slots. Finally we reify behavior using AST nodes and first-class interpreters separate from the low-level language implementation. We describe the implementations of all three first-class building blocks. For each of the blocks we provide a series of examples illustrating how they enable typical extensions to the runtime, and we provide benchmarks validating the practicality of the approaches
    • …
    corecore