
    Micro Virtual Machines: A Solid Foundation for Managed Language Implementation

    Today new programming languages proliferate, but many of them suffer from poor performance and inscrutable semantics. We assert that the root of many of the performance and semantic problems of today's languages is that language implementation is extremely difficult. This thesis addresses the fundamental challenges of efficiently developing high-level managed languages. Modern high-level languages provide abstractions over execution, memory management and concurrency. It requires enormous intellectual capability and engineering effort to properly manage these concerns. Lacking such resources, developers usually choose naive implementation approaches in the early stages of language design, a strategy which too often has long-term consequences, hindering the future development of the language. Existing language development platforms have failed to provide the right level of abstraction, and have forced implementers to reinvent low-level mechanisms in order to obtain performance. My thesis is that the introduction of micro virtual machines will allow the development of higher-quality, high-performance managed languages. The first contribution of this thesis is the design of Mu, with the specification of Mu as the main outcome. Mu is the first micro virtual machine, a robust, performant, and lightweight abstraction over just three concerns: execution, concurrency and garbage collection. Such a foundation attacks three of the most fundamental and challenging issues that face existing language designs and implementations, leaving language implementers free to focus on the higher levels of their language design. The second contribution is an in-depth analysis of on-stack replacement and its efficient implementation. This low-level mechanism underpins run-time feedback-directed optimisation, which is key to the efficient implementation of dynamic languages. The third contribution is a demonstration of the viability of Mu through RPython, a real-world, non-trivial language implementation. We also conducted preliminary research on GHC as a Mu client. We have created the Mu specification and its reference implementation, both of which are open-source. We show that Mu's on-stack replacement API can gracefully support dynamic languages such as JavaScript, and that it is implementable on concrete hardware. Our RPython client has been able to translate and execute non-trivial RPython programs, and can run the RPySOM interpreter and the core of the PyPy interpreter. With micro virtual machines providing a low-level substrate, language developers now have the option to build their next language on a micro virtual machine. We believe that the quality of programming languages will improve as a result.
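    As a rough intuition for the on-stack replacement mechanism analysed in the second contribution, the following minimal Java sketch shows the core idea under toy assumptions: a loop running in an interpreted tier hands its live state (accumulator and loop index) to an optimized tier mid-execution, effectively replacing the running frame. All names here are hypothetical illustrations, not Mu's actual API.

```java
// Toy sketch of on-stack replacement (OSR); illustrative names only.
public class OsrSketch {
    interface LoopBody { long step(long acc, int i); }

    static final int HOT_THRESHOLD = 10_000;

    // Interpreted loop that "tiers up" once it becomes hot: the live state
    // (acc, i) is handed to an optimized version mid-execution, which is
    // the essence of replacing a frame on the stack.
    static long run(LoopBody interpreted, LoopBody optimized, int n) {
        long acc = 0;
        for (int i = 0; i < n; i++) {
            if (i == HOT_THRESHOLD) {
                return runOptimized(optimized, acc, i, n); // OSR point
            }
            acc = interpreted.step(acc, i);
        }
        return acc;
    }

    static long runOptimized(LoopBody optimized, long acc, int i, int n) {
        for (; i < n; i++) acc = optimized.step(acc, i);
        return acc;
    }

    public static void main(String[] args) {
        LoopBody body = (acc, i) -> acc + i;
        System.out.println(run(body, body, 1_000_000)); // 499999500000
    }
}
```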

    A selective dynamic compiler for embedded Java virtual machine targeting ARM processors

    Honour roll of the Faculty of Graduate and Postdoctoral Studies, 2004-2005. This work presents a new selective dynamic compilation technique targeting ARM 16/32-bit embedded system processors. This compiler is built inside the J2ME/CLDC (Java 2 Micro Edition for Connected Limited Device Configuration) platform. The primary objective of our work is to come up with an efficient, lightweight and low-footprint accelerated Java virtual machine ready to be executed on embedded machines. This is achieved by implementing a selective ARM dynamic compiler, called Armed E-Bunny, inside Sun's Kilobyte Virtual Machine (KVM). We first present the Java platform, Java 2 Micro Edition (J2ME) for embedded systems, and the components of the Java virtual machine. Then, we discuss the different acceleration techniques for the Java virtual machine and detail the principle of dynamic compilation. After that, we illustrate the architecture, design, implementation and experimental results of our selective dynamic compiler Armed E-Bunny. The modified KVM was ported to a handheld PDA and tested using standard J2ME benchmarks. The experimental results demonstrate a speedup of 360% over the latest version of Sun's KVM, with a footprint overhead that does not exceed 119 kilobytes.
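    The general strategy behind a selective dynamic compiler can be sketched compactly: interpret every method, count invocations, and switch methods that cross a hotness threshold to a compiled form. The following Java sketch is illustrative only, with hypothetical names; it is not KVM's or Armed E-Bunny's actual mechanism.

```java
import java.util.HashMap;
import java.util.Map;

// Toy sketch of selective (counter-based) dynamic compilation.
public class SelectiveJit {
    static final int COMPILE_THRESHOLD = 100;

    private final Map<String, Integer> counters = new HashMap<>();
    private final Map<String, Runnable> compiled = new HashMap<>();

    void invoke(String method, Runnable interpretedBody) {
        Runnable fast = compiled.get(method);
        if (fast != null) { fast.run(); return; }       // compiled path
        int count = counters.merge(method, 1, Integer::sum);
        interpretedBody.run();                          // interpreted path
        if (count >= COMPILE_THRESHOLD) {
            compiled.put(method, compile(method, interpretedBody));
        }
    }

    // Stand-in for emitting native (e.g. ARM) code for the method.
    private Runnable compile(String method, Runnable body) {
        System.out.println("compiling " + method);
        return body;
    }
}
```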

    Tracing vs. Partial Evaluation: Comparing Meta-Compilation Approaches for Self-Optimizing Interpreters

    Tracing and partial evaluation have been proposed as meta-compilation techniques for interpreters to make just-in-time compilation language-independent. They promise that programs executing on simple interpreters can reach performance of the same order of magnitude as if they were executed on state-of-the-art virtual machines with highly optimizing just-in-time compilers built for a specific language. Tracing and partial evaluation approach this meta-compilation from two ends of a spectrum, resulting in different sets of tradeoffs. This study investigates both approaches in the context of self-optimizing interpreters, a technique for building fast abstract-syntax-tree interpreters. Based on RPython for tracing and Truffle for partial evaluation, we assess the two approaches by comparing the impact of various optimizations on the performance of an interpreter for SOM, an object-oriented dynamically-typed language. The goal is to determine whether either approach yields clear performance or engineering benefits. We find that tracing and partial evaluation both reach roughly the same level of performance. SOM based on meta-tracing is on average 3x slower than Java, while SOM based on partial evaluation is on average 2.3x slower than Java. With respect to engineering, however, tracing has significant benefits, because it requires language implementers to apply fewer optimizations to reach the same level of performance.
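    To make the notion of a self-optimizing interpreter concrete, the following toy Java sketch shows the core technique: a generic AST node profiles its operand types at run time and rewrites itself into a type-specialized form, so later executions skip the generic dispatch. This is a hypothetical illustration, not SOM, Truffle or RPython code.

```java
// Toy self-optimizing AST node: a generic Add specializes itself for ints.
public class SelfOptimizingAdd {
    interface Expr { Object execute(); }

    static class Add implements Expr {
        final Expr left, right;
        Expr current = this::genericExecute; // rewritten after profiling

        Add(Expr l, Expr r) { left = l; right = r; }

        public Object execute() { return current.execute(); }

        Object genericExecute() {
            Object a = left.execute(), b = right.execute();
            if (a instanceof Integer && b instanceof Integer) {
                // Speculate that operands stay ints; rewrite to a fast path.
                current = () -> (Integer) left.execute()
                              + (Integer) right.execute();
                return (Integer) a + (Integer) b;
            }
            return a.toString() + b;   // generic fallback, e.g. string concat
        }
    }

    public static void main(String[] args) {
        Expr e = new Add(() -> 1, () -> 2);
        System.out.println(e.execute()); // 3 (generic path, then specializes)
        System.out.println(e.execute()); // 3 (specialized path)
    }
}
```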

    Applications of information sharing for code generation in process virtual machines

    As the backbone of many computing environments today, it is important that process virtual machines be both performant and robust in mobile, personal desktop, and enterprise applications. This thesis focusses on code generation within these virtual machines, particularly addressing situations where redundant work is being performed. The goal is to exploit information sharing in order to improve the performance and robustness of virtual machines that are accelerated by native code generation. First, the thesis investigates the potential to share generated code between multiple threads in a dynamic binary translator used to perform instruction set simulation. This is done through a code generation design that allows native code to be executed by any simulated core, combined with a mechanism to share native code regions between threads. This is shown to improve the average performance of multi-threaded benchmarks by 1.4x when simulating 128 cores on a quad-core host machine. Secondly, the ahead-of-time code generation system used for executing Android applications is improved through the use of profiling. The thesis investigates the potential for profiles produced by individual users of applications to be shared and merged together to produce a generic profile that still provides substantial benefit for a new user, who can then skip the expensive profiling phase. These profiles can not only be used for selective compilation to reduce code-size and installation time, but can also be used for focussed optimisation on vital code regions of an application in order to improve overall performance. With selective compilation applied to a set of popular Android applications, code-size can be reduced by 49.9% on average, while installation time can be reduced by 31.8%, with only an average 8.5% increase in the amount of sequential runtime required to execute the collected profiles. The thesis also shows that, among the tested users, the use of a crowd-sourced and merged profile does not significantly affect their estimated performance loss from selective compilation (0.90x-0.92x) in comparison to when they perform selective compilation with their own unique profile (0.93x). Furthermore, with a newly proposed, more powerful code generator for Android's virtual machine, these same profiles can be used to perform focussed optimisation, which preliminary results show to increase runtime performance across a set of common Android benchmarks by 1.46x-10.83x. Finally, when a new code generator is added to a virtual machine, it is also important to test it for correctness and robustness. The methods of execution of a virtual machine, such as interpreters and code generators, must share a set of semantics about how programs must be executed, and this can be exploited in order to improve testing. This is done through the application of domain-aware binary fuzzing and differential testing within Android's virtual machine. The thesis highlights a series of actual code generation and verification bugs that were found in Android's virtual machine using this testing methodology, and compares the proposed approach to other state-of-the-art fuzzing techniques.
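    A minimal sketch of the profile-merging idea, assuming for illustration that a profile is simply a map from method name to execution count: per-user profiles are summed into one generic profile, and only the hottest methods covering a chosen fraction of the total execution count are selected for compilation. This is not Android's actual profile format or selection policy.

```java
import java.util.*;
import java.util.stream.Collectors;

// Toy crowd-sourced profile merging and hot-method selection.
public class ProfileMerge {
    // Sum per-user method counts into one generic profile.
    static Map<String, Long> merge(List<Map<String, Long>> userProfiles) {
        Map<String, Long> merged = new HashMap<>();
        for (Map<String, Long> p : userProfiles)
            p.forEach((m, c) -> merged.merge(m, c, Long::sum));
        return merged;
    }

    // Select the hottest methods covering `coverage` of the total count.
    static Set<String> selectHot(Map<String, Long> profile, double coverage) {
        long total = profile.values().stream().mapToLong(Long::longValue).sum();
        long budget = (long) (total * coverage);
        Set<String> hot = new LinkedHashSet<>();
        long seen = 0;
        for (Map.Entry<String, Long> e : profile.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
                .collect(Collectors.toList())) {
            if (seen >= budget) break;
            hot.add(e.getKey());
            seen += e.getValue();
        }
        return hot;
    }
}
```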

    Application of lean scheduling and production control in non-repetitive manufacturing systems using intelligent agent decision support

    This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University. Lean Manufacturing (LM) is widely accepted as a world-class manufacturing paradigm; its currency and superiority are manifested in numerous recent success stories. Most lean tools, including Just-in-Time (JIT), were designed for repetitive serial production systems. This resulted in a substantial stream of research which dismissed a priori the suitability of LM for non-repetitive, non-serial job-shops. The extension of LM into non-repetitive production systems is opposed on the basis of the sheer complexity of applying JIT pull production control in non-repetitive systems fabricating a high variety of products. However, the application of LM in job-shops is not unexplored. Studies proposing the extension of leanness into non-repetitive production systems have promoted the modification of pull control mechanisms or the reconfiguration of job-shops into cellular manufacturing systems. This thesis sought to address the shortcomings of the aforementioned approaches. The contribution of this thesis to knowledge in the field of production and operations management is threefold. Firstly, a Multi-Agent System (MAS) is designed to directly apply pull production control to a good approximation of a real-life job-shop. The scale and complexity of the developed MAS prove that the application of pull production control in non-repetitive manufacturing systems is challenging, perplexing and laborious. Secondly, the thesis examines three pull production control mechanisms, namely Kanban, Base Stock and Constant Work-in-Process (CONWIP), which it enhances so as to prevent system deadlocks, an issue largely unaddressed in the relevant literature. Having successfully tested the transferability of pull production control to non-repetitive manufacturing, the third contribution of this thesis is that it uses experimental and empirical data to examine the impact of pull production control on job-shop performance. The thesis identifies issues resulting from the application of pull control in job-shops which have implications for industry practice, and concludes by outlining further research that can be undertaken in this direction.
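    For intuition, the CONWIP rule examined in the thesis can be sketched as a simple admission-control loop: a job is released onto the shop floor only when one of a fixed number of cards is free, and the card returns when the job leaves the system. The Java sketch below shows only this control rule under toy single-loop assumptions; the thesis's multi-agent system and deadlock-prevention logic are far richer.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Toy CONWIP (Constant Work-in-Process) pull-control loop.
public class ConwipLoop {
    private final int cards;                   // the constant WIP limit
    private int inProcess = 0;
    private final Queue<String> backlog = new ArrayDeque<>();

    ConwipLoop(int cards) { this.cards = cards; }

    void arrive(String job) {                  // demand joins the backlog
        backlog.add(job);
        tryRelease();
    }

    void complete(String job) {                // job exits, freeing its card
        inProcess--;
        System.out.println(job + " done, WIP=" + inProcess);
        tryRelease();
    }

    private void tryRelease() {                // pull: release only on a free card
        while (inProcess < cards && !backlog.isEmpty()) {
            String job = backlog.poll();
            inProcess++;
            System.out.println(job + " released, WIP=" + inProcess);
        }
    }
}
```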

    Transparent and Precise Malware Analysis Using Virtualization: From Theory to Practice

    Dynamic analysis is an important technique used in malware analysis and is complementary to static analysis. Thus far, virtualization has been widely adopted for building fine-grained dynamic analysis tools and this trend is expected to continue. Unlike User/Kernel space malware analysis platforms that essentially co-exist with malware, virtualization based platforms benefit from isolation and fine-grained instrumentation support. Isolation makes it more difficult for malware samples to disrupt analysis, and fine-grained instrumentation provides analysts with low level details, such as those at the machine instruction level. This in turn supports the development of advanced analysis tools such as dynamic taint analysis and symbolic execution for automatic path exploration. The major disadvantage of virtualization based malware analysis is the loss of semantic information, also known as the semantic gap problem. To put it differently, since analysis takes place at the virtual machine monitor where only the raw system state (e.g., CPU and memory) is visible, higher level constructs such as processes and files must be reconstructed using the low level information. The collection of techniques used to bridge semantic gaps is known as Virtual Machine Introspection. Virtualization based analysis platforms can be further separated into emulation and hardware virtualization. Emulators have the advantages of flexibility of analysis tool development and efficiency for fine-grained analysis; however, emulators suffer from the transparency problem. That is, malware can employ methods to determine whether it is executing in an emulated environment versus real hardware, and cease operations to disrupt analysis if the machine is emulated. In brief, emulation based dynamic analysis has advantages over User/Kernel space and hardware virtualization based techniques, but it suffers from semantic gap and transparency problems. These problems have been exacerbated by recent discoveries of anti-emulation malware that detects emulators and Android malware with two semantic gaps, Java and native. Also, it is foreseeable that malware authors will have a similar response to taint analysis. In other words, once taint analysis becomes widely used to understand how malware operates, the authors will create new malware that attacks the imprecisions in taint analysis implementations and induce false-positives and false-negatives in an effort to frustrate analysts. This dissertation addresses these problems by presenting concepts, methods and techniques that can be used to transparently and precisely analyze both desktop and mobile malware using virtualization. This is achieved in three parts. First, precise heterogeneous record and replay is presented as a means to help emulators benefit from the transparency characteristics of hardware virtualization. This technique is implemented in a tool called V2E that uses KVM for recording and TEMU for replaying and analysis. It was successfully used to analyze real-world anti-emulation malware that evaded analysis using TEMU alone. Second, the design of an emulation based Android malware analysis platform that uses virtual machine introspection to bridge both the Java and native level semantic gaps, as well as seamlessly bind the two views together into a single view, is presented. The core introspection and instrumentation techniques were implemented in a new analysis platform called DroidScope that is based on the Android emulator. It was successfully used to analyze two real-world Android malware samples that have cooperating Java and native level components. Taint analysis was also used to study their information exfiltration behaviors. Third, formal methods for studying the sources of false-positives and false-negatives in dynamic taint analysis designs and for verifying the correctness of manually defined taint propagation rules are presented. These definitions and methods were successfully used to analyze and compare previously published taint analysis platforms in terms of false-positives and false-negatives.
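    To illustrate where taint-rule imprecision comes from, here is a toy Java sketch of dynamic taint propagation over a two-register machine with shadow state: each propagation rule computes the output's taint bit from the inputs' bits, and a naive rule for an operation like xor-with-self yields exactly the kind of false positive the dissertation's formal methods are designed to expose. This is a hypothetical illustration, not V2E or DroidScope code.

```java
// Toy dynamic taint propagation with shadow state.
public class TaintSketch {
    static int[] reg = new int[2];
    static boolean[] taint = new boolean[2];   // shadow state, one bit per reg

    static void loadInput(int r, int value) {  // taint source: untrusted input
        reg[r] = value;
        taint[r] = true;
    }

    static void add(int dst, int src) {        // rule: output tainted if any input is
        reg[dst] += reg[src];
        taint[dst] = taint[dst] || taint[src];
    }

    static void xorSelf(int r) {               // r ^= r always yields 0, yet the
        reg[r] ^= reg[r];                      // naive "any-input" rule keeps the
        taint[r] = taint[r] || taint[r];       // taint bit set -> false positive
    }

    public static void main(String[] args) {
        loadInput(0, 42);
        add(1, 0);
        System.out.println("r1 tainted: " + taint[1]); // true (correct)
        xorSelf(0);
        System.out.println("r0 tainted: " + taint[0]); // true, although r0 == 0
    }
}
```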

    Supporting dynamic aspect-oriented features

    Aspect-oriented programming techniques extend object-oriented programming with new methods to modularize concerns that would otherwise be non-modular. For example, logging concerns are typically scattered across a system, but using aspect-oriented techniques they can be localized into a single high-level module. These techniques typically take modular high-level code and statically transform it into non-modular intermediate code. The contribution of this work is the design and implementation of a flexible and dynamic intermediate-language (IL) model. The main motivation for the design of this IL model is to support a variety of dynamic aspect-oriented language constructs that are proposed in recent literature, such as CaesarJ's deploy, history-based pointcuts, and control flow constructs. Our IL model provides a higher level of abstraction compared to traditional object-oriented ILs as a compilation target for such constructs, which makes it easier to provide efficient implementations of these constructs. We demonstrate these benefits by providing an industrial-strength implementation of our IL model, by showing translation strategies from dynamic source-level constructs to our improved IL, and by analyzing the performance of the resulting IL code. Our evaluation using the SPEC JVM98 and Java Grande benchmarks shows that the overhead of supporting a dynamic deployment model can be reduced to as little as ~1.5%, compared to the unmodified VM.
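    The flavour of the dynamic constructs the IL model targets can be conveyed with a toy Java sketch of run-time aspect deployment: advice can be deployed and undeployed while the program runs, and every join point consults the currently deployed set. This illustrates the source-level behaviour (in the spirit of CaesarJ's deploy), not the thesis's IL or its efficient implementation.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Toy dynamic aspect deployment via a run-time join-point dispatcher.
public class DynamicDeploy {
    interface Advice { void before(String joinPoint); }

    private static final List<Advice> deployed = new CopyOnWriteArrayList<>();

    static void deploy(Advice a)   { deployed.add(a); }
    static void undeploy(Advice a) { deployed.remove(a); }

    // Every method entry is a join point that consults deployed advice.
    static void atJoinPoint(String name) {
        for (Advice a : deployed) a.before(name);
    }

    static void businessMethod() {
        atJoinPoint("businessMethod");
        // ... actual work ...
    }

    public static void main(String[] args) {
        businessMethod();                        // no advice deployed: silent
        Advice logger = jp -> System.out.println("enter " + jp);
        deploy(logger);
        businessMethod();                        // prints "enter businessMethod"
        undeploy(logger);
        businessMethod();                        // silent again
    }
}
```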

    On the Challenges of Software Performance Optimization with Statistical Methods

    Most recent programming languages, such as Java, Python and Ruby, include a collection framework as part of their standard library (or runtime). The Java Collection Framework provides a number of collection classes, some of which implement the same abstract data type, making them interchangeable. Developers can therefore choose between several functionally equivalent options. Since collections have different performance characteristics, and can be allocated in thousands of program locations, the choice of collection has an important impact on performance. Unfortunately, programmers often make sub-optimal choices when selecting their collections. In this thesis, we consider the problem of building automated tools that would help the programmer choose between different collection implementations. We divide this problem into two sub-problems. First, we need to measure the performance of a collection, and use relevant statistical methods to make meaningful comparisons. Second, we need to predict the performance of a collection with as little benchmarking as possible. To measure and analyze the performance of Java collections, we identify problems with the established methods, and suggest the need for more appropriate statistical methods, borrowed from Bayesian statistics. We use these statistical methods in a reproduction of two state-of-the-art dynamic collection selection approaches: CoCo and CollectionSwitch. Our Bayesian approach allows us to make sound comparisons between the previously reported results and our own experimental evidence. We find that we cannot reproduce the original results, and report on possible causes for the discrepancies between our results and theirs. To predict the performance of a collection, we consider an existing tool called Brainy. Brainy suggests collections to developers for C++ programs, using machine learning. One particularity of Brainy is that it generates its own training data by synthesizing programs and benchmarking them. As a result, Brainy can automatically learn about new collections and new CPU architectures, while other approaches require an expert to teach the system about collection performance. We adapt Brainy to the Java context, and investigate whether Brainy's adaptability also holds for Java. We find that Brainy's benchmark synthesis methods do not apply well to the Java context, as they introduce significant biases. We propose a new generative model for collection benchmarks and present the challenges that porting Brainy to Java entails.
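    To show why single measurements are not enough, the following Java sketch times two interchangeable List implementations over repeated samples and compares them by bootstrap resampling of the difference of means. The bootstrap is a frequentist stand-in used here purely for illustration; the thesis argues for Bayesian methods, and the workload, sample sizes, and lack of JIT warmup handling are all simplifications.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;
import java.util.Random;

// Toy benchmark: compare two List implementations with resampled uncertainty.
public class CollectionBench {
    static long timeAppend(List<Integer> list, int n) {
        long start = System.nanoTime();
        for (int i = 0; i < n; i++) list.add(i);
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        int samples = 30, n = 100_000;
        long[] a = new long[samples], b = new long[samples];
        for (int s = 0; s < samples; s++) {
            a[s] = timeAppend(new ArrayList<>(), n);
            b[s] = timeAppend(new LinkedList<>(), n);
        }
        // Bootstrap the difference of means to get an uncertainty interval.
        Random rnd = new Random(42);
        int boots = 10_000;
        double[] diffs = new double[boots];
        for (int k = 0; k < boots; k++) {
            double da = 0, db = 0;
            for (int s = 0; s < samples; s++) {
                da += a[rnd.nextInt(samples)];
                db += b[rnd.nextInt(samples)];
            }
            diffs[k] = (da - db) / samples;
        }
        java.util.Arrays.sort(diffs);
        System.out.printf("median diff %.0f ns, 95%% interval [%.0f, %.0f]%n",
            diffs[boots / 2], diffs[(int) (boots * 0.025)],
            diffs[(int) (boots * 0.975)]);
    }
}
```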

    A Co-Processor Approach for Efficient Java Execution in Embedded Systems

    This thesis deals with a hardware-accelerated Java virtual machine, named REALJava. The REALJava virtual machine is targeted for resource constrained embedded systems. The goal is to attain increased computational performance with reduced power consumption. While these objectives are often seen as trade-offs, in this context both of them can be attained simultaneously by using dedicated hardware. The target level of the computational performance of the REALJava virtual machine is initially set to be as fast as the currently available full-custom ASIC Java processors. As a secondary goal, all of the components of the virtual machine are designed so that the resulting system can be scaled to support multiple co-processor cores. The virtual machine is designed using the hardware/software co-design paradigm. The partitioning between the two domains is flexible, allowing customizations to the resulting system; for instance, floating-point support can be omitted from the hardware in order to decrease the size of the co-processor core. The communication between the hardware and the software domains is encapsulated into modules. This allows the REALJava virtual machine to be easily integrated into any system, simply by redesigning the communication modules. Besides the virtual machine and the related co-processor architecture, several performance enhancing techniques are presented. These include techniques related to instruction folding, stack handling, method invocation, constant loading and control in the time domain. The REALJava virtual machine is prototyped using three different FPGA platforms. The original pipeline structure is modified to suit the FPGA environment. The performance of the resulting Java virtual machine is evaluated against existing Java solutions in the embedded systems field. The results show that the goals are attained, both in terms of computational performance and power consumption. The computational performance in particular is evaluated thoroughly, and the results show that REALJava is more than twice as fast as the fastest full-custom ASIC Java processor. In addition to standard Java virtual machine benchmarks, several new Java applications are designed to both verify the results and broaden the spectrum of the tests.
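    Instruction folding, one of the performance techniques mentioned above, can be modelled in software as a peephole pass: a stack-machine sequence such as ILOAD a; ILOAD b; IADD; ISTORE c is recognized and folded into a single register-style operation c = a + b, eliminating the intermediate stack traffic. The Java sketch below is only a software illustration of the idea; REALJava performs folding in the co-processor hardware.

```java
import java.util.ArrayList;
import java.util.List;

// Toy software model of instruction folding over a bytecode-like stream.
public class FoldingSketch {
    record Insn(String op, int arg) {}
    record Folded(int dst, int srcA, int srcB) {}   // dst = srcA + srcB

    static List<Object> fold(List<Insn> code) {
        List<Object> out = new ArrayList<>();
        for (int i = 0; i < code.size(); ) {
            if (i + 3 < code.size()
                    && code.get(i).op().equals("ILOAD")
                    && code.get(i + 1).op().equals("ILOAD")
                    && code.get(i + 2).op().equals("IADD")
                    && code.get(i + 3).op().equals("ISTORE")) {
                out.add(new Folded(code.get(i + 3).arg(),
                                   code.get(i).arg(), code.get(i + 1).arg()));
                i += 4;                     // four stack ops become one
            } else {
                out.add(code.get(i++));     // pass through unfolded
            }
        }
        return out;
    }
}
```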
    • 

    corecore