911 research outputs found
Code Cache Management in Managed Language VMs to Reduce Memory Consumption for Embedded Systems
The compiled native code generated by a just-in-time (JIT) compiler in man- aged language virtual machines (VM) is placed in a region of memory called the code cache. Code cache management (CCM) in a VM is responsible to find and evict methods from the code cache to maintain execution correctness and manage program performance for a given code cache size or memory budget. Effective CCM can also boost program speed by enabling more aggressive JIT compilation, powerful optimizations, and improved hardware instruction cache and I-TLB per- formance. Though important, CCM is an overlooked component in VMs. We find that the default CCM policies in Oracle’s production-grade HotSpot VM perform poorly even at modest memory pressure. We develop a detailed simulation-based frame- work to model and evaluate the potential efficiency of many different CCM poli- cies in a controlled and realistic, but VM-independent environment. We make the encouraging discovery that effective CCM policies can sustain high program performance even for very small cache sizes. Our simulation study provides the rationale and motivation to improve CCM strategies in existing VMs. We implement and study the properties of several CCM policies in HotSpot. We find that in spite of working within the bounds of the HotSpot VM’s current CCM sub-system, our best CCM policy implementation in HotSpot improves program performance over the default CCM algorithm by 39%, 41%, 55%, and 50% with code cache sizes that are 90%, 75%, 50%, and 25% of the desired cache size, on average
Effective memory management for mobile environments
Smartphones, tablets, and other mobile devices exhibit vastly different constraints compared to regular or classic computing environments like desktops, laptops, or servers. Mobile devices run dozens of so-called “apps” hosted by independent virtual machines (VM). All these VMs run concurrently and each VM deploys purely local heuristics to organize resources like memory, performance, and power. Such a design causes conflicts across all layers of the software stack, calling for the evaluation of VMs and the optimization techniques specific for mobile frameworks.
In this dissertation, we study the design of managed runtime systems for mobile platforms. More specifically, we deepen the understanding of interactions between garbage collection (GC) and system layers. We develop tools to monitor the memory behavior of Android-based apps and to characterize GC performance, leading to the development of new techniques for memory management that address energy constraints, time performance, and responsiveness.
We implement a GC-aware frequency scaling governor for Android devices. We also explore the tradeoffs of power and performance in vivo for a range of realistic GC variants, with established benchmarks and real applications running on Android virtual machines. We control for variation due to dynamic voltage and frequency scaling (DVFS), Just-in-time (JIT) compilation, and across established dimensions of heap memory size and concurrency. Finally, we provision GC as a global service that collects statistics from all running VMs and then makes an informed decision that optimizes across all them (and not just locally), and across all layers of the stack.
Our evaluation illustrates the power of such a central coordination service and garbage collection mechanism in improving memory utilization, throughput, and adaptability to user activities. In fact, our techniques aim at a sweet spot, where total on-chip energy is reduced (20–30%) with minimal impact on throughput and responsiveness (5–10%). The simplicity and efficacy of our approach reaches well beyond the usual optimization techniques
lLTZVisor: a lightweight TrustZone-assisted hypervisor for low-end ARM devices
Dissertação de mestrado em Engenharia Eletrónica Industrial e ComputadoresVirtualization is a well-established technology in the server and desktop space
and has recently been spreading across different embedded industries. Facing
multiple challenges derived by the advent of the Internet of Things (IoT) era,
these industries are driven by an upgrowing interest in consolidating and isolating
multiple environments with mixed-criticality features, to address the complex IoT
application landscape. Even though this is true for majority mid- to high-end
embedded applications, low-end systems still present little to no solutions proposed
so far.
TrustZone technology, designed by ARM to improve security on its processors,
was adopted really well in the embedded market. As such, the research community
became active in exploring other TrustZone’s capacities for isolation, like
an alternative form of system virtualization. The lightweight TrustZone-assisted
hypervisor (LTZVisor), that mainly targets the consolidation of mixed-criticality
systems on the same hardware platform, is one design example that takes advantage
of TrustZone technology for ARM application processors. With the recent
introduction of this technology to the new generation of ARM microcontrollers, an
opportunity to expand this breakthrough form of virtualization to low-end devices
arose.
This work proposes the development of the lLTZVisor hypervisor, a refactored
LTZVisor version that aims to provide strong isolation on resource-constrained
devices, while achieving a low-memory footprint, determinism and high efficiency.
The key for this is to implement a minimal, reliable, secure and predictable virtualization
layer, supported by the TrustZone technology present on the newest
generation of ARM microcontrollers (Cortex-M23/33).Virtualização é uma tecnologia já bem estabelecida no âmbito de servidores e
computadores pessoais que recentemente tem vindo a espalhar-se através de várias
indústrias de sistemas embebidos. Face aos desafios provenientes do surgimento
da era Internet of Things (IoT), estas indústrias são guiadas pelo crescimento
do interesse em consolidar e isolar múltiplos sistemas com diferentes níveis de
criticidade, para atender ao atual e complexo cenário aplicativo IoT. Apesar de
isto se aplicar à maioria de aplicações embebidas de média e alta gama, sistemas
de baixa gama apresentam-se ainda com poucas soluções propostas.
A tecnologia TrustZone, desenvolvida pela ARM de forma a melhorar a segurança
nos seus processadores, foi adoptada muito bem pelo mercado dos sistemas embebidos.
Como tal, a comunidade científica começou a explorar outras aplicações
da tecnologia TrustZone para isolamento, como uma forma alternativa de virtualização
de sistemas. O "lightweight TrustZone-assisted hypervisor (LTZVisor)",
que tem sobretudo como fim a consolidação de sistemas de criticidade mista na
mesma plataforma de hardware, é um exemplo que tira vantagem da tecnologia
TrustZone para os processadores ARM de alta gama. Com a recente introdução
desta tecnologia para a nova geração de microcontroladores ARM, surgiu uma
oportunidade para expandir esta forma inovadora de virtualização para dispositivos
de baixa gama.
Este trabalho propõe o desenvolvimento do hipervisor lLTZVisor, uma versão
reestruturada do LTZVisor que visa em proporcionar um forte isolamento em dispositivos
com recursos restritos, simultâneamente atingindo um baixo footprint de
memória, determinismo e alta eficiência. A chave para isto está na implementação
de uma camada de virtualização mínima, fiável, segura e previsível, potencializada
pela tecnologia TrustZone presente na mais recente geração de microcontroladores
ARM (Cortex-M23/33)
Dynamic Binary Translation for Embedded Systems with Scratchpad Memory
Embedded software development has recently changed with advances in computing. Rather than fully co-designing software and hardware to perform a relatively simple task, nowadays embedded and mobile devices are designed as a platform where multiple applications can be run, new applications can be added, and existing applications can be updated. In this scenario, traditional constraints in embedded systems design (i.e., performance, memory and energy consumption and real-time guarantees) are more difficult to address. New concerns (e.g., security) have become important and increase software complexity as well.
In general-purpose systems, Dynamic Binary Translation (DBT) has been used to address these issues with services such as Just-In-Time (JIT) compilation, dynamic optimization, virtualization, power management and code security. In embedded systems, however, DBT is not usually employed due to performance, memory and power overhead.
This dissertation presents StrataX, a low-overhead DBT framework for embedded systems. StrataX addresses the challenges faced by DBT in embedded systems using novel techniques. To reduce DBT overhead, StrataX loads code from NAND-Flash storage and translates it into a Scratchpad Memory (SPM), a software-managed on-chip SRAM with limited capacity. SPM has similar access latency as a hardware cache, but consumes less power and chip area.
StrataX manages SPM as a software instruction cache, and employs victim compression and pinning to reduce retranslation cost and capture frequently executed code in the SPM. To prevent performance loss due to excessive code expansion, StrataX minimizes the amount of code inserted by DBT to maintain control of program execution. When a hardware instruction cache is available, StrataX dynamically partitions translated code among the SPM and main memory. With these techniques, StrataX has low performance overhead relative to native execution for MiBench programs. Further, it simplifies embedded software and hardware design by operating transparently to applications without any special hardware support. StrataX achieves sufficiently low overhead to make it feasible to use DBT in embedded systems to address important design goals and requirements
Proxy compilation for Java via a code migration technique
There is an increasing trend that intermediate representations (IRs) are used to deliver programs in more and more languages, such as Java. Although Java can provide many advantages, including a wider portability and better optimisation opportunities on execution, it introduces extra overhead by requiring an IR translation for the program execution. For maximum execution performance, an optimising compiler is placed in the runtime to selectively optimise code regions regarded as “hotspots”. This common approach has been effectively deployed in many implementation of programming languages. However, the computational resources demanded by this approach made it less efficient, or even difficult to deploy directly in a resourceconstrained environment. One implementation approach is to use a remote compilation technique to support compilation during the execution. The work presented in this dissertation supports the thesis that execution performance can be improved by the use of efficient optimising compilation by using a proxy dynamic optimising compiler. After surveying various approaches to the design and implementation of remote compilation, a proxy compilation system called Apus is defined. To demonstrate the effectiveness of using a dynamic optimising compiler as a proxy compiler, a complete proxy compilation system is written based on a research-oriented Java VirtualMachine (JVM). The proxy compilation system is discussed in detail, showing how to deliver remote binaries and manage a cache of binaries by using a code migration approach. The proxy compilation client shows how the proxy compilation service is integrated with the selective optimisation system to maximise execution performance. The results of empirical measurements of the system are given, showing the efficiency of code optimisation from either the proxy compilation service and a local binary cache. The conclusion of this work is that Java execution performance can be improved by efficient optimising compilation with a proxy compilation service by using a code migration technique
Exploring Dynamic Compilation and Cross-Layer Object Management Policies for Managed Language Applications
Recent years have witnessed the widespread adoption of managed programming languages that are designed to execute on virtual machines. Virtual machine architectures provide several powerful software engineering advantages over statically compiled binaries, such as portable program representations, additional safety guarantees, automatic memory and thread management, and dynamic program composition, which have largely driven their success. To support and facilitate the use of these features, virtual machines implement a number of services that adaptively manage and optimize application behavior during execution. Such runtime services often require tradeoffs between efficiency and effectiveness, and different policies can have major implications on the system's performance and energy requirements. In this work, we extensively explore policies for the two runtime services that are most important for achieving performance and energy efficiency: dynamic (or Just-In-Time (JIT)) compilation and memory management. First, we examine the properties of single-tier and multi-tier JIT compilation policies in order to find strategies that realize the best program performance for existing and future machines. Our analysis performs hundreds of experiments with different compiler aggressiveness and optimization levels to evaluate the performance impact of varying if and when methods are compiled. We later investigate the issue of how to optimize program regions to maximize performance in JIT compilation environments. For this study, we conduct a thorough analysis of the behavior of optimization phases in our dynamic compiler, and construct a custom experimental framework to determine the performance limits of phase selection during dynamic compilation. Next, we explore innovative memory management strategies to improve energy efficiency in the memory subsystem. We propose and develop a novel cross-layer approach to memory management that integrates information and analysis in the VM with fine-grained management of memory resources in the operating system. Using custom as well as standard benchmark workloads, we perform detailed evaluation that demonstrates the energy-saving potential of our approach. We implement and evaluate all of our studies using the industry-standard Oracle HotSpot Java Virtual Machine to ensure that our conclusions are supported by widely-used, state-of-the-art runtime technology
Leveraging virtualization technologies for resource partitioning in mixed criticality systems
Multi- and many-core processors are becoming increasingly popular in embedded systems. Many of these processors now feature hardware virtualization capabilities, such as the ARM Cortex A15, and x86 processors with Intel VT-x or AMD-V support. Hardware virtualization offers opportunities to partition physical resources, including processor cores, memory and I/O devices amongst guest virtual machines. Mixed criticality systems and services can then co-exist on the same platform in separate virtual machines. However, traditional virtual machine systems are too expensive because of the costs of trapping into hypervisors to multiplex and manage machine physical resources on behalf of separate guests. For example, hypervisors are needed to schedule separate VMs on physical processor cores. Additionally, traditional hypervisors have memory footprints that are often too large for many embedded computing systems. This dissertation presents the design of the Quest-V separation kernel, which partitions services of different criticality levels across separate virtual machines, or sandboxes. Each sandbox encapsulates a subset of machine physical resources that it manages without requiring intervention of a hypervisor. In Quest-V, a hypervisor is not needed for normal operation, except to bootstrap the system and establish communication channels between sandboxes. This approach not only reduces the memory footprint of the most privileged protection domain, it removes it from the control path during normal system operation, thereby heightening security
Automated Verification of Practical Garbage Collectors
Garbage collectors are notoriously hard to verify, due to their low-level
interaction with the underlying system and the general difficulty in reasoning
about reachability in graphs. Several papers have presented verified
collectors, but either the proofs were hand-written or the collectors were too
simplistic to use on practical applications. In this work, we present two
mechanically verified garbage collectors, both practical enough to use for
real-world C# benchmarks. The collectors and their associated allocators
consist of x86 assembly language instructions and macro instructions, annotated
with preconditions, postconditions, invariants, and assertions. We used the
Boogie verification generator and the Z3 automated theorem prover to verify
this assembly language code mechanically. We provide measurements comparing the
performance of the verified collector with that of the standard Bartok
collectors on off-the-shelf C# benchmarks, demonstrating their competitiveness
- …