A selective dynamic compiler for embedded Java virtual machine targeting ARM processors
Honour roll of the Faculté des études supérieures et postdoctorales, 2004-2005.
This work presents a new selective dynamic compilation technique targeting ARM 16/32-bit embedded system processors. The compiler is built into the J2ME/CLDC (Java 2 Micro Edition for Connected Limited Device Configuration) platform. The primary objective of our work is an efficient, lightweight and low-footprint accelerated Java virtual machine ready to run on embedded machines. This is achieved by implementing a selective ARM dynamic compiler, called Armed E-Bunny, in Sun's Kilobyte Virtual Machine (KVM). We first present the Java platform, Java 2 Micro Edition (J2ME) for embedded systems, and the components of the Java virtual machine. We then discuss the different acceleration techniques for the Java virtual machine and detail the principle of dynamic compilation. After that, we describe the architecture, design, implementation and experimental results of our selective dynamic compiler Armed E-Bunny. The modified KVM was ported to a handheld PDA and tested using standard J2ME benchmarks. The experimental results demonstrate a speedup of 360% over the latest version of Sun's KVM, with a footprint overhead that does not exceed 119 kilobytes.
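As a rough illustration of what "selective" dynamic compilation means in general, the Python sketch below interprets a method until it has been called often enough, then translates it once and runs the translated form from then on. The invocation-count threshold, the class name `SelectiveVM`, the toy stack bytecode, and the use of a Python function as a stand-in for generated ARM code are all assumptions made for illustration; the abstract does not say how Armed E-Bunny selects or compiles methods.

```python
from typing import Callable, Dict, List, Tuple

# A toy "bytecode" method: (opcode, operand) pairs for a tiny stack machine.
Bytecode = List[Tuple[str, int]]


def interpret(code: Bytecode, arg: int) -> int:
    """Slow path: decode and execute every instruction on every call."""
    stack = [arg]
    for op, operand in code:
        if op == "push":
            stack.append(operand)
        elif op == "add":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "mul":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return stack.pop()


def compile_method(code: Bytecode) -> Callable[[int], int]:
    """Fast-path stand-in: translate the bytecode once into a Python function."""
    exprs = ["x"]
    for op, operand in code:
        if op == "push":
            exprs.append(str(operand))
        elif op in ("add", "mul"):
            b, a = exprs.pop(), exprs.pop()
            symbol = "+" if op == "add" else "*"
            exprs.append(f"({a} {symbol} {b})")
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return eval(f"lambda x: {exprs.pop()}")


class SelectiveVM:
    """Hypothetical VM that compiles only methods that prove to be hot."""

    def __init__(self, hot_threshold: int = 3) -> None:
        self.hot_threshold = hot_threshold  # assumed selection policy
        self.methods: Dict[str, Bytecode] = {}
        self.counts: Dict[str, int] = {}
        self.compiled: Dict[str, Callable[[int], int]] = {}

    def define(self, name: str, code: Bytecode) -> None:
        self.methods[name] = code
        self.counts[name] = 0

    def call(self, name: str, arg: int) -> int:
        if name in self.compiled:
            return self.compiled[name](arg)
        self.counts[name] += 1
        if self.counts[name] >= self.hot_threshold:
            # The method is hot: pay the compilation cost once.
            self.compiled[name] = compile_method(self.methods[name])
        return interpret(self.methods[name], arg)


vm = SelectiveVM(hot_threshold=3)
vm.define("f", [("push", 2), ("mul", 0), ("push", 1), ("add", 0)])  # f(x) = 2x + 1
results = [vm.call("f", x) for x in range(5)]
```

Methods that are called only a few times never pay the compilation cost, which is the usual argument for selectivity on memory-constrained devices.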
Capability Memory Protection for Embedded Systems
This dissertation explores the use of capability security hardware and software in real-time and latency-sensitive embedded systems, to address existing memory safety and task isolation problems as well as providing new means to design a secure and scalable real-time system.
In addition, this dissertation looks into how practical and high-performance temporal memory safety can be achieved under a capability architecture.
State-of-the-art memory protection schemes for embedded systems typically present limited and inflexible solutions to memory protection and isolation, and fail to scale as embedded devices become more capable and ubiquitous.
I investigate whether a capability architecture is able to provide new angles to address memory safety issues in an embedded scenario.
Previous CHERI capability research focuses on 64-bit architectures in UNIX operating systems, which does not translate to typical 32-bit embedded processors with low-latency and real-time requirements.
I propose and implement the CHERI CC-64 encoding and the CHERI-64 coprocessor to construct a feasible capability-enabled 32-bit CPU.
In addition, I implement a real-time kernel for embedded systems atop CHERI-64.
On this hardware and software platform, I focus on exploring scalable task isolation and fine-grained memory protection enabled by capabilities in a single flat physical address space, which are otherwise difficult or impossible to achieve via state-of-the-art approaches.
Later, I present the evaluation of the hardware implementation and the software run-time overhead and real-time performance.
Even with capability support, CHERI-64 as well as other CHERI processors still expose major attack surfaces through temporal vulnerabilities like use-after-free.
A naive approach that sweeps memory to invalidate stale capabilities is inefficient and incurs significant cycle overhead and DRAM traffic.
To make sweeping revocation feasible, I introduce new architectural mechanisms and micro-architectural optimisations to substantially reduce the cost of memory sweeping and capability revocation.
Another factor of the cost is the frequency of memory sweeping.
I explore tradeoffs of memory allocator designs that use quarantine buffers and shadow space tags to prevent frequent unnecessary sweeping.
The evaluation shows that the optimisations and new allocator designs reduce the cost of capability sweeping revocation by orders of magnitude, making it already practical for most applications to adopt temporal safety under CHERI.
CSC Cambridge Scholarshi
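The interaction between quarantine buffers and sweeping revocation described above can be sketched with a small Python model. This is only a toy under stated assumptions: `Capability`, `QuarantineHeap`, the byte-count threshold and the flat list standing in for tagged memory are hypothetical names and simplifications, and shadow space tags and the hardware optimisations from the dissertation are not modelled. The point is that freed memory is not reused until a sweep has invalidated every stale capability, and that a larger quarantine means fewer sweeps.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Capability:
    base: int
    length: int
    valid: bool = True  # stands in for the hardware tag bit


class QuarantineHeap:
    """Toy allocator: freed regions wait in quarantine until a sweep revokes them."""

    def __init__(self, size: int, quarantine_limit: int) -> None:
        self.size = size
        self.quarantine_limit = quarantine_limit  # assumed policy: sweep at this many bytes
        self.next_free = 0                        # bump pointer for fresh memory
        self.free_list: List[Tuple[int, int]] = []
        self.quarantine: List[Tuple[int, int]] = []
        self.quarantined_bytes = 0
        self.memory: List[Capability] = []        # every capability a sweep would visit
        self.sweeps = 0

    def malloc(self, length: int) -> Capability:
        for i, (base, free_len) in enumerate(self.free_list):
            if free_len >= length:  # first fit, splitting off any remainder
                if free_len == length:
                    del self.free_list[i]
                else:
                    self.free_list[i] = (base + length, free_len - length)
                break
        else:
            if self.next_free + length > self.size:
                raise MemoryError("heap exhausted")
            base = self.next_free
            self.next_free += length
        cap = Capability(base, length)
        self.memory.append(cap)
        return cap

    def free(self, cap: Capability) -> None:
        # The region is not reusable yet, so a stale capability cannot alias a new object.
        self.quarantine.append((cap.base, cap.length))
        self.quarantined_bytes += cap.length
        if self.quarantined_bytes >= self.quarantine_limit:
            self.sweep()

    def sweep(self) -> None:
        """Invalidate every capability that overlaps a quarantined region."""
        for cap in self.memory:
            if any(cap.base < qb + ql and qb < cap.base + cap.length
                   for qb, ql in self.quarantine):
                cap.valid = False
        self.memory = [c for c in self.memory if c.valid]
        self.free_list.extend(self.quarantine)  # now safe to reuse
        self.quarantine.clear()
        self.quarantined_bytes = 0
        self.sweeps += 1


heap = QuarantineHeap(size=64, quarantine_limit=32)
a, b, c = heap.malloc(16), heap.malloc(16), heap.malloc(16)
heap.free(a)          # 16 bytes quarantined, below the limit: no sweep yet
d = heap.malloc(16)   # comes from fresh memory, not from a's region
heap.free(b)          # 32 bytes quarantined: one sweep revokes a and b together
e = heap.malloc(16)   # a's region is reused only after revocation
```

In this model one sweep covers two frees; raising `quarantine_limit` trades extra memory held in quarantine for fewer passes over memory.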
Cooperative cache scrubbing
Managing the limited resources of power and memory bandwidth while improving performance on multicore hardware is challenging. In particular, more cores demand more memory bandwidth, and multi-threaded applications increasingly stress memory systems, leading to more energy consumption. However, we demonstrate that not all memory traffic is necessary. For modern Java programs, 10 to 60% of DRAM writes are useless, because the data on these lines are dead: the program is guaranteed to never read them again. Furthermore, reading memory only to immediately zero-initialize it wastes bandwidth. We propose a software/hardware cooperative solution: the memory manager communicates dead and zero lines with cache scrubbing instructions. We show how scrubbing instructions satisfy MESI cache coherence protocol invariants and demonstrate them in a Java Virtual Machine and multicore simulator. Scrubbing reduces average DRAM traffic by 59%, total DRAM energy by 14%, and dynamic DRAM energy by 57% on a range of configurations. Cooperative software/hardware cache scrubbing reduces memory bandwidth and improves energy efficiency, two critical problems in modern systems.
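The effect described above can be illustrated with a toy write-back cache in Python. This is a sketch under stated assumptions, not the paper's simulator: the single-level LRU cache, the class `WriteBackCache`, and the method names `scrub_dead` and `zero_line` are hypothetical stand-ins for the scrubbing instructions, and coherence is not modelled. It shows that a dirty line marked dead is dropped on eviction instead of being written to DRAM, and that a line zeroed in cache needs no DRAM read.

```python
from collections import OrderedDict


class WriteBackCache:
    """Toy write-allocate, write-back cache with LRU replacement."""

    def __init__(self, capacity_lines: int) -> None:
        self.capacity = capacity_lines
        self.lines: "OrderedDict[int, bool]" = OrderedDict()  # line -> dirty flag, LRU first
        self.dram_reads = 0
        self.dram_writes = 0

    def _install(self, line: int, dirty: bool, fetch: bool) -> None:
        if line in self.lines:
            dirty = dirty or self.lines.pop(line)  # hit: keep an existing dirty bit
        else:
            if fetch:
                self.dram_reads += 1               # miss: bring the line in from DRAM
            if len(self.lines) >= self.capacity:
                _, victim_dirty = self.lines.popitem(last=False)
                if victim_dirty:
                    self.dram_writes += 1          # write back the evicted dirty line
        self.lines[line] = dirty                   # (re)insert as most recently used

    def read(self, line: int) -> None:
        self._install(line, dirty=False, fetch=True)

    def write(self, line: int) -> None:
        self._install(line, dirty=True, fetch=True)

    def zero_line(self, line: int) -> None:
        """Hypothetical scrub: zero the line in cache without fetching it first."""
        self._install(line, dirty=True, fetch=False)

    def scrub_dead(self, line: int) -> None:
        """Hypothetical scrub: the data is dead, so clear the dirty bit."""
        if line in self.lines:
            self.lines[line] = False


def simulate(scrub: bool) -> WriteBackCache:
    """Write four short-lived lines through a two-line cache."""
    cache = WriteBackCache(capacity_lines=2)
    for line in range(4):
        cache.write(line)
        if scrub:
            cache.scrub_dead(line)  # the memory manager knows the object has died
    return cache


baseline = simulate(scrub=False)
scrubbed = simulate(scrub=True)
```

Comparing `baseline.dram_writes` with `scrubbed.dram_writes` shows the write-backs that scrubbing avoids in this model; the figures quoted in the abstract come from the authors' own evaluation, not from this sketch.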
Virtual Machine Support for Many-Core Architectures: Decoupling Abstract from Concrete Concurrency Models
The upcoming many-core architectures require software developers to exploit
concurrency to utilize available computational power. Today's high-level
language virtual machines (VMs), which are a cornerstone of software
development, do not provide sufficient abstraction for concurrency concepts. We
analyze concrete and abstract concurrency models and identify the challenges
they impose for VMs. To provide sufficient concurrency support in VMs, we
propose to integrate concurrency operations into VM instruction sets.
Since there will always be VMs optimized for special purposes, our goal is to
develop a methodology to design instruction sets with concurrency support.
Therefore, we also propose a list of trade-offs that have to be investigated to
advise the design of such instruction sets.
As a first experiment, we implemented one instruction set extension for
shared memory and one for non-shared memory concurrency. From our experimental
results, we derived a list of requirements for a full-grown experimental
environment for further research.
Real-Time Operating Systems and Programming Languages for Embedded Systems
In this chapter, we present the different alternatives available today for the development of real-time embedded systems. In particular, we focus on the programming languages used, such as C++, Java and Ada, and on operating systems such as Linux-RT, FreeRTOS and TinyOS. We also analyze the current state of the art for developing embedded systems under the WORA paradigm with standard Java [1], its Real-Time Specification, and the use of Real-Time Core Extensions and picoJava-based CPUs [5]. We expect the reader to come away with a clear view of the options available at the start of a design, with their pros and cons, so that they can choose the one that best fits their case.
Affiliation: Orozco, Javier Dario. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Bahía Blanca. Instituto de Investigaciones en Ingeniería Eléctrica "Alfredo Desages". Universidad Nacional del Sur. Departamento de Ingeniería Eléctrica y de Computadoras. Instituto de Investigaciones en Ingeniería Eléctrica "Alfredo Desages"; Argentina. Universidad Nacional del Sur. Departamento de Ingeniería Eléctrica y de Computadoras. Laboratorio de Sistemas Digitales; Argentina
Affiliation: Santos, Rodrigo Martin. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Bahía Blanca. Instituto de Investigaciones en Ingeniería Eléctrica "Alfredo Desages". Universidad Nacional del Sur. Departamento de Ingeniería Eléctrica y de Computadoras. Instituto de Investigaciones en Ingeniería Eléctrica "Alfredo Desages"; Argentina. Universidad Nacional del Sur. Departamento de Ingeniería Eléctrica y de Computadoras. Laboratorio de Sistemas Digitales; Argentin
Exploiting the Weak Generational Hypothesis for Write Reduction and Object Recycling
Programming languages with automatic memory management are continuing to grow in popularity due to ease of programming. However, these languages tend to allocate objects excessively, leading to inefficient use of memory and large garbage collection and allocation overheads.
The weak generational hypothesis notes that objects tend to die young in languages with automatic dynamic memory management. Much work has been done to optimize allocation and garbage collection algorithms based on this observation. Previous work has largely focused on developing efficient software algorithms for allocation and collection. However, much less work has studied architectural solutions. In this work, we propose and evaluate architectural support for assisting allocation and garbage collection.
We first study the effects of languages with automatic memory management on the memory system. As objects often die young, it is likely many objects die while in the processor's caches. Writes of dead data back to main memory are unnecessary, as the data will never be used again. To study this, we develop and present architecture support to identify dead objects while they remain resident in cache and eliminate any unnecessary writes. We show that many writes out of the caches are unnecessary, and can be avoided using our hardware additions.
Next, we study the effects of using dead data in cache to assist with allocation and garbage collection. Logic is developed and presented to allow for reuse of cache space found dead to satisfy future allocation requests. We show that dead cache space can be recycled at a high rate, reducing pressure on the allocator and reducing cache miss rates. However, a full implementation of our initial approach is shown to be unscalable. We propose and study limitations to our approach, trading object coverage for scalability.
Third, we present a new approach for identifying objects that die young based on a limitation of our previous approach. We show this approach has much lower storage and logic requirements and is scalable, while only slightly decreasing overall object coverage.