11 research outputs found

    A Multilevel Introspective Dynamic Optimization System For Holistic Power-Aware Computing

    Get PDF
    Power consumption is rapidly becoming the dominant limiting factor for further improvements in computer design. Curiously, this applies both at the "high end" of workstations and servers and at the "low end" of handheld devices and embedded computers. At the high end, the challenge lies in dealing with exponentially growing power densities. At the low end, there is demand to make mobile devices more powerful and longer lasting, but battery technology is not improving at the rate at which power consumption is rising. Traditional power-management research is fragmented: techniques are developed at specific levels without fully exploring their synergy with other levels. Most software techniques target either operating systems or compilers but do not explore the interaction between the two layers, and they have not fully explored the potential of virtual machines for power management. In contrast, we are developing a system that integrates information from multiple levels of software and hardware, connecting these levels through a communication channel. At the heart of this system are a virtual machine that compiles and dynamically profiles code, and an optimizer that reoptimizes all code, including that of applications and the virtual machine itself. We believe this introspective, holistic approach enables more informed power-management decisions.
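The feedback loop the abstract describes — profile dynamically, then let an optimizer decide how aggressively to recompile — can be sketched roughly as follows. All names, metrics, and thresholds here are illustrative assumptions, not the paper's actual system.

```python
# Hypothetical sketch of an introspective power-aware optimization loop:
# a profiler reports per-method energy estimates and invocation counts,
# and the optimizer picks a recompilation level from them.

def choose_opt_level(energy_mj, invocations, levels=(0, 1, 2)):
    """Pick an optimization level from a crude 'energy hotness' metric."""
    cost = energy_mj * invocations  # cumulative energy attributed to the method
    if cost > 1000:
        return levels[2]   # hot and costly: aggressive reoptimization
    if cost > 100:
        return levels[1]   # warm: baseline optimizing compile
    return levels[0]       # cold: leave unoptimized

# Profile data (method -> (energy per call in mJ, invocation count)) is assumed.
profile = {"render": (0.8, 5000), "parse_cfg": (0.5, 40)}
plan = {m: choose_opt_level(e, n) for m, (e, n) in profile.items()}
```

A real system would feed such decisions back through the communication channel between layers; the point of the sketch is only that the decision consumes profile data from more than one level.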

    Crystal gazer : profile-driven write-rationing garbage collection for hybrid memories

    Get PDF
    Non-volatile memories (NVM) offer greater capacity than DRAM but suffer from high latency and low write endurance. Hybrid memories combine DRAM and NVM to form scalable memory systems with the promise of high capacity, low energy consumption, and high endurance. Automatically managing hybrid NVM-DRAM memories to achieve their promise without changing user applications or their programming models remains an open question. This paper uses garbage collection in managed languages to exploit NVM capacity while preventing NVM wear-out in hybrid memories, with no changes to the programming model. We introduce profile-driven write-rationing garbage collection. Allocation sites that produce frequently written objects are predicted based on previous program executions. Objects are initially allocated in a DRAM nursery space. The collector copies surviving nursery objects from highly written sites to a mature DRAM space and read-mostly objects to a mature NVM space. Write-intensity prediction for 15 Java benchmarks accurately places objects in the correct space, eliminating the expensive object monitoring of prior write-rationing garbage collectors. Furthermore, our technique exposes a Pareto tradeoff between DRAM usage and NVM lifetime, unlike prior work. Experimental results on NUMA hardware that emulates hybrid NVM-DRAM memory demonstrate that profile-driven write-rationing garbage collection reduces the number of writes to NVM compared to prior work to extend its lifetime, maximizes the use of NVM for its capacity, and achieves good performance.
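The placement policy described above — survivors from write-hot allocation sites go to mature DRAM, the rest to mature NVM — can be sketched as follows. This is an illustrative sketch, not the paper's collector; the site names and classification set are hypothetical, standing in for the profile gathered in a previous run.

```python
# Illustrative sketch of profile-driven write-rationing placement:
# allocation sites classified as write-hot in an earlier profiling run
# steer surviving nursery objects to mature DRAM, while objects from
# read-mostly sites go to mature NVM to exploit its capacity.

WRITE_HOT_SITES = {"Cache.put", "Buffer.append"}  # from a prior run's profile

def promote(survivors):
    """Choose a mature space for each nursery survivor by its allocation site."""
    placement = {}
    for obj_id, alloc_site in survivors:
        space = "DRAM" if alloc_site in WRITE_HOT_SITES else "NVM"
        placement[obj_id] = space
    return placement

# Survivors of a nursery collection: (object id, allocation site).
survivors = [(1, "Cache.put"), (2, "Parser.token"), (3, "Buffer.append")]
placement = promote(survivors)
```

Because the classification is made per allocation site from an earlier execution, no per-object write monitoring is needed at run time, which is the cost the paper says it eliminates.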

    Phase-based adaptive recompilation in a JVM

    Full text link

    Just-In-Time Compilation: History, Architecture, Principles, and Systems

    Get PDF
    Many implementations of high-level languages focus on systems built around just-in-time (JIT) compilation. This mechanism has the appeal of improving the performance of such languages while preserving portability, but at the price of adding compilation time to total execution time. Research in the area has therefore focused on balancing compilation cost against execution efficiency. The first just-in-time compilation systems employed static strategies to select and optimize the code regions likely to yield good performance. More sophisticated systems refined these strategies in order to apply optimizations more judiciously. This tutorial presents the principles underlying just-in-time compilation and its evolution over the years, as well as the approaches several systems use to balance cost and efficiency. Although it is difficult to single out the best approach, recent work suggests that strict strategies for code detection and optimization, together with the parallelism offered by multi-core architectures, will form the basis of future just-in-time compilation systems.
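The "static strategies to select code regions" that early JIT systems used are classically counter-based: interpret a method until its invocation counter crosses a threshold, then compile it. A minimal sketch of that idea, with an artificially small threshold for illustration (real VMs use much larger, tuned values):

```python
# Minimal sketch (illustrative, not from the tutorial) of counter-based
# hot-spot detection: a method runs interpreted until its invocation
# counter reaches a threshold, after which it runs as machine code.

COMPILE_THRESHOLD = 3  # deliberately tiny; production VMs use thousands

class TinyJit:
    def __init__(self):
        self.counters = {}
        self.compiled = set()

    def invoke(self, method):
        """Return how this invocation of `method` would be executed."""
        if method in self.compiled:
            return "machine code"
        self.counters[method] = self.counters.get(method, 0) + 1
        if self.counters[method] >= COMPILE_THRESHOLD:
            self.compiled.add(method)  # promote the hot method for next time
        return "interpreted"

jit = TinyJit()
modes = [jit.invoke("hot_loop") for _ in range(5)]
```

The balancing act the tutorial describes lives in that threshold: too low and compilation cost dominates, too high and hot code spends too long in the interpreter.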

    ์–ดํ”Œ๋ฆฌ์ผ€์ด์…˜ ๋‹ค์šด๋กœ๋”ฉ ์‹œ์Šคํ…œ์„ ์œ„ํ•œ ํด๋ผ์ด์–ธํŠธ ์„ ํ–‰ ์ปดํŒŒ์ผ๋Ÿฌ

    Get PDF
    Ph.D. dissertation, Seoul National University, Department of Electrical and Computer Engineering, August 2014. Advisor: Soo-Mook Moon.
    App-downloading systems such as DTVs and smartphones are in widespread use, and virtual machines are the mainstream execution environment on these systems. One critical problem of app-downloading systems is performance, because apps are executed by an interpreter. A popular solution is the just-in-time compiler (JITC), which translates bytecode to machine code at runtime and therefore suffers from runtime compilation overhead. We propose a client ahead-of-time compiler (c-AOTC) that improves performance by removing this overhead. c-AOTC saves the machine code that the JITC generates for a method in persistent storage on the device and reuses it in later runs: when the method is invoked again in a later run of the program, the cached machine code is loaded and executed directly, without any translation overhead. One major issue in c-AOTC is relocation, because some of the address constants embedded in the cached machine code are no longer correct when the code is loaded and used in a different run; those addresses must be corrected before they are used. Constant pool resolution complicates the relocation problem, and we propose solutions for it. The persistent-storage overhead of saving the relocation information is also an issue, so we propose a technique to encode the relocation information and compress the machine code efficiently. We implemented c-AOTC on Oracle's CVM, the reference implementation of the CDC virtual machine, and evaluation results indicate that c-AOTC improves benchmark performance by an average of 12%. We also ported the c-AOTC approach to a commercial DTV platform and tested it on real xlet applications from commercial broadcasting stations, where it achieved an average 33% performance improvement. Google's V8 JavaScript VM uses no interpreter; apps are executed only by the JITC. We applied c-AOTC to V8 but obtained no meaningful performance improvement, a consequence of V8's heavy use of internal objects. Internal objects are created by the compiler and used during compilation, and they are also accessed by the running JavaScript program; most V8 components are created as internal objects, far more than in other kinds of VMs, and the machine code V8 generates addresses internal objects directly. Because these objects differ between runs and are needed every time an application executes, c-AOTC must recreate them on every run. Since most of V8's compilation overhead is internal-object creation, c-AOTC cannot obtain sufficient improvement in this environment.
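The relocation step the abstract describes — cached machine code whose embedded address constants must be patched to the VM's current values before reuse — can be sketched as below. The byte layout, file structure, and names are illustrative assumptions, not the thesis's actual .aotc format.

```python
# Hypothetical sketch of c-AOTC-style relocation: machine code is cached
# together with a list of (offset, symbol) entries; on a later run, each
# recorded offset is patched with that run's address for the symbol.

import struct

def save_code(machine_code, reloc_offsets):
    """Bundle machine code with its relocation entries (the .aotc idea)."""
    return {"code": bytearray(machine_code), "relocs": reloc_offsets}

def load_and_relocate(cached, symbol_addresses):
    """Patch each recorded offset with the current run's symbol address."""
    code = bytearray(cached["code"])
    for offset, symbol in cached["relocs"]:
        # Overwrite the stale 32-bit little-endian address constant in place.
        struct.pack_into("<I", code, offset, symbol_addresses[symbol])
    return bytes(code)

# First run: the address of 'constant_pool' (0x1000) was embedded at
# offsets 0 and 8 of the generated code; 'AAAA' stands for other bytes.
cached = save_code(b"\x00\x10\x00\x00AAAA\x00\x10\x00\x00",
                   [(0, "constant_pool"), (8, "constant_pool")])
# Later run: the pool now lives at 0x2000, so both slots are patched.
patched = load_and_relocate(cached, {"constant_pool": 0x2000})
```

The thesis's encoding and compression work then amounts to storing that relocation list compactly, since keeping it verbatim inflates the cached files.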

    Coupling On-Line and Off-Line Profile Information to Improve Program Performance

    No full text
    In this paper, we describe a novel execution environment for Java programs that substantially improves execution performance by incorporating both on-line and off-line profile information to guide dynamic optimization. By using both types of profile collection techniques, we are able to exploit the strengths of each constituent approach: profile accuracy and low overhead. Such coupling also reduces the negative impact of these approaches when each is used in isolation. On-line profiling introduces overhead for dynamic instrumentation, measurement, and decision making. Off-line profile information can be inaccurate when program inputs for execution and optimization differ from those used for profiling. To combat these drawbacks and to achieve the benefits of both on-line and off-line profiling, we developed a dynamic compilation system (based on JikesRVM) that makes use of both. As a result, we are able to improve Java program performance by 9% on average for the programs studied.
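One simple way to couple the two profile sources is to treat the off-line profile as a prior and blend in cheap on-line samples, so decisions adapt when the current input behaves differently from the training inputs. The sketch below is illustrative only — the function, weights, and method names are hypothetical, not the paper's mechanism.

```python
# Illustrative sketch of coupling off-line and on-line profiles:
# start from an off-line hotness profile gathered on training inputs,
# then blend in on-line samples from the current run.

def blend(offline, online, online_weight=0.5):
    """Blend two normalized hotness profiles into one estimate."""
    methods = set(offline) | set(online)
    return {m: (1 - online_weight) * offline.get(m, 0.0)
               + online_weight * online.get(m, 0.0)
            for m in methods}

offline = {"sort": 0.7, "parse": 0.3}   # off-line: from training runs
online = {"sort": 0.2, "render": 0.8}   # on-line: sampled in this run
hotness = blend(offline, online)
hottest = max(hotness, key=hotness.get) # candidate to optimize first
```

The off-line prior keeps overhead low (little instrumentation needed up front), while the on-line term corrects it when the inputs diverge — the two failure modes the abstract identifies.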

    ENERGY-AWARE OPTIMIZATION FOR EMBEDDED SYSTEMS WITH CHIP MULTIPROCESSOR AND PHASE-CHANGE MEMORY

    Get PDF
    Over the last two decades, the functions of embedded systems have evolved from simple real-time control and monitoring to more complicated services. Embedded systems equipped with powerful chips can provide the performance that computationally demanding information-processing applications need. However, because of power constraints, the easy way of gaining performance by scaling up chip frequencies is no longer feasible, and low-power architecture design has recently become the main trend in embedded systems. In this dissertation, we present our approaches to the energy-related issues in embedded system design: thermal issues in 3D chip multiprocessors (CMPs), the endurance issue in phase-change memory (PCM), the battery issue, the impact of inaccurate information, and the use of cloud computing to move workloads to remote facilities. We propose a real-time constrained task scheduling method to reduce peak temperature on a 3D CMP, comprising an online 3D CMP temperature prediction model and a set of algorithms for scheduling tasks to different cores so as to minimize the peak on-chip temperature. To address the challenges of applying PCM in embedded systems, we propose a PCM main-memory optimization mechanism that utilizes scratchpad memory (SPM), and furthermore an MLC/SLC configuration optimization algorithm to enhance the efficiency of hybrid DRAM + PCM memory. We also propose an energy-aware task scheduling algorithm for parallel computing in battery-powered mobile systems. Because scheduling decisions in embedded systems rest on information such as the estimated execution time of tasks, we design a method for evaluating the impact of inaccurate information on resource allocation.
Finally, to move workload from embedded systems to remote cloud computing facilities, we present a resource optimization mechanism for heterogeneous federated multi-cloud systems, together with two online dynamic algorithms for resource allocation and task scheduling that take resource contention into account.
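The peak-temperature-aware scheduling idea can be illustrated with a toy greedy placement: assign each task to the core whose predicted temperature stays lowest. This is a deliberately simplified sketch under an additive heat model, not the dissertation's prediction model or algorithms.

```python
# Minimal sketch (illustrative only) of thermal-aware task assignment on
# a multicore: place the hottest tasks first, each on the core with the
# lowest predicted temperature, to keep the peak down.

def assign_tasks(task_heat, n_cores):
    """Greedy placement minimizing the predicted peak core temperature."""
    core_temp = [0.0] * n_cores                   # simplistic additive model
    placement = []
    for heat in sorted(task_heat, reverse=True):  # hottest tasks first
        core = min(range(n_cores), key=lambda c: core_temp[c])
        core_temp[core] += heat
        placement.append((heat, core))
    return placement, max(core_temp)

placement, peak = assign_tasks([5.0, 3.0, 2.0, 2.0], n_cores=2)
```

A real 3D CMP scheduler must also honor real-time deadlines and model vertical heat coupling between stacked layers, which is what distinguishes the dissertation's prediction model from this flat sketch.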