Search CORE

552 research outputs found

A Compiler and Runtime Infrastructure for Automatic Program Distribution

Author: Chu Matt
Diaconescu Roxana E.
Mouri Zachary
Wang Lei
Publication venue
Publication date: 01/04/2005
Field of study

This paper presents the design and the implementation of a compiler and runtime infrastructure for automatic program distribution. We are building a research infrastructure that enables experimentation with various program partitioning and mapping strategies and the study of automatic distribution's effect on resource consumption (e.g., CPU, memory, communication). Since many optimization techniques are faced with conflicting optimization targets (e.g., memory and communication), we believe that it is important to be able to study their interaction. We present a set of techniques that enable flexible resource modeling and program distribution. These are: dependence analysis, weighted graph partitioning, code and communication generation, and profiling. We have developed these ideas in the context of the Java language. We present in detail the design and implementation of each of the techniques as part of our compiler and runtime infrastructure. Then, we evaluate our design and present preliminary experimental data for each component, as well as for the entire system

Caltech Authors

UML Assisted Visual Debugging for Distributed Systems

Author: Musial Benjamin R.
Publication venue: AFIT Scholar
Publication date: 01/03/2003
Field of study

The DOD is developing a Joint Battlespace Infosphere, linking a large number of data sources and user applications. To assist in this process, debugging and analysis tools are required. Software debugging is an extremely difficult cognitive process requiring comprehension of the overall application behavior, along with detailed understanding of specific application components. This is further complicated with distributed systems by the addition of other programs, their large size and synchronization issues. Typical debuggers provide inadequate support for this process, focusing primarily on the details accessible through source code. To overcome this deficiency, this research links the dynamic program execution state to a Unified Modeling Language (UML) class diagram that is reverse-engineered from data accessed within the Java Platform Debug Architecture. This research uses focus + context, graph layout, and color encoding techniques to enhance the standard UML diagram. These techniques organize and present objects and events in a manner that facilitates analysis of system behavior. High-level abstractions commonly used in system design support debugging while maintaining access to low-level details with an interactive display. The user is also able to monitor the control flow through highlighting of the relevant object and method in the display

Worst-case execution time analysis-driven object cache design

Author: Huber Benedikt
Puffitsch Wolfgang
Schoeberl Martin
Publication venue: 'Wiley'
Publication date: 01/01/2012
Field of study

Adaptive sampling-based profiling techniques for optimizing the distributed JVM runtime

Author: Lam KT
Luo Y
Wang CL
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2010
Field of study

Extending the standard Java virtual machine (JVM) for cluster-awareness is a transparent approach to scaling out multithreaded Java applications. While this clustering solution is gaining momentum in recent years, efficient runtime support for fine-grained object sharing over the distributed JVM remains a challenge. The system efficiency is strongly connected to the global object sharing profile that determines the overall communication cost. Once the sharing or correlation between threads is known, access locality can be optimized by collocating highly correlated threads via dynamic thread migrations. Although correlation tracking techniques have been studied in some page-based sof Tware DSM systems, they would entail prohibitively high overheads and low accuracy when ported to fine-grained object-based systems. In this paper, we propose a lightweight sampling-based profiling technique for tracking inter-thread sharing. To preserve locality across migrations, we also propose a stack sampling mechanism for profiling the set of objects which are tightly coupled with a migrant thread. Sampling rates in both techniques can vary adaptively to strike a balance between preciseness and overhead. Such adaptive techniques are particularly useful for applications whose sharing patterns could change dynamically. The profiling results can be exploited for effective thread-to-core placement and dynamic load balancing in a distributed object sharing environment. We present the design and preliminary performance result of our distributed JVM with the profiling implemented. Experimental results show that the profiling is able to obtain over 95% accurate global sharing profiles at a cost of only a few percents of execution time increase for fine- to medium- grained applications. © 2010 IEEE.published_or_final_versionThe 24th IEEE International Symposium on Parallel & Distributed Processing (IPDPS 2010), Atlanta, GA., 19-23 April 2010. In Proceedings of the 24th IPDPS, 2010, p. 1-1

HKU Scholars Hub

Techniques for analyzing and optimizing the energy consumption in (mobile) applications

Author: Roosens Simon
Publication venue
Publication date: 23/06/2020
Field of study

Making non-volatile memory programmable

Author: Shull Thomas Edward
Publication venue
Publication date: 01/08/2020
Field of study

Byte-addressable, non-volatile memory (NVM) is emerging as a revolutionary memory technology that provides persistence, near-DRAM performance, and scalable capacity. By using NVM, applications can directly create and manipulate durable data in place without the need for serialization out to SSDs. Ideally, through NVM, persistent applications will be able to maintain crash-consistency at a minimal cost. However, before this is possible, improvements must be made at both the hardware and software level to support persistent applications. Currently, software support for NVM places too high of a burden on the developer, introducing many opportunities for mistakes while also being too rigid for compiler optimizations. Likewise, at the hardware level, too little information is passed to the processor about the instruction-level ordering requirements of persistent applications; this forces the hardware to require the use of coarse fences, which significantly slow down execution. To help realize the promise of NVM, this thesis proposes both new software and hardware support that make NVM programmable. From the software side, this thesis proposes a new NVM programming model which relieves the programmer from performing much of the accounting work in persistent applications, instead relying on the runtime to perform error-prone tasks. Specifically, within the proposed model, the user only needs to provide minimal markings to identify the persistent data set and to ensure data is updated in a crash-consistent manner. Given this new NVM programming model, this thesis next presents an implementation of the model in Java. I call my implementation AutoPersist and build my support into the Maxine research Java Virtual Machine (JVM). In this thesis I describe how the JVM can be changed to support the proposed NVM programming model, including adding new Java libraries, adding new JVM runtime features, and augmenting the behavior of existing Java bytecodes. In addition to being easy-to-use, another advantage of the proposed model is that it is amenable to compiler optimizations. In this thesis I highlight two profile-guided optimizations: eagerly allocating objects directly into NVM and speculatively pruning control flow to only include expected-to-be taken paths. I also describe how to apply these optimizations to AutoPersist and show they have a substantial performance impact. While designing AutoPersist, I often observed that dependency information known by the compiler cannot be passed down to the underlying hardware; instead, the compiler must insert coarse-grain fences to enforce needed dependencies. This is because current instruction set architectures (ISA) cannot describe arbitrary instruction-level execution ordering constraints. To fix this limitation, I introduce the Execution Dependency Extension (EDE), and describe how EDE can be added to an existing ISA as well as be implemented in current processor pipelines. Overall, emerging NVM technologies can deliver programmer-friendly high performance. However, for this to happen, both software and hardware improvements are necessary. This thesis takes steps to address current the software and hardware gaps: I propose new software support to assist in the development of persistent applications and also introduce new instructions which allow for arbitrary instruction-level dependencies to be conveyed and enforced by the underlying hardware. With these improvements, hopefully the dream of programmable high-performance NVM is one step closer to being realized

Java Dust: How Small Can Embedded Java Be?

Author: Caska James
Schoeberl Martin
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2011
Field of study

Java is slowly being accepted as a language and platform for embedded devices. However, the memory requirements of the Java library and runtime are still troublesome. A Java system is considered small when it requires less than 1 MB, and within the embedded domain small microcontollers with a few KB on-chip Flash memory and even less on-chip RAM are very common. For such small devices Java is a clearly challenging. In this paper we present the combination of the Java compiler Muvium for microcontrollers with the tiny soft-core Leros for an FPGA. To the best of our knowledge, the presented embedded Java system is the smallest Java system available. The Leros processor consumes less than 5 % of the logic cells of the smallest FPGA from Altera and the Muvium compiler produces a JVM, including the Java application, that can execute in a few KB ROM and less than 1 KB RAM. The Leros processor is available in open-source and the Leros port of Muvium is freely available

CiteSeerX

Righting Web Development

Author: Vilk John
Publication venue: ScholarWorks@UMass Amherst
Publication date: 25/10/2018
Field of study

The web browser is the most important application runtime today, encompassing all types of applications on practically every Internet-connected device. Browsers power complete office suites, media players, games, and augmented and virtual reality experiences, and they integrate with cameras, microphones, GPSes, and other sensors available on computing devices. Many apparently native mobile and desktop applications are secretly hybrid apps that contain a mix of native and browser code. History has shown that when new devices, sensors, and experiences appear on the market, the browser will evolve to support them. Despite the browser\u27s importance, developing web applications is exceedingly difficult. Web browsers organically evolved from a document viewer into a ubiquitous program runtime. The browser\u27s scripting language for web designers, JavaScript, has grown into the only universally supported programming language in the browser. Unfortunately, JavaScript is notoriously difficult to write and debug. The browser\u27s high-level and event-driven I/O interfaces make it easy to add simple interactions to webpages, but these same interfaces lead to nondeterministic bugs and performance issues in larger applications. These bugs are challenging for developers to reason about and fix. This dissertation revisits web development and provides developers with a complete set of development tools with full support for the browser environment. McFly is the first time-traveling debugger for the browser, and lets developers debug web applications and their visual state during time-travel; components of this work shipped in Microsoft\u27s ChakraCore JavaScript engine. BLeak is the first system for automatically debugging memory leaks in web applications, and provides developers with a ranked list of memory leaks along with the source code responsible for them. BCause constructs a causal graph of a web application\u27s events, which helps developers understand their code\u27s behavior. Doppio lets developers run code written in conventional languages in the browser, and Browsix brings Unix into the browser to enable unmodified programs expecting a Unix-like environment to run directly in the browser. Together, these five systems form a solid foundation for web development