2 research outputs found

    Making non-volatile memory programmable

    Get PDF
    Byte-addressable, non-volatile memory (NVM) is emerging as a revolutionary memory technology that provides persistence, near-DRAM performance, and scalable capacity. By using NVM, applications can directly create and manipulate durable data in place without the need for serialization out to SSDs. Ideally, through NVM, persistent applications will be able to maintain crash-consistency at a minimal cost. However, before this is possible, improvements must be made at both the hardware and software level to support persistent applications. Currently, software support for NVM places too high of a burden on the developer, introducing many opportunities for mistakes while also being too rigid for compiler optimizations. Likewise, at the hardware level, too little information is passed to the processor about the instruction-level ordering requirements of persistent applications; this forces the hardware to require the use of coarse fences, which significantly slow down execution. To help realize the promise of NVM, this thesis proposes both new software and hardware support that make NVM programmable. From the software side, this thesis proposes a new NVM programming model which relieves the programmer from performing much of the accounting work in persistent applications, instead relying on the runtime to perform error-prone tasks. Specifically, within the proposed model, the user only needs to provide minimal markings to identify the persistent data set and to ensure data is updated in a crash-consistent manner. Given this new NVM programming model, this thesis next presents an implementation of the model in Java. I call my implementation AutoPersist and build my support into the Maxine research Java Virtual Machine (JVM). In this thesis I describe how the JVM can be changed to support the proposed NVM programming model, including adding new Java libraries, adding new JVM runtime features, and augmenting the behavior of existing Java bytecodes. In addition to being easy-to-use, another advantage of the proposed model is that it is amenable to compiler optimizations. In this thesis I highlight two profile-guided optimizations: eagerly allocating objects directly into NVM and speculatively pruning control flow to only include expected-to-be taken paths. I also describe how to apply these optimizations to AutoPersist and show they have a substantial performance impact. While designing AutoPersist, I often observed that dependency information known by the compiler cannot be passed down to the underlying hardware; instead, the compiler must insert coarse-grain fences to enforce needed dependencies. This is because current instruction set architectures (ISA) cannot describe arbitrary instruction-level execution ordering constraints. To fix this limitation, I introduce the Execution Dependency Extension (EDE), and describe how EDE can be added to an existing ISA as well as be implemented in current processor pipelines. Overall, emerging NVM technologies can deliver programmer-friendly high performance. However, for this to happen, both software and hardware improvements are necessary. This thesis takes steps to address current the software and hardware gaps: I propose new software support to assist in the development of persistent applications and also introduce new instructions which allow for arbitrary instruction-level dependencies to be conveyed and enforced by the underlying hardware. With these improvements, hopefully the dream of programmable high-performance NVM is one step closer to being realized

    Automatic object colocation based on read barriers

    No full text
    Abstract. Object colocation is an optimization that reduces memory access costs by grouping together heap objects so that their order in memory matches their access order in the program. We implemented this optimization for Sun Microsystems ’ Java HotSpot TM VM. The garbage collector, which moves objects during collection, assigns consecutive addresses to connected objects and handles them as atomic units. We use read barriers inserted by the just-in-time compiler to detect the most frequently accessed fields per class. These “hot fields ” are added to so-called hot-field tables, which are then used by the garbage collector for colocation decisions. Read barriers that are no longer needed are removed in order to reduce the overhead. Our analysis is performed automatically at run time and requires no actions on the side of the programmer. We measured the impact of object colocation on the young and the old generation of the garbage collector, as well as the difference between dynamic colocation using read barriers and a static colocation strategy where colocation decisions are done at compile time. Our measurements show that object colocation works best for the young generation using a read-barrier-based approach.
    corecore