3,127 research outputs found

    Emulating and evaluating hybrid memory for managed languages on NUMA hardware

    Get PDF
    Non-volatile memory (NVM) has the potential to become a mainstream memory technology and challenge DRAM. Researchers evaluating the speed, endurance, and abstractions of hybrid memories with DRAM and NVM typically use simulation, making it easy to evaluate the impact of different hardware technologies and parameters. Simulation is, however, extremely slow, limiting the applications and datasets in the evaluation. Simulation also precludes critical workloads, especially those written in managed languages such as Java and C#. Good methodology embraces a variety of techniques for evaluating new ideas, expanding the experimental scope, and uncovering new insights. This paper introduces a platform to emulate hybrid memory for managed languages using commodity NUMA servers. Emulation complements simulation but offers richer software experimentation. We use a thread-local socket to emulate DRAM and a remote socket to emulate NVM. We use standard C library routines to allocate heap memory on the DRAM and NVM sockets for use with explicit memory management or garbage collection. We evaluate the emulator using various configurations of write-rationing garbage collectors that improve NVM lifetimes by limiting writes to NVM, using 15 applications and various datasets and workload configurations. We show emulation and simulation confirm each other's trends in terms of writes to NVM for different software configurations, increasing our confidence in predicting future system effects. Emulation brings novel insights, such as the non-linear effects of multi-programmed workloads on NVM writes, and that Java applications write significantly more than their C++ equivalents. We make our software infrastructure publicly available to advance the evaluation of novel memory management schemes on hybrid memories

    Prioritized Garbage Collection: Explicit GC Support for Software Caches

    Full text link
    Programmers routinely trade space for time to increase performance, often in the form of caching or memoization. In managed languages like Java or JavaScript, however, this space-time tradeoff is complex. Using more space translates into higher garbage collection costs, especially at the limit of available memory. Existing runtime systems provide limited support for space-sensitive algorithms, forcing programmers into difficult and often brittle choices about provisioning. This paper presents prioritized garbage collection, a cooperative programming language and runtime solution to this problem. Prioritized GC provides an interface similar to soft references, called priority references, which identify objects that the collector can reclaim eagerly if necessary. The key difference is an API for defining the policy that governs when priority references are cleared and in what order. Application code specifies a priority value for each reference and a target memory bound. The collector reclaims references, lowest priority first, until the total memory footprint of the cache fits within the bound. We use this API to implement a space-aware least-recently-used (LRU) cache, called a Sache, that is a drop-in replacement for existing caches, such as Google's Guava library. The garbage collector automatically grows and shrinks the Sache in response to available memory and workload with minimal provisioning information from the programmer. Using a Sache, it is almost impossible for an application to experience a memory leak, memory pressure, or an out-of-memory crash caused by software caching.Comment: to appear in OOPSLA 201

    Micro Virtual Machines: A Solid Foundation for Managed Language Implementation

    Get PDF
    Today new programming languages proliferate, but many of them suffer from poor performance and inscrutable semantics. We assert that the root of many of the performance and semantic problems of today's languages is that language implementation is extremely difficult. This thesis addresses the fundamental challenges of efficiently developing high-level managed languages. Modern high-level languages provide abstractions over execution, memory management and concurrency. It requires enormous intellectual capability and engineering effort to properly manage these concerns. Lacking such resources, developers usually choose naive implementation approaches in the early stages of language design, a strategy which too often has long-term consequences, hindering the future development of the language. Existing language development platforms have failed to provide the right level of abstraction, and forced implementers to reinvent low-level mechanisms in order to obtain performance. My thesis is that the introduction of micro virtual machines will allow the development of higher-quality, high-performance managed languages. The first contribution of this thesis is the design of Mu, with the specification of Mu as the main outcome. Mu is the first micro virtual machine, a robust, performant, and light-weight abstraction over just three concerns: execution, concurrency and garbage collection. Such a foundation attacks three of the most fundamental and challenging issues that face existing language designs and implementations, leaving the language implementers free to focus on the higher levels of their language design. The second contribution is an in-depth analysis of on-stack replacement and its efficient implementation. This low-level mechanism underpins run-time feedback-directed optimisation, which is key to the efficient implementation of dynamic languages. The third contribution is demonstrating the viability of Mu through RPython, a real-world non-trivial language implementation. We also did some preliminary research of GHC as a Mu client. We have created the Mu specification and its reference implementation, both of which are open-source. We show that that Mu's on-stack replacement API can gracefully support dynamic languages such as JavaScript, and it is implementable on concrete hardware. Our RPython client has been able to translate and execute non-trivial RPython programs, and can run the RPySOM interpreter and the core of the PyPy interpreter. With micro virtual machines providing a low-level substrate, language developers now have the option to build their next language on a micro virtual machine. We believe that the quality of programming languages will be improved as a result

    Crystal gazer : profile-driven write-rationing garbage collection for hybrid memories

    Get PDF
    Non-volatile memories (NVM) offer greater capacity than DRAM but suffer from high latency and low write endurance. Hybrid memories combine DRAM and NVM to form scalable memory systems with the promise of high capacity, low energy consumption, and high endurance. Automatically managing hybrid NVM-DRAM memories to achieve their promise without changing user applications or their programming models remains an open question. This paper uses garbage collection in managed languages to exploit NVM capacity while preventing NVM wear out in hybrid memories with no changes to the programming model. We introduce profile-driven write-rationing garbage collection. Allocation sites that produce frequently written objects are predicted based on previous program executions. Objects are initially allocated in a DRAM nursery space. The collector copies surviving nursery objects from highly written sites to a mature DRAM space and read-mostly objects to a mature NVM space.Write-intensity prediction for 15 Java benchmarks accurately places objects in the correct space, eliminating expensive object monitoring from prior write-rationing garbage collectors. Furthermore, our technique exposes a Pareto tradeoff between DRAM usage and NVM lifetime, unlike prior work. Experimental results on NUMA hardware that emulates hybrid NVM-DRAM memory demonstrates that profile-driven write-rationing garbage collection reduces the number of writes to NVM compared to prior work to extend its lifetime, maximizes the use of NVM for its capacity, and achieves good performance

    Exploiting the Weak Generational Hypothesis for Write Reduction and Object Recycling

    Get PDF
    Programming languages with automatic memory management are continuing to grow in popularity due to ease of programming. However, these languages tend to allocate objects excessively, leading to inefficient use of memory and large garbage collection and allocation overheads. The weak generational hypothesis notes that objects tend to die young in languages with automatic dynamic memory management. Much work has been done to optimize allocation and garbage collection algorithms based on this observation. Previous work has largely focused on developing efficient software algorithms for allocation and collection. However, much less work has studied architectural solutions. In this work, we propose and evaluate architectural support for assisting allocation and garbage collection. We first study the effects of languages with automatic memory management on the memory system. As objects often die young, it is likely many objects die while in the processor\u27s caches. Writes of dead data back to main memory are unnecessary, as the data will never be used again. To study this, we develop and present architecture support to identify dead objects while they remain resident in cache and eliminate any unnecessary writes. We show that many writes out of the caches are unnecessary, and can be avoided using our hardware additions. Next, we study the effects of using dead data in cache to assist with allocation and garbage collection. Logic is developed and presented to allow for reuse of cache space found dead to satisfy future allocation requests. We show that dead cache space can be recycled at a high rate, reducing pressure on the allocator and reducing cache miss rates. However, a full implementation of our initial approach is shown to be unscalable. We propose and study limitations to our approach, trading object coverage for scalability. Third, we present a new approach for identifying objects that die young based on a limitation of our previous approach. We show this approach has much lower storage and logic requirements and is scalable, while only slightly decreasing overall object coverage
    corecore