
    Divided we stand: Parallel distributed stack memory management

    We present an overview of the stack-based memory management techniques used in our non-deterministic and-parallel Prolog systems, &-Prolog and DASWAM. We believe that the problems associated with non-deterministic and-parallel systems are more general than those encountered in or-parallel and deterministic and-parallel systems, which can be seen as subsets of this more general case. We build on the previously proposed "marker scheme", lifting some of the restrictions associated with the selection of goals while keeping (virtual) memory consumption down. We also review some of the other problems associated with the stack-based management scheme, such as the handling of forward and backward execution, cut, and rollbacks.
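
    The marker scheme itself is not spelled out in the abstract; as a rough illustration only, the sketch below (hypothetical types and names throughout) shows the basic idea behind marker-delimited stack sections: a marker records the stack top when a parallel goal starts, so backward execution can reclaim exactly that goal's section.

```cpp
#include <cstddef>
#include <cstdint>
#include <iostream>
#include <vector>

// Hypothetical sketch, loosely inspired by the "marker scheme" the
// abstract refers to. Each marker records the stack top at the moment a
// parallel goal begins, so backtracking can pop exactly that section.
struct Marker {
    std::size_t saved_top;  // stack top when the goal began
    int goal_id;            // which and-parallel goal owns the section
};

class GoalStack {
    std::vector<std::uint8_t> cells_;  // the raw stack area
    std::vector<Marker> markers_;      // one marker per active goal section
public:
    // Forward execution: open a new section for a goal.
    void push_marker(int goal_id) {
        markers_.push_back({cells_.size(), goal_id});
    }
    // Forward execution: the goal allocates within its section.
    void allocate(std::size_t bytes) {
        cells_.resize(cells_.size() + bytes);
    }
    // Backward execution: discard the most recent goal's section.
    void backtrack() {
        cells_.resize(markers_.back().saved_top);
        markers_.pop_back();
    }
    std::size_t used() const { return cells_.size(); }
};

int main() {
    GoalStack s;
    s.push_marker(1); s.allocate(64);   // goal 1's section
    s.push_marker(2); s.allocate(128);  // goal 2's section
    s.backtrack();                      // roll back goal 2 only
    std::cout << s.used() << " bytes in use\n";  // prints 64
}
```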

    Garbage collection auto-tuning for Java MapReduce on Multi-Cores

    MapReduce has been widely accepted as a simple programming pattern that can form the basis for efficient, large-scale, distributed data processing. The success of the MapReduce pattern has led to a variety of implementations for different computational scenarios. In this paper we present MRJ, a MapReduce Java framework for multi-core architectures. We evaluate its scalability on a four-core, hyperthreaded Intel Core i7 processor, using a set of standard MapReduce benchmarks. We investigate the significant impact that Java runtime garbage collection has on the performance and scalability of MRJ. We propose the use of memory management auto-tuning techniques based on machine learning. With our auto-tuning approach, we are able to achieve MRJ performance within 10% of optimal on 75% of our benchmark tests.
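
    The paper's tuner is driven by machine learning; as a much simpler stand-in, the sketch below shows only the overall shape of such an auto-tuning loop: run the benchmark under each candidate heap setting and keep the best. The run_benchmark_ms function is a stub (a real harness would fork a JVM with the corresponding -Xmx flag and time it).

```cpp
#include <cstddef>
#include <iostream>
#include <vector>

// Hypothetical stand-in for launching one benchmark run with a given
// heap-size setting and timing it. Stubbed with a synthetic U-shaped
// cost curve for illustration; the real MRJ work predicts good settings
// with a machine-learning model instead of exhaustive search.
double run_benchmark_ms(int heap_mb) {
    double d = heap_mb - 512.0;        // pretend 512 MB is ideal
    return 100.0 + d * d / 1000.0;
}

int main() {
    std::vector<int> candidates = {128, 256, 512, 1024, 2048};
    int best_heap = candidates.front();
    double best_ms = run_benchmark_ms(best_heap);
    for (std::size_t i = 1; i < candidates.size(); ++i) {
        double ms = run_benchmark_ms(candidates[i]);
        if (ms < best_ms) { best_ms = ms; best_heap = candidates[i]; }
    }
    std::cout << "best heap: " << best_heap << " MB ("
              << best_ms << " ms)\n";
}
```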

    QTM: computational package using MPI protocol for quantum trajectories method

    The Quantum Trajectories Method (QTM) is one of the frequently used methods for studying open quantum systems. The main idea of this method is the evolution of wave functions which describe the system as functions of time; so-called quantum jumps are then applied at randomly selected points in time. The system state obtained in this way is called a trajectory. After averaging many single trajectories, we obtain an approximation of the behavior of the quantum system. Because individual trajectories are independent, this also allows us to use parallel computation methods. In the article, we discuss the QTM package, which is supported by the MPI technology. Using MPI allowed us to parallelize both the calculation of the trajectories and their averaging; as the effect of these actions, the time taken by calculations is shorter. In spite of using the C++ programming language, the presented solution is easy to utilize and does not need any advanced programming techniques. At the same time, it offers higher performance than other packages realizing the QTM. This is especially important in the case of harder computational tasks, where the use of MPI improves the performance for particular problems which can be solved in the field of open quantum systems. Comment: 28 pages, 9 figures.
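
    The package's actual API is not given in the abstract; the sketch below shows, with the trajectory simulation stubbed out, the general MPI pattern the abstract describes: each rank computes its share of independent trajectories, and the results are averaged with a reduction.

```cpp
#include <mpi.h>
#include <cstdio>
#include <random>

// Minimal sketch of the parallelization pattern described in the
// abstract: each MPI rank simulates its share of independent quantum
// trajectories, and MPI_Reduce averages them. The trajectory itself is
// stubbed; the real QTM package evolves a wave function and applies
// quantum jumps at randomly selected times.
double simulate_one_trajectory(std::mt19937_64& rng) {
    std::uniform_real_distribution<double> u(0.0, 1.0);
    return u(rng);  // placeholder for an observable along one trajectory
}

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const long total = 100000;  // total trajectories across all ranks
    long local_n = total / size + (rank < total % size ? 1 : 0);

    std::mt19937_64 rng(12345 + rank);  // independent stream per rank
    double local_sum = 0.0;
    for (long i = 0; i < local_n; ++i)
        local_sum += simulate_one_trajectory(rng);

    double global_sum = 0.0;
    MPI_Reduce(&local_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM,
               0, MPI_COMM_WORLD);
    if (rank == 0)
        std::printf("average = %f\n", global_sum / total);
    MPI_Finalize();
}
```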

    The economics of garbage collection

    This paper argues that economic theory can improve our understanding of memory management. We introduce the allocation curve as an analogue of the demand curve from microeconomics. An allocation curve for a program characterises how the amount of garbage collection activity required during its execution varies in relation to the heap size associated with that program. The standard treatment of microeconomic demand curves (shifts and elasticity) can be applied directly and intuitively to our new allocation curves. As an application of this new theory, we show how allocation elasticity can be used to control the heap growth rate for variable sized heaps in Jikes RVM.
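
    For reference (the paper's exact definition may differ), allocation elasticity can be written by direct analogy with price elasticity of demand, with heap size in place of price and GC activity in place of quantity demanded:

```latex
% Allocation elasticity by analogy with price elasticity of demand:
% g = GC activity required during execution, h = heap size.
% (Assumed form: the standard microeconomic definition transplanted
% onto the allocation curve; the paper's precise definition may differ.)
\[
  E \;=\; \frac{\Delta g / g}{\Delta h / h}
    \;=\; \frac{h}{g}\,\frac{\mathrm{d}g}{\mathrm{d}h}
\]
```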

    Lock-Free and Practical Deques using Single-Word Compare-And-Swap

    We present an efficient and practical lock-free implementation of a concurrent deque that is disjoint-parallel accessible and uses atomic primitives which are available in modern computer systems. Previously known lock-free deque algorithms are either based on unavailable atomic synchronization primitives, implement only a subset of the functionality, or are not designed for disjoint accesses. Our algorithm is based on a doubly linked list and requires only single-word compare-and-swap atomic primitives, even for dynamic memory sizes. We have performed an empirical study using full implementations of the most efficient known lock-free deque algorithms. For systems with low concurrency, the algorithm by Michael shows the best performance. However, as our algorithm is designed for disjoint accesses, it performs significantly better on systems with high concurrency and a non-uniform memory architecture.
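
    The deque algorithm itself is too intricate to reproduce from the abstract alone; as a minimal, well-known stand-in, the sketch below shows the single-word compare-and-swap retry pattern that such lock-free structures are built from, applied to a simple Treiber-style stack. Nodes are deliberately never freed, since safe memory reclamation under concurrency is a separate hard problem.

```cpp
#include <atomic>
#include <cstdio>

// Not the paper's deque: a minimal Treiber-style lock-free stack showing
// the single-word CAS retry loop that lock-free list structures build on.
// Nodes are intentionally leaked; safe concurrent reclamation is omitted.
struct Node {
    int value;
    Node* next;
};

class LockFreeStack {
    std::atomic<Node*> head_{nullptr};
public:
    void push(int v) {
        Node* n = new Node{v, nullptr};
        n->next = head_.load(std::memory_order_relaxed);
        // Retry until a single-word CAS installs our node atomically.
        // compare_exchange_weak reloads head_ into n->next on failure.
        while (!head_.compare_exchange_weak(n->next, n,
                                            std::memory_order_release,
                                            std::memory_order_relaxed)) {
        }
    }
    bool pop(int& out) {
        Node* n = head_.load(std::memory_order_acquire);
        // n is reloaded on CAS failure; retry against the new head.
        while (n && !head_.compare_exchange_weak(n, n->next,
                                                 std::memory_order_acquire,
                                                 std::memory_order_acquire)) {
        }
        if (!n) return false;
        out = n->value;  // n is leaked on purpose (no reclamation scheme)
        return true;
    }
};

int main() {
    LockFreeStack s;
    s.push(1); s.push(2);
    int v;
    while (s.pop(v)) std::printf("%d\n", v);  // prints 2 then 1
}
```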

    PlinyCompute: A Platform for High-Performance, Distributed, Data-Intensive Tool Development

    This paper describes PlinyCompute, a system for the development of high-performance, data-intensive, distributed computing tools and libraries. In the large, PlinyCompute presents the programmer with a very high-level, declarative interface, relying on automatic, relational-database-style optimization to figure out how to stage distributed computations. However, in the small, PlinyCompute presents the capable systems programmer with a persistent object data model and API (the "PC object model") and an associated memory management system that has been designed from the ground up for high-performance, distributed, data-intensive computing. This contrasts with most other Big Data systems, which are constructed on top of the Java Virtual Machine (JVM) and hence must at least partially cede performance-critical concerns such as memory management (including layout and de/allocation) and virtual method/function dispatch to the JVM. This hybrid approach, declarative in the large while trusting the programmer's ability to utilize the PC object model efficiently in the small, results in a system that is ideal for the development of reusable, data-intensive tools and libraries. Through extensive benchmarking, we show that implementing complex object manipulation and non-trivial, library-style computations on top of PlinyCompute can result in a speedup of 2x to more than 50x compared to equivalent implementations on Spark. Comment: 48 pages, including references and appendix.
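
    The PC object model's API is not shown in the abstract; as a hypothetical illustration of the kind of explicit layout and allocation control the paper contrasts with the JVM, the sketch below places objects into a caller-managed arena with placement new, so allocation cost and memory layout stay entirely under the programmer's control.

```cpp
#include <cstddef>
#include <cstdint>
#include <iostream>
#include <new>

// Hypothetical illustration (not the PC object model API): a bump-pointer
// arena giving explicit control over object layout and allocation, the
// kind of concern JVM-hosted systems must cede to the virtual machine.
class Arena {
    alignas(std::max_align_t) std::uint8_t buf_[1 << 16];
    std::size_t used_ = 0;
public:
    void* allocate(std::size_t bytes, std::size_t align) {
        std::size_t p = (used_ + align - 1) & ~(align - 1);
        if (p + bytes > sizeof(buf_)) return nullptr;  // arena exhausted
        used_ = p + bytes;
        return buf_ + p;
    }
    // Dropping the whole arena deallocates every object at once.
    void reset() { used_ = 0; }
};

struct Point { double x, y; };

int main() {
    Arena arena;
    // Placement new: construct directly into arena-controlled storage.
    void* mem = arena.allocate(sizeof(Point), alignof(Point));
    Point* p = new (mem) Point{1.0, 2.0};
    std::cout << p->x + p->y << "\n";
    p->~Point();     // trivial here, shown for completeness
    arena.reset();   // bulk deallocation, no per-object bookkeeping
}
```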

    LEAP Scratchpads: Automatic Memory and Cache Management for Reconfigurable Logic [Extended Version]

    CORRECTION: The authors for entry [4] in the references should have been "E. S. Chung, J. C. Hoe, and K. Mai".
    Developers accelerating applications on FPGAs or other reconfigurable logic have nothing but raw memory devices in their standard toolkits. Each project typically includes tedious development of single-use memory management. Software developers, by contrast, expect a programming environment to include automatic memory management: virtual memory provides the illusion of very large arrays, and processor caches reduce access latency without explicit programmer instructions. LEAP scratchpads for reconfigurable logic dynamically allocate and manage multiple, independent memory arrays in a large backing store. Scratchpad accesses are cached automatically in multiple levels, ranging from shared on-board, RAM-based, set-associative caches to private caches stored in FPGA RAM blocks. In the LEAP framework, scratchpads share the same interface as on-die RAM blocks and are plug-in replacements for them. Additional libraries support heap management within a storage set. Like software developers, accelerator authors using scratchpads may focus more on core algorithms and less on memory management. Two uses of FPGA scratchpads are analyzed: buffer management in an H.264 decoder and memory management within a processor microarchitecture timing model.
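
    LEAP itself is hardware infrastructure and its interfaces are not reproduced here; as a rough software analogy only, the sketch below models a scratchpad whose reads go through a small direct-mapped cache in front of a large backing store, mirroring the multi-level caching the abstract describes.

```cpp
#include <cstdint>
#include <iostream>
#include <vector>

// Rough software analogy (not LEAP's actual interface): a scratchpad
// whose reads are served from a small direct-mapped cache backed by a
// large store, mirroring an FPGA-RAM-block cache over on-board memory.
class Scratchpad {
    static constexpr std::size_t kLines = 256;
    struct Line { std::uint64_t tag = ~0ull; std::uint32_t data = 0; };
    std::vector<std::uint32_t> backing_;  // "on-board" backing store
    Line cache_[kLines];                  // "FPGA RAM block" cache
public:
    std::size_t hits = 0, misses = 0;
    explicit Scratchpad(std::size_t words) : backing_(words, 0) {}
    void write(std::uint64_t addr, std::uint32_t v) {
        backing_[addr] = v;
        Line& l = cache_[addr % kLines];  // write-through, fill on write
        l.tag = addr; l.data = v;
    }
    std::uint32_t read(std::uint64_t addr) {
        Line& l = cache_[addr % kLines];
        if (l.tag == addr) { ++hits; return l.data; }
        ++misses;                         // miss: fetch from backing store
        l.tag = addr; l.data = backing_[addr];
        return l.data;
    }
};

int main() {
    Scratchpad sp(1 << 20);
    sp.write(42, 7);
    for (int i = 0; i < 3; ++i) sp.read(42);
    std::cout << "hits=" << sp.hits << " misses=" << sp.misses << "\n";
}
```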

    DSM64: A Distributed Shared Memory System in User-Space

    This paper presents DSM64: a lazy release consistent software distributed shared memory (SDSM) system built entirely in user-space. The DSM64 system is capable of executing threaded applications implemented with pthreads on a cluster of networked machines without any modifications to the target application. The DSM64 system features a centralized memory manager [1] built atop Hoard [2, 3]: a fast, scalable, and memory-efficient allocator for shared-memory multiprocessors. I present an SDSM system written in C++ for Linux operating systems, and discuss a straightforward approach to implementing SDSM systems in a Linux environment using system-provided tools and concepts available entirely in user-space. I show that the SDSM system presented in this paper is capable of resolving page faults over a local area network in as little as 2 milliseconds. In my analysis, I compare the performance characteristics of a matrix multiplication benchmark using various memory coherency models, and demonstrate that the benchmark using an LRC model performs orders of magnitude faster than the same application using a stricter coherency model. I show the effect of the coherency model on memory access patterns and memory contention, compare the effects of different locking strategies on execution speed and memory access patterns, and, lastly, provide a comparison of the DSM64 system to a non-networked version using a system-provided allocator.
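
    The abstract does not spell out the fault-handling mechanism, but user-space SDSM systems on Linux conventionally rely on mprotect plus a SIGSEGV handler; the sketch below (an assumption, not DSM64's actual code) shows that skeleton, with the network fetch stubbed out.

```cpp
#include <csignal>
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <sys/mman.h>
#include <unistd.h>

// Assumption, not DSM64's code: the conventional user-space SDSM skeleton.
// Shared pages start PROT_NONE; touching one raises SIGSEGV, the handler
// "fetches" the page (stubbed here; DSM64 would do this over the LAN) and
// re-enables access. Note that mprotect in a signal handler is not
// formally async-signal-safe, though the pattern is standard in SDSM.
static char* g_region = nullptr;
static long g_pagesz = 0;

static void fetch_and_map(void* fault_addr) {
    char* page = (char*)((uintptr_t)fault_addr & ~(uintptr_t)(g_pagesz - 1));
    mprotect(page, g_pagesz, PROT_READ | PROT_WRITE);
    memset(page, 0xAB, g_pagesz);  // stand-in for copying data off the wire
}

static void on_segv(int, siginfo_t* si, void*) {
    fetch_and_map(si->si_addr);    // faulting instruction retries on return
}

int main() {
    g_pagesz = sysconf(_SC_PAGESIZE);
    g_region = (char*)mmap(nullptr, 4 * g_pagesz, PROT_NONE,
                           MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    struct sigaction sa{};
    sa.sa_flags = SA_SIGINFO;
    sa.sa_sigaction = on_segv;
    sigaction(SIGSEGV, &sa, nullptr);

    char v = g_region[g_pagesz + 10];  // faults; handler maps the page in
    printf("read back 0x%02x\n", (unsigned char)v);
}
```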