34,005 research outputs found
A Cyclic Distributed Garbage Collector for Network Objects
This paper presents an algorithm for distributed garbage collection and outlines its implementation within the Network Objects system. The algorithm is based on a reference listing scheme, which is augmented by partial tracing in order to collect distributed garbage cycles. Processes may be dynamically organised into groups, according to appropriate heuristics, to reclaim distributed garbage cycles. The algorithm places no overhead on local collectors and suspends local mutators only briefly. Partial tracing of the distributed graph involves only objects thought to be part of a garbage cycle: no collaboration with other processes is required. The algorithm offers considerable flexibility, allowing expediency and fault-tolerance to be traded against completeness
Lock-free Concurrent Data Structures
Concurrent data structures are the data sharing side of parallel programming.
Data structures give the means to the program to store data, but also provide
operations to the program to access and manipulate these data. These operations
are implemented through algorithms that have to be efficient. In the sequential
setting, data structures are crucially important for the performance of the
respective computation. In the parallel programming setting, their importance
becomes more crucial because of the increased use of data and resource sharing
for utilizing parallelism.
The first and main goal of this chapter is to provide a sufficient background
and intuition to help the interested reader to navigate in the complex research
area of lock-free data structures. The second goal is to offer the programmer
familiarity to the subject that will allow her to use truly concurrent methods.Comment: To appear in "Programming Multi-core and Many-core Computing
Systems", eds. S. Pllana and F. Xhafa, Wiley Series on Parallel and
Distributed Computin
Formal Derivation of Concurrent Garbage Collectors
Concurrent garbage collectors are notoriously difficult to implement
correctly. Previous approaches to the issue of producing correct collectors
have mainly been based on posit-and-prove verification or on the application of
domain-specific templates and transformations. We show how to derive the upper
reaches of a family of concurrent garbage collectors by refinement from a
formal specification, emphasizing the application of domain-independent design
theories and transformations. A key contribution is an extension to the
classical lattice-theoretic fixpoint theorems to account for the dynamics of
concurrent mutation and collection.Comment: 38 pages, 21 figures. The short version of this paper appeared in the
Proceedings of MPC 201
Emulating and evaluating hybrid memory for managed languages on NUMA hardware
Non-volatile memory (NVM) has the potential to become a mainstream memory technology and challenge DRAM. Researchers evaluating the speed, endurance, and abstractions of hybrid memories with DRAM and NVM typically use simulation, making it easy to evaluate the impact of different hardware technologies and parameters. Simulation is, however, extremely slow, limiting the applications and datasets in the evaluation. Simulation also precludes critical workloads, especially those written in managed languages such as Java and C#. Good methodology embraces a variety of techniques for evaluating new ideas, expanding the experimental scope, and uncovering new insights.
This paper introduces a platform to emulate hybrid memory for managed languages using commodity NUMA servers. Emulation complements simulation but offers richer software experimentation. We use a thread-local socket to emulate DRAM and a remote socket to emulate NVM. We use standard C library routines to allocate heap memory on the DRAM and NVM sockets for use with explicit memory management or garbage collection. We evaluate the emulator using various configurations of write-rationing garbage collectors that improve NVM lifetimes by limiting writes to NVM, using 15 applications and various datasets and workload configurations. We show emulation and simulation confirm each other's trends in terms of writes to NVM for different software configurations, increasing our confidence in predicting future system effects. Emulation brings novel insights, such as the non-linear effects of multi-programmed workloads on NVM writes, and that Java applications write significantly more than their C++ equivalents. We make our software infrastructure publicly available to advance the evaluation of novel memory management schemes on hybrid memories
Garbage collection auto-tuning for Java MapReduce on Multi-Cores
MapReduce has been widely accepted as a simple programming pattern that can form the basis for efficient, large-scale, distributed data processing. The success of the MapReduce pattern has led to a variety of implementations for different computational scenarios. In this paper we present MRJ, a MapReduce Java framework for multi-core architectures. We evaluate its scalability on a four-core, hyperthreaded Intel Core i7 processor, using a set of standard MapReduce benchmarks. We investigate the significant impact that Java runtime garbage collection has on the performance and scalability of MRJ. We propose the use of memory management auto-tuning techniques based on machine learning. With our auto-tuning approach, we are able to achieve MRJ performance within 10% of optimal on 75% of our benchmark tests
Sublogarithmic uniform Boolean proof nets
Using a proofs-as-programs correspondence, Terui was able to compare two
models of parallel computation: Boolean circuits and proof nets for
multiplicative linear logic. Mogbil et. al. gave a logspace translation
allowing us to compare their computational power as uniform complexity classes.
This paper presents a novel translation in AC0 and focuses on a simpler
restricted notion of uniform Boolean proof nets. We can then encode
constant-depth circuits and compare complexity classes below logspace, which
were out of reach with the previous translations.Comment: In Proceedings DICE 2011, arXiv:1201.034
- …