Distributed computation of persistent homology
Persistent homology is a popular and powerful tool for capturing topological
features of data. Advances in algorithms for computing persistent homology have
reduced the computation time drastically -- as long as the algorithm does not
exhaust the available memory. Following up on a recently presented parallel
method for persistence computation on shared memory systems, we demonstrate
that a simple adaptation of the standard reduction algorithm leads to a variant
for distributed systems. Our algorithmic design ensures that the data is
distributed over the nodes without redundancy; this permits the computation of
much larger instances than on a single machine. Moreover, we observe that the
parallelism at least compensates for the overhead caused by communication
between nodes, and often speeds up the computation compared to sequential and
even parallel shared-memory algorithms. In our experiments, we were able to
and even parallel shared memory algorithms. In our experiments, we were able to
compute the persistent homology of filtrations with more than a billion (10^9)
elements within seconds on a cluster with 32 nodes, using less than 10 GB of
memory per node.
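For orientation, the sequential baseline behind such variants is the standard reduction of the Z/2 boundary matrix: columns are processed left to right, repeatedly adding an earlier column with the same lowest non-zero row until the pivot is unique, and persistence pairs are read off the surviving pivots. Below is a minimal C++ sketch of that baseline only; the data layout is our choice, and the paper's distributed variant is not shown.

```cpp
#include <cstdint>
#include <iostream>
#include <map>
#include <set>
#include <utility>
#include <vector>

// Columns of the Z/2 boundary matrix as sorted row-index sets; low(j) is the
// largest row index of column j.
using Column = std::set<std::uint64_t>;

// Standard reduction: returns the persistence pairs (birth index, death index).
std::vector<std::pair<std::uint64_t, std::uint64_t>>
reduce(std::vector<Column>& matrix) {
    std::map<std::uint64_t, std::size_t> pivot_owner;  // low(j) -> column j
    std::vector<std::pair<std::uint64_t, std::uint64_t>> pairs;
    for (std::size_t j = 0; j < matrix.size(); ++j) {
        Column& col = matrix[j];
        while (!col.empty()) {
            auto it = pivot_owner.find(*col.rbegin());
            if (it == pivot_owner.end()) break;
            // Add the earlier column with the same pivot (Z/2: symmetric difference).
            for (std::uint64_t row : matrix[it->second])
                if (!col.erase(row)) col.insert(row);
        }
        if (!col.empty()) {
            pivot_owner[*col.rbegin()] = j;
            pairs.emplace_back(*col.rbegin(), j);
        }
    }
    return pairs;
}

int main() {
    // Filtration of a filled triangle: vertices 0-2, edges 3-5, face 6.
    std::vector<Column> boundary = {
        {}, {}, {},              // vertices have empty boundary
        {0, 1}, {0, 2}, {1, 2},  // edges
        {3, 4, 5}                // the 2-cell's boundary is its three edges
    };
    for (auto [birth, death] : reduce(boundary))
        std::cout << birth << " dies at " << death << '\n';
}
```

The paper's contribution is to partition these columns over the nodes of a cluster without redundancy while preserving this reduction.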
Supporting persistent C++ objects in a distributed storage system
Technical report. We have designed and implemented a C++ object layer for Khazana, a distributed persistent storage system that exports a flat shared address space as its basic abstraction. The C++ layer described herein lets programmers use familiar C++ idioms to allocate, manipulate, and deallocate persistent shared data structures. It handles the tedious details involved in accessing this shared data, replicating it, maintaining consistency, converting data between persistent and in-memory representations, associating type information (including methods) with objects, and so on. To support the C++ object layer on top of Khazana's flat storage abstraction, we have developed a language-specific preprocessor that generates support code to manage the user-specified persistent C++ structures. We describe the design of the C++ object layer and the compiler and runtime mechanisms needed to support it.
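As a loose illustration of the idiom being described: all names below (kh_alloc, kh_free, Persistent<T>) are hypothetical stand-ins rather than Khazana's actual interface, and the heap-backed allocator exists only so the sketch compiles.

```cpp
#include <cstdlib>
#include <new>
#include <utility>

// Hypothetical allocation hooks: a real layer would carve these out of the
// distributed shared address space; here they fall back to the local heap.
void* kh_alloc(std::size_t bytes) { return std::malloc(bytes); }
void  kh_free(void* addr)         { std::free(addr); }

// A smart-handle wrapper of the sort a preprocessor could generate, letting
// familiar C++ construction/destruction idioms target persistent storage.
template <typename T>
class Persistent {
public:
    template <typename... Args>
    void create(Args&&... args) {
        addr_ = static_cast<T*>(kh_alloc(sizeof(T)));
        new (addr_) T(std::forward<Args>(args)...);  // construct in place
    }
    void destroy() {
        addr_->~T();
        kh_free(addr_);
        addr_ = nullptr;
    }
    T* operator->() const { return addr_; }
private:
    T* addr_ = nullptr;
};

struct Point { int x, y; Point(int x, int y) : x(x), y(y) {} };

int main() {
    Persistent<Point> p;
    p.create(3, 4);         // allocate and construct in "persistent" space
    int sum = p->x + p->y;  // familiar pointer-style access
    p.destroy();
    return sum == 7 ? 0 : 1;
}
```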
Garbage Collection of Persistent Objects in Distributed Shared Memory
This paper describes a garbage collection algorithm for distributed persistent objects in a loosely coupled network of workstations. Objects are accessed via a weakly consistent shared distributed virtual memory with recoverable properties. We address the specific problem of garbage collecting a large number of distributed persistent objects cached on several nodes for efficient sharing. For clustering purposes, objects are allocated within segments, and segments are logically grouped into bunches. The garbage collection subsystem combines three sub-algorithms: the bunch garbage collector, which cleans one bunch (possibly multiply-cached) independently of any other; the scion cleaner, which propagates accessibility information across bunches; and the group collector, which reclaims inter-bunch cycles of dead objects. These three sub-algorithms are highly independent, so the garbage collection subsystem has a high degree of scalability and parallelism. In addition, it reclaims cycles of garbage, requires no particular communication support such as causality or atomicity, and is well suited to large-scale networks.
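A rough sketch of the first sub-algorithm in isolation: within one bunch, a mark-and-sweep whose root set is the local roots plus the bunch's scions (the entry points recorded for references arriving from other bunches). The types here are illustrative, not the paper's code.

```cpp
#include <unordered_map>
#include <unordered_set>
#include <vector>

// One object inside a bunch: the ids of the objects it references locally.
struct Object {
    std::vector<int> refs;
    bool marked = false;
};

// Collect a single bunch independently of all others: anything unreachable
// from the local roots and the scions is garbage. Inter-bunch cycles are
// invisible at this level and are left to the group collector.
void collect_bunch(std::unordered_map<int, Object>& bunch,
                   const std::unordered_set<int>& scions,
                   const std::unordered_set<int>& local_roots) {
    std::vector<int> stack(scions.begin(), scions.end());
    stack.insert(stack.end(), local_roots.begin(), local_roots.end());
    while (!stack.empty()) {                        // mark phase
        int id = stack.back();
        stack.pop_back();
        auto it = bunch.find(id);
        if (it == bunch.end() || it->second.marked) continue;
        it->second.marked = true;
        for (int r : it->second.refs) stack.push_back(r);
    }
    for (auto it = bunch.begin(); it != bunch.end(); ) {  // sweep phase
        if (it->second.marked) { it->second.marked = false; ++it; }
        else                   { it = bunch.erase(it); }
    }
}

int main() {
    std::unordered_map<int, Object> bunch;
    bunch[1].refs = {2};   // reachable from the local root
    bunch[2];              // leaf object
    bunch[3].refs = {3};   // unreachable self-cycle, reclaimed locally
    collect_bunch(bunch, /*scions=*/{}, /*local_roots=*/{1});
    return bunch.size() == 2 ? 0 : 1;
}
```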
GoFFish: A Sub-Graph Centric Framework for Large-Scale Graph Analytics
Large-scale graph processing is a major research area for Big Data
exploration. Vertex-centric programming models like Pregel are gaining
traction due to their simple abstraction, which naturally allows for scalable
execution on distributed systems. However, this approach has limitations that
cause vertex-centric algorithms to under-perform: a poor
compute-to-communication ratio and slow convergence across iterative
supersteps. In this paper we introduce GoFFish, a scalable sub-graph-centric
framework co-designed with a distributed persistent graph storage for
large-scale graph analytics on commodity clusters. We introduce a
sub-graph-centric programming abstraction that combines the scalability of a
vertex-centric approach with the flexibility of shared-memory sub-graph
computation. We map Connected Components, SSSP, and PageRank algorithms to
this model to illustrate its
flexibility. Further, we empirically analyze GoFFish using several real-world
graphs and demonstrate its significant performance improvement, orders of
magnitude in some cases, compared to Apache Giraph, the leading open-source
vertex-centric implementation.
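To convey the flavour of the abstraction (the Subgraph/send_message interface below is our invention, not GoFFish's API): a sub-graph-centric connected-components step runs label propagation to a fixed point inside the locally stored sub-graph, so only labels on cut edges cross the network.

```cpp
#include <cstdint>
#include <utility>
#include <vector>

// A partition-resident sub-graph: per-vertex labels, local adjacency, and the
// cut edges whose far endpoint lives in another partition.
struct Subgraph {
    std::vector<std::uint64_t> label;
    std::vector<std::vector<int>> adj;
    std::vector<std::pair<int, int>> cut_edges;  // (local vertex, remote partition)
};

// One connected-components superstep: propagate minimum labels to a fixed
// point within the sub-graph, then send only boundary labels to neighbours.
template <typename Send>
void compute(Subgraph& g, Send send_message) {
    bool changed = true;
    while (changed) {
        changed = false;
        for (std::size_t v = 0; v < g.adj.size(); ++v)
            for (int u : g.adj[v])
                if (g.label[u] < g.label[v]) {
                    g.label[v] = g.label[u];
                    changed = true;
                }
    }
    for (auto [v, part] : g.cut_edges)
        send_message(part, v, g.label[v]);
}

int main() {
    // A three-vertex chain with one cut edge into partition 1.
    Subgraph g{{5, 9, 7}, {{1}, {0, 2}, {1}}, {{2, 1}}};
    compute(g, [](int, int, std::uint64_t) { /* enqueue for next superstep */ });
    return g.label[2] == 5 ? 0 : 1;  // the whole local chain converges to 5
}
```

Converging locally within each partition is what shrinks the number of global supersteps relative to a per-vertex model.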
FISH: A 3D parallel MHD code for astrophysical applications
FISH is a fast and simple ideal magneto-hydrodynamics code that scales to
~10,000 processes for a Cartesian computational domain of ~1000^3 cells. The
simplicity of FISH has been achieved by the rigorous application of the
operator splitting technique, while second order accuracy is maintained by the
symmetric ordering of the operators. Between directional sweeps, the
three-dimensional data is rotated in memory so that the sweep is always
performed in a cache-efficient way along the direction of contiguous memory.
Hence, the code only requires a one-dimensional description of the conservation
equations to be solved. This approach also enables an elegant, novel
parallelisation of the code that is based on persistent communications with MPI
for cubic domain decomposition on machines with distributed memory. This scheme
is then combined with an additional OpenMP parallelisation of different sweeps
that can take advantage of clusters of shared memory. We document the detailed
implementation of a second order TVD advection scheme based on flux
reconstruction. The magnetic fields are evolved by a constrained transport
scheme. We show that the subtraction of a simple estimate of the hydrostatic
gradient from the total gradients can significantly reduce the dissipation of
the advection scheme in simulations of gravitationally bound hydrostatic
objects. Through its simplicity and efficiency, FISH is as well-suited for
hydrodynamics classes as for large-scale astrophysical simulations on
high-performance computer clusters. In preparation for the release of a public
version, we demonstrate the performance of FISH in a suite of astrophysically
orientated test cases.
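The "persistent communications with MPI" mentioned above is a standard MPI facility: the message envelope for a recurring exchange is set up once and then restarted each iteration. Below is a generic sketch of the pattern under our own simplifying assumptions (a 1D ring with placeholder buffer sizes, not FISH's actual cubic decomposition).

```cpp
#include <mpi.h>
#include <vector>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int n = 1024;  // placeholder halo size
    std::vector<double> send_buf(n), recv_buf(n);
    int right = (rank + 1) % size, left = (rank + size - 1) % size;

    MPI_Request reqs[2];  // set up the communication channels once
    MPI_Send_init(send_buf.data(), n, MPI_DOUBLE, right, 0, MPI_COMM_WORLD, &reqs[0]);
    MPI_Recv_init(recv_buf.data(), n, MPI_DOUBLE, left, 0, MPI_COMM_WORLD, &reqs[1]);

    for (int sweep = 0; sweep < 100; ++sweep) {  // reuse them every sweep
        // ... fill send_buf with boundary cells for this sweep ...
        MPI_Startall(2, reqs);
        MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);
        // ... apply recv_buf as ghost cells, then do the 1D sweep ...
    }

    MPI_Request_free(&reqs[0]);
    MPI_Request_free(&reqs[1]);
    MPI_Finalize();
}
```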
Implementing fault tolerance in a 64-bit distributed operating system
This thesis explores the potential of 64-bit processors for providing a different style of distributed operating system. Rather than offering another reworking of the UNIX model, it examines the use of the large address space to unify volatile memory (virtual memory), persistent memory (file systems), and distributed network access, and proposes a novel operating system, Arius.
The concepts behind the design of Arius are briefly reviewed, and then the reliability of such a system is examined in detail. The unified nature of the architecture makes it possible to use a reliable single address space to provide a completely reliable system without the addition of other mechanisms. Protocols are proposed to provide locally scalable distributed shared memory, and these are then augmented to handle machine failures transparently through the use of distributed checkpoints and rollback.
The checkpointing system makes use of the caching mechanism in DSM to provide data duplication for failure recovery. By using distributed memory for checkpoints, recovery from machine faults may be handled seamlessly. To cope with more “complete” failures, persistent storage is also incorporated into the recovery mechanism.
These protocols are modelled to show their operability and to determine the cost they incur in various types of parallel and serial programs. Results are presented to quantify these costs.
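One way to picture the data-duplication idea (the types below are ours, not the thesis's): the DSM directory already records which nodes cache each page, so a checkpoint can be considered stable once every page it covers is held by at least two nodes, surviving any single machine failure.

```cpp
#include <cstdint>
#include <map>
#include <set>
#include <vector>

// Hypothetical DSM page directory: which nodes currently cache each page.
struct PageDirectory {
    std::map<std::uint64_t, std::set<int>> holders;  // page id -> caching nodes

    // Pages that must be copied to a second node before the in-memory
    // checkpoint can be declared stable.
    std::vector<std::uint64_t> singly_held_pages() const {
        std::vector<std::uint64_t> at_risk;
        for (const auto& [page, nodes] : holders)
            if (nodes.size() < 2) at_risk.push_back(page);
        return at_risk;
    }
};

int main() {
    PageDirectory dir;
    dir.holders[0x10] = {1, 2};  // already replicated by normal DSM caching
    dir.holders[0x20] = {3};     // must be copied before the checkpoint commits
    return dir.singly_held_pages().size() == 1 ? 0 : 1;
}
```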
Toward Linearizability Testing for Multi-Word Persistent Synchronization Primitives
Persistent memory makes it possible to recover in-memory data structures following a failure instead of rebuilding them from state saved in slow secondary storage. Implementing such recoverable data structures correctly is challenging, as their underlying algorithms must deal with both parallelism and failures, which makes them especially susceptible to programming errors. Traditional proofs of correctness should therefore be combined with other methods, such as model checking or software testing, to minimize the likelihood of uncaught defects. This research focuses specifically on the algorithmic principles of software testing, particularly linearizability analysis, for multi-word persistent synchronization primitives such as conditional swap operations. We describe an efficient decision procedure for linearizability in this context, and discuss its practical applications in detecting previously unknown bugs in implementations of multi-word persistent primitives.
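For contrast with the efficient decision procedure the abstract refers to, here is the brute-force baseline definition of linearizability made executable: a completed history is linearizable iff some total order respecting real-time precedence replays correctly against the sequential specification. This sketch uses single-word CAS for brevity, not the multi-word primitives the paper studies.

```cpp
#include <algorithm>
#include <cstdint>
#include <iostream>
#include <map>
#include <numeric>
#include <vector>

// One completed CAS operation from a concurrent history: its real-time
// invocation/response interval and the result the implementation returned.
struct Op {
    int inv, res;
    std::uint64_t addr, expect, desired;
    bool returned;
};

// Brute-force check: try every total order that respects real-time precedence
// and replay it against the sequential CAS specification.
bool linearizable(const std::vector<Op>& h) {
    std::vector<std::size_t> order(h.size());
    std::iota(order.begin(), order.end(), 0);
    do {
        bool admissible = true;  // an op that finished first must come first
        for (std::size_t i = 0; i < order.size() && admissible; ++i)
            for (std::size_t j = i + 1; j < order.size(); ++j)
                if (h[order[j]].res < h[order[i]].inv) { admissible = false; break; }
        if (!admissible) continue;
        std::map<std::uint64_t, std::uint64_t> mem;  // all words start at 0
        bool matches = true;
        for (std::size_t i : order) {
            const Op& op = h[i];
            bool success = (mem[op.addr] == op.expect);
            if (success) mem[op.addr] = op.desired;
            if (success != op.returned) { matches = false; break; }
        }
        if (matches) return true;
    } while (std::next_permutation(order.begin(), order.end()));
    return false;
}

int main() {
    // Two overlapping CAS(x: 0 -> 1) that both claim success: a classic bug
    // that no sequential order can explain.
    std::vector<Op> history = {
        {0, 3, /*addr=*/0, /*expect=*/0, /*desired=*/1, /*returned=*/true},
        {1, 4, 0, 0, 1, true},
    };
    std::cout << (linearizable(history) ? "linearizable" : "NOT linearizable") << '\n';
}
```

The permutation search is exponential in the history length, which is exactly why efficient decision procedures of the kind described above matter.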