3,474 research outputs found
A Robust Fault-Tolerant and Scalable Cluster-wide Deduplication for Shared-Nothing Storage Systems
Deduplication has been largely employed in distributed storage systems to
improve space efficiency. Traditional deduplication research ignores the design
specifications of shared-nothing distributed storage systems such as no central
metadata bottleneck, scalability, and storage rebalancing. Further,
deduplication introduces transactional changes, which are prone to errors in
the event of a system failure, resulting in inconsistencies in data and
deduplication metadata. In this paper, we propose a robust, fault-tolerant and
scalable cluster-wide deduplication that can eliminate duplicate copies across
the cluster. We design a distributed deduplication metadata shard which
guarantees performance scalability while preserving the design constraints of
shared- nothing storage systems. The placement of chunks and deduplication
metadata is made cluster-wide based on the content fingerprint of chunks. To
ensure transactional consistency and garbage identification, we employ a
flag-based asynchronous consistency mechanism. We implement the proposed
deduplication on Ceph. The evaluation shows high disk-space savings with
minimal performance degradation as well as high robustness in the event of
sudden server failure.Comment: 6 Pages including reference
Parallel processing and expert systems
Whether it be monitoring the thermal subsystem of Space Station Freedom, or controlling the navigation of the autonomous rover on Mars, NASA missions in the 1990s cannot enjoy an increased level of autonomy without the efficient implementation of expert systems. Merely increasing the computational speed of uniprocessors may not be able to guarantee that real-time demands are met for larger systems. Speedup via parallel processing must be pursued alongside the optimization of sequential implementations. Prototypes of parallel expert systems have been built at universities and industrial laboratories in the U.S. and Japan. The state-of-the-art research in progress related to parallel execution of expert systems is surveyed. The survey discusses multiprocessors for expert systems, parallel languages for symbolic computations, and mapping expert systems to multiprocessors. Results to date indicate that the parallelism achieved for these systems is small. The main reasons are (1) the body of knowledge applicable in any given situation and the amount of computation executed by each rule firing are small, (2) dividing the problem solving process into relatively independent partitions is difficult, and (3) implementation decisions that enable expert systems to be incrementally refined hamper compile-time optimization. In order to obtain greater speedups, data parallelism and application parallelism must be exploited
Lock-free atom garbage collection for multithreaded Prolog
The runtime system of dynamic languages such as Prolog or Lisp and their
derivatives contain a symbol table, in Prolog often called the atom table. A
simple dynamically resizing hash-table used to be an adequate way to implement
this table. As Prolog becomes fashionable for 24x7 server processes we need to
deal with atom garbage collection and concurrent access to the atom table.
Classical lock-based implementations to ensure consistency of the atom table
scale poorly and a stop-the-world approach to implement atom garbage collection
quickly becomes a bottle-neck, making Prolog unsuitable for soft real-time
applications. In this article we describe a novel implementation for the atom
table using lock-free techniques where the atom-table remains accessible even
during atom garbage collection. Relying only on CAS (Compare And Swap) and not
on external libraries, the implementation is straightforward and portable.
Under consideration for acceptance in TPLP.Comment: Paper presented at the 32nd International Conference on Logic
Programming (ICLP 2016), New York City, USA, 16-21 October 2016, 14 pages,
LaTeX, 4 PDF figure
Designing application software in wide area network settings
Progress in methodologies for developing robust local area network software has not been matched by similar results for wide area settings. The design of application software spanning multiple local area environments is examined. For important classes of applications, simple design techniques are presented that yield fault tolerant wide area programs. An implementation of these techniques as a set of tools for use within the ISIS system is described
Parallel and distributed Gr\"obner bases computation in JAS
This paper considers parallel Gr\"obner bases algorithms on distributed
memory parallel computers with multi-core compute nodes. We summarize three
different Gr\"obner bases implementations: shared memory parallel, pure
distributed memory parallel and distributed memory combined with shared memory
parallelism. The last algorithm, called distributed hybrid, uses only one
control communication channel between the master node and the worker nodes and
keeps polynomials in shared memory on a node. The polynomials are transported
asynchronous to the control-flow of the algorithm in a separate distributed
data structure. The implementation is generic and works for all implemented
(exact) fields. We present new performance measurements and discuss the
performance of the algorithms.Comment: 14 pages, 8 tables, 13 figure
An events based algorithm for distributing concurrent tasks on multi-core architectures
In this paper, a programming model is presented which enables scalable parallel performance on multi-core shared memory architectures. The model has been developed for application to a wide range of numerical simulation problems. Such problems involve time stepping or iteration algorithms where synchronization of multiple threads of execution is required. It is shown that traditional approaches to parallelism including message passing and scatter-gather can be improved upon in terms of speed-up and memory management. Using spatial decomposition to create orthogonal computational tasks, a new task management algorithm called H-Dispatch is developed. This algorithm makes efficient use of memory resources by limiting the need for garbage collection and takes optimal advantage of multiple cores by employing a “hungry” pull strategy. The technique is demonstrated on a simple finite difference solver and results are compared to traditional MPI and scatter-gather approaches. The H-Dispatch approach achieves near linear speed-up with results for efficiency of 85% on a 24-core machine. It is noted that the H-Dispatch algorithm is quite general and can be applied to a wide class of computational tasks on heterogeneous architectures involving multi-core and GPGPU hardware.Schlumberger-Doll Research CenterSaudi Aramc
A parallel algorithm for switch-level timing simulation on a hypercube multiprocessor
The parallel approach to speeding up simulation is studied, specifically the simulation of digital LSI MOS circuitry on the Intel iPSC/2 hypercube. The simulation algorithm is based on RSIM, an event driven switch-level simulator that incorporates a linear transistor model for simulating digital MOS circuits. Parallel processing techniques based on the concepts of Virtual Time and rollback are utilized so that portions of the circuit may be simulated on separate processors, in parallel for as large an increase in speed as possible. A partitioning algorithm is also developed in order to subdivide the circuit for parallel processing
Fast Lean Erasure-Coded Atomic Memory Object
In this work, we propose FLECKS, an algorithm which implements atomic memory objects in a multi-writer multi-reader (MWMR) setting in asynchronous networks and server failures. FLECKS substantially reduces storage and communication costs over its replication-based counterparts by employing erasure-codes. FLECKS outperforms the previously proposed algorithms in terms of the metrics that to deliver good performance such as storage cost per object, communication cost a high fault-tolerance of clients and servers, guaranteed liveness of operation, and a given number of communication rounds per operation, etc. We provide proofs for liveness and atomicity properties of FLECKS and derive worst-case latency bounds for the operations. We implemented and deployed FLECKS in cloud-based clusters and demonstrate that FLECKS has substantially lower storage and bandwidth costs, and significantly lower latency of operations than the replication-based mechanisms
- …