Exploiting non-constant safe memory in resilient algorithms and data structures
We extend the Faulty RAM model by Finocchi and Italiano (2008) by adding a
safe memory of arbitrary size , and we then derive tradeoffs between the
performance of resilient algorithmic techniques and the size of the safe
memory. Let and denote, respectively, the maximum amount of
faults which can happen during the execution of an algorithm and the actual
number of occurred faults, with . We propose a resilient
algorithm for sorting entries which requires time and uses safe memory words. Our
algorithm outperforms previous resilient sorting algorithms which do not
exploit the available safe memory and require time. Finally, we exploit our sorting algorithm for
deriving a resilient priority queue. Our implementation uses safe
memory words and faulty memory words for storing keys, and
requires amortized time for each insert and
deletemin operation. Our resilient priority queue improves the amortized time required by the state of the art.
Comment: To appear in Theoretical Computer Science, 201
Selection in the Presence of Memory Faults, with Applications to In-place Resilient Sorting
The selection problem, where one wishes to locate the k-th smallest
element in an unsorted array of size n, is one of the basic problems studied
in computer science. The main focus of this work is designing algorithms for
solving the selection problem in the presence of memory faults. These can
happen as the result of cosmic rays, alpha particles, or hardware failures.
Specifically, the computational model assumed here is a faulty variant of the
RAM model (abbreviated as FRAM), which was introduced by Finocchi and Italiano.
In this model, the content of memory cells might get corrupted adversarially
during the execution, and the algorithm is given an upper bound on the
number of corruptions that may occur.
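As a rough illustration of how algorithms in this kind of model can protect a single critical value, the sketch below stores the value in 2*delta + 1 copies and reads it back by majority vote. This is a commonly described building block for faulty-memory models, not the selection algorithm of this paper. The class name `ReplicatedCell` and the parameter name `delta` (the assumed upper bound on corruptions) are hypothetical, and `Counter` is used for brevity even though a real resilient implementation would compute the majority with only constant safe memory.

```python
from collections import Counter


class ReplicatedCell:
    """Hypothetical helper: keeps one value as 2*delta + 1 copies."""

    def __init__(self, value, delta):
        self.delta = delta
        # These copies stand in for words of unreliable memory.
        self.copies = [value] * (2 * delta + 1)

    def write(self, value):
        # Overwrite every copy, which also clears earlier corruptions.
        for i in range(len(self.copies)):
            self.copies[i] = value

    def read(self):
        # If at most delta copies were corrupted, the true value still
        # occupies at least delta + 1 copies and wins the vote.
        value, count = Counter(self.copies).most_common(1)[0]
        if count <= self.delta:
            raise ValueError("more than delta copies were corrupted")
        return value
```

Reading and writing such a cell costs time proportional to delta, which is one reason running times in this setting usually carry a dependence on the fault bound.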
The main contribution of this work is a deterministic resilient selection
algorithm with optimal O(n) worst-case running time. Interestingly, the running
time does not depend on the number of faults, and the algorithm does not need
to know .
The aforementioned resilient selection algorithm can be used to improve the
complexity bounds for resilient k-d trees developed by Gieseke, Moruz and
Vahrenhold. Specifically, the time complexity for constructing a k-d tree is
improved from to .
Besides the deterministic algorithm, a randomized resilient selection
algorithm is developed, which is simpler than the deterministic one, and has
expected time complexity and O(1) space complexity (i.e., is
in-place). This algorithm is used to develop the first resilient sorting
algorithm that is in-place and achieves optimal
expected running time.
Comment: 26 page
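For orientation only, the classical non-resilient randomized selection (quickselect) can be sketched as below. It is not the algorithm of the paper above and makes no attempt to tolerate memory faults; it only shows the baseline idea of selecting by random pivoting in expected linear time. The function name `quickselect` and the 1-based rank argument `k` are assumptions made for this sketch, and the input is copied so the caller's list is left untouched.

```python
import random


def quickselect(items, k, rng=random):
    """Return the k-th smallest element of items (k is 1-based)."""
    if not 1 <= k <= len(items):
        raise ValueError("k out of range")
    a = list(items)          # work on a copy of the input
    lo, hi = 0, len(a) - 1
    target = k - 1           # 0-based index in sorted order
    while True:
        if lo == hi:
            return a[lo]
        # Pick a random pivot and move it to the end of the range.
        p = rng.randint(lo, hi)
        a[p], a[hi] = a[hi], a[p]
        pivot = a[hi]
        # Lomuto partition: elements smaller than the pivot go left.
        store = lo
        for i in range(lo, hi):
            if a[i] < pivot:
                a[i], a[store] = a[store], a[i]
                store += 1
        a[store], a[hi] = a[hi], a[store]
        # Continue only in the side that contains the target index.
        if target == store:
            return a[store]
        if target < store:
            hi = store - 1
        else:
            lo = store + 1
```

A single adversarial corruption during partitioning could send this baseline into the wrong side of the array, which is the kind of failure a resilient variant has to guard against.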
The Parallel Persistent Memory Model
We consider a parallel computational model that consists of processors,
each with a fast local ephemeral memory of limited size, and sharing a large
persistent memory. The model allows for each processor to fault with bounded
probability, and possibly restart. On faulting all processor state and local
ephemeral memory are lost, but the persistent memory remains. This model is
motivated by upcoming non-volatile memories that are as fast as existing random
access memory, are accessible at the granularity of cache lines, and have the
capability of surviving power outages. It is further motivated by the
observation that in large parallel systems, failure of processors and their
caches is not unusual.
Within the model we develop a framework for developing locality efficient
parallel algorithms that are resilient to failures. There are several
challenges, including the need to recover from failures, the desire to do this
in an asynchronous setting (i.e., not blocking other processors when one
fails), and the need for synchronization primitives that are robust to
failures. We describe approaches to solve these challenges based on breaking
computations into what we call capsules, which have certain properties, and
developing a work-stealing scheduler that functions properly within the context
of failures. The scheduler guarantees a time bound of in expectation, where and are the work and
depth of the computation (in the absence of failures), is the average
number of processors available during the computation, and is the
probability that a capsule fails. Within the model and using the proposed
methods, we develop efficient algorithms for parallel sorting and other
primitives.
Comment: This paper is the full version of a paper at SPAA 2018 with the same
nam
On the Error Resilience of Ordered Binary Decision Diagrams
Ordered Binary Decision Diagrams (OBDDs) are a data structure that is used in
an increasing number of fields of Computer Science (e.g., logic synthesis,
program verification, data mining, bioinformatics, and data protection) for
representing and manipulating discrete structures and Boolean functions. The
purpose of this paper is to study the error resilience of OBDDs and to design a
resilient version of this data structure, i.e., a self-repairing OBDD. In
particular, we describe some strategies that make reduced ordered OBDDs
resilient to errors in the indexes, that are associated to the input variables,
or in the pointers (i.e., OBDD edges) of the nodes. These strategies exploit
the inherent redundancy of the data structure, as well as the redundancy
introduced by its efficient implementations. The solutions we propose allow the
exact restoring of the original OBDD and are suitable to be applied to
classical software packages for the manipulation of OBDDs currently in use.
Another result of the paper is the definition of a new canonical OBDD model,
called Index-resilient Reduced OBDD, which guarantees that a node with a
faulty index has a reconstruction cost , where is the number of nodes
with corrupted index
Hyperswitch communication network
The Hyperswitch Communication Network (HCN) is a large scale parallel computer prototype being developed at JPL. Commercial versions of the HCN computer are planned. The HCN computer being designed is a message passing multiple instruction multiple data (MIMD) computer, and offers many advantages in price-performance ratio, reliability and availability, and manufacturing over traditional uniprocessors and bus based multiprocessors. The design of the HCN operating system is a uniquely flexible environment that combines both parallel processing and distributed processing. This programming paradigm can achieve a balance among the following competing factors: performance in processing and communications, user friendliness, and fault tolerance. The prototype is being designed to accommodate a maximum of 64 state of the art microprocessors. The HCN is classified as a distributed supercomputer. The HCN system is described, and the performance/cost analysis and other competing factors within the system design are reviewed
A system for routing arbitrary directed graphs on SIMD architectures
There are many problems which can be described in terms of directed graphs that contain a large number of vertices where simple computations occur using data from connecting vertices. A method is given for parallelizing such problems on an SIMD machine model that is bit-serial and uses only nearest neighbor connections for communication. Each vertex of the graph will be assigned to a processor in the machine. Algorithms are given that will be used to implement movement of data along the arcs of the graph. This architecture and algorithms define a system that is relatively simple to build and can do graph processing. All arcs can be traversed in parallel in time O(T), where T is empirically proportional to the diameter of the interconnection network times the average degree of the graph. Modifying or adding a new arc takes the same time as parallel traversal
Lossless fault-tolerant data structures with additive overhead
12th International Symposium, WADS 2011, New York, NY, USA, August 15-17, 2011. Proceedings
We develop the first dynamic data structures that tolerate δ memory faults, lose no data, and incur only an O(δ) additive overhead in overall space and time per operation. We obtain such data structures for arrays, linked lists, binary search trees, interval trees, predecessor search, and suffix trees. Like previous data structures, δ must be known in advance, but we show how to restore pristine state in linear time, in parallel with queries, making δ just a bound on the rate of memory faults. Our data structures require Θ(δ) words of safe memory during an operation, which may not be theoretically necessary but seems a practical assumption.
Center for Massive Data Algorithmics (MADALGO
Robust and Adaptive Search
Binary search finds a given element in a sorted array with an optimal number of log n queries. However, binary search fails even when the array is only slightly disordered or access to its elements is subject to errors. We study the worst-case query complexity of search algorithms that are robust to imprecise queries and that adapt to perturbations of the order of the elements. We give (almost) tight results for various parameters that quantify query errors and that measure array disorder. In particular, we exhibit settings where query complexities of log n + ck, (1+epsilon) log n + ck, and sqrt(cnk)+o(nk) are best-possible for parameter value k, any epsilon > 0, and constant c
Resilient Level Ancestor, Bottleneck, and Lowest Common Ancestor Queries in Dynamic Trees
We study the problem of designing a resilient data structure maintaining a tree under the Faulty-RAM model [Finocchi and Italiano, STOC'04] in which up to δ memory words can be corrupted by an adversary. Our data structure stores a rooted dynamic tree that can be updated via the addition of new leaves, requires linear size, and supports resilient (weighted) level ancestor queries, lowest common ancestor queries, and bottleneck vertex queries in O(δ) worst-case time per operation
On-board B-ISDN fast packet switching architectures. Phase 2: Development. Proof-of-concept architecture definition report
For the next-generation packet switched communications satellite system with onboard processing and spot-beam operation, a reliable onboard fast packet switch is essential to route packets from different uplink beams to different downlink beams. The rapid emergence of point-to-point services such as video distribution, and the large demand for video conference, distributed data processing, and network management make the multicast function essential to a fast packet switch (FPS). The satellite's inherent broadcast features give the satellite network an advantage over the terrestrial network in providing multicast services. This report evaluates alternate multicast FPS architectures for onboard baseband switching applications and selects a candidate for subsequent breadboard development. Architecture evaluation and selection will be based on the study performed in phase 1, 'Onboard B-ISDN Fast Packet Switching Architectures', and other switch architectures which have become commercially available as large scale integration (LSI) devices