4,551 research outputs found

    Exploiting non-constant safe memory in resilient algorithms and data structures

    Get PDF
    We extend the Faulty RAM model by Finocchi and Italiano (2008) by adding a safe memory of arbitrary size SS, and we then derive tradeoffs between the performance of resilient algorithmic techniques and the size of the safe memory. Let δ\delta and α\alpha denote, respectively, the maximum amount of faults which can happen during the execution of an algorithm and the actual number of occurred faults, with αδ\alpha \leq \delta. We propose a resilient algorithm for sorting nn entries which requires O(nlogn+α(δ/S+logS))O\left(n\log n+\alpha (\delta/S + \log S)\right) time and uses Θ(S)\Theta(S) safe memory words. Our algorithm outperforms previous resilient sorting algorithms which do not exploit the available safe memory and require O(nlogn+αδ)O\left(n\log n+ \alpha\delta\right) time. Finally, we exploit our sorting algorithm for deriving a resilient priority queue. Our implementation uses Θ(S)\Theta(S) safe memory words and Θ(n)\Theta(n) faulty memory words for storing nn keys, and requires O(logn+δ/S)O\left(\log n + \delta/S\right) amortized time for each insert and deletemin operation. Our resilient priority queue improves the O(logn+δ)O\left(\log n + \delta\right) amortized time required by the state of the art.Comment: To appear in Theoretical Computer Science, 201

    Selection in the Presence of Memory Faults, with Applications to In-place Resilient Sorting

    Full text link
    The selection problem, where one wishes to locate the kthk^{th} smallest element in an unsorted array of size nn, is one of the basic problems studied in computer science. The main focus of this work is designing algorithms for solving the selection problem in the presence of memory faults. These can happen as the result of cosmic rays, alpha particles, or hardware failures. Specifically, the computational model assumed here is a faulty variant of the RAM model (abbreviated as FRAM), which was introduced by Finocchi and Italiano. In this model, the content of memory cells might get corrupted adversarially during the execution, and the algorithm is given an upper bound δ\delta on the number of corruptions that may occur. The main contribution of this work is a deterministic resilient selection algorithm with optimal O(n) worst-case running time. Interestingly, the running time does not depend on the number of faults, and the algorithm does not need to know δ\delta. The aforementioned resilient selection algorithm can be used to improve the complexity bounds for resilient kk-d trees developed by Gieseke, Moruz and Vahrenhold. Specifically, the time complexity for constructing a kk-d tree is improved from O(nlog2n+δ2)O(n\log^2 n + \delta^2) to O(nlogn)O(n \log n). Besides the deterministic algorithm, a randomized resilient selection algorithm is developed, which is simpler than the deterministic one, and has O(n+α)O(n + \alpha) expected time complexity and O(1) space complexity (i.e., is in-place). This algorithm is used to develop the first resilient sorting algorithm that is in-place and achieves optimal O(nlogn+αδ)O(n\log n + \alpha\delta) expected running time.Comment: 26 page

    The Parallel Persistent Memory Model

    Full text link
    We consider a parallel computational model that consists of PP processors, each with a fast local ephemeral memory of limited size, and sharing a large persistent memory. The model allows for each processor to fault with bounded probability, and possibly restart. On faulting all processor state and local ephemeral memory are lost, but the persistent memory remains. This model is motivated by upcoming non-volatile memories that are as fast as existing random access memory, are accessible at the granularity of cache lines, and have the capability of surviving power outages. It is further motivated by the observation that in large parallel systems, failure of processors and their caches is not unusual. Within the model we develop a framework for developing locality efficient parallel algorithms that are resilient to failures. There are several challenges, including the need to recover from failures, the desire to do this in an asynchronous setting (i.e., not blocking other processors when one fails), and the need for synchronization primitives that are robust to failures. We describe approaches to solve these challenges based on breaking computations into what we call capsules, which have certain properties, and developing a work-stealing scheduler that functions properly within the context of failures. The scheduler guarantees a time bound of O(W/PA+D(P/PA)log1/fW)O(W/P_A + D(P/P_A) \lceil\log_{1/f} W\rceil) in expectation, where WW and DD are the work and depth of the computation (in the absence of failures), PAP_A is the average number of processors available during the computation, and f1/2f \le 1/2 is the probability that a capsule fails. Within the model and using the proposed methods, we develop efficient algorithms for parallel sorting and other primitives.Comment: This paper is the full version of a paper at SPAA 2018 with the same nam

    On the Error Resilience of Ordered Binary Decision Diagrams

    Get PDF
    Ordered Binary Decision Diagrams (OBDDs) are a data structure that is used in an increasing number of fields of Computer Science (e.g., logic synthesis, program verification, data mining, bioinformatics, and data protection) for representing and manipulating discrete structures and Boolean functions. The purpose of this paper is to study the error resilience of OBDDs and to design a resilient version of this data structure, i.e., a self-repairing OBDD. In particular, we describe some strategies that make reduced ordered OBDDs resilient to errors in the indexes, that are associated to the input variables, or in the pointers (i.e., OBDD edges) of the nodes. These strategies exploit the inherent redundancy of the data structure, as well as the redundancy introduced by its efficient implementations. The solutions we propose allow the exact restoring of the original OBDD and are suitable to be applied to classical software packages for the manipulation of OBDDs currently in use. Another result of the paper is the definition of a new canonical OBDD model, called {\em Index-resilient Reduced OBDD}, which guarantees that a node with a faulty index has a reconstruction cost O(k)O(k), where kk is the number of nodes with corrupted index

    Hyperswitch communication network

    Get PDF
    The Hyperswitch Communication Network (HCN) is a large scale parallel computer prototype being developed at JPL. Commercial versions of the HCN computer are planned. The HCN computer being designed is a message passing multiple instruction multiple data (MIMD) computer, and offers many advantages in price-performance ratio, reliability and availability, and manufacturing over traditional uniprocessors and bus based multiprocessors. The design of the HCN operating system is a uniquely flexible environment that combines both parallel processing and distributed processing. This programming paradigm can achieve a balance among the following competing factors: performance in processing and communications, user friendliness, and fault tolerance. The prototype is being designed to accommodate a maximum of 64 state of the art microprocessors. The HCN is classified as a distributed supercomputer. The HCN system is described, and the performance/cost analysis and other competing factors within the system design are reviewed

    A system for routing arbitrary directed graphs on SIMD architectures

    Get PDF
    There are many problems which can be described in terms of directed graphs that contain a large number of vertices where simple computations occur using data from connecting vertices. A method is given for parallelizing such problems on an SIMD machine model that is bit-serial and uses only nearest neighbor connections for communication. Each vertex of the graph will be assigned to a processor in the machine. Algorithms are given that will be used to implement movement of data along the arcs of the graph. This architecture and algorithms define a system that is relatively simple to build and can do graph processing. All arcs can be transversed in parallel in time O(T), where T is empirically proportional to the diameter of the interconnection network times the average degree of the graph. Modifying or adding a new arc takes the same time as parallel traversal

    Lossless fault-tolerant data structures with additive overhead

    Get PDF
    12th International Symposium, WADS 2011, New York, NY, USA, August 15-17, 2011. ProceedingsWe develop the first dynamic data structures that tolerate δ memory faults, lose no data, and incur only an O(δ ) additive overhead in overall space and time per operation. We obtain such data structures for arrays, linked lists, binary search trees, interval trees, predecessor search, and suffix trees. Like previous data structures, δ must be known in advance, but we show how to restore pristine state in linear time, in parallel with queries, making δ just a bound on the rate of memory faults. Our data structures require Θ(δ) words of safe memory during an operation, which may not be theoretically necessary but seems a practical assumption.Center for Massive Data Algorithmics (MADALGO

    Robust and Adaptive Search

    Get PDF
    Binary search finds a given element in a sorted array with an optimal number of log n queries. However, binary search fails even when the array is only slightly disordered or access to its elements is subject to errors. We study the worst-case query complexity of search algorithms that are robust to imprecise queries and that adapt to perturbations of the order of the elements. We give (almost) tight results for various parameters that quantify query errors and that measure array disorder. In particular, we exhibit settings where query complexities of log n + ck, (1+epsilon) log n + ck, and sqrt(cnk)+o(nk) are best-possible for parameter value k, any epsilon > 0, and constant c

    Resilient Level Ancestor, Bottleneck, and Lowest Common Ancestor Queries in Dynamic Trees

    Get PDF
    We study the problem of designing a resilient data structure maintaining a tree under the Faulty-RAM model [Finocchi and Italiano, STOC\u2704] in which up to ? memory words can be corrupted by an adversary. Our data structure stores a rooted dynamic tree that can be updated via the addition of new leaves, requires linear size, and supports resilient (weighted) level ancestor queries, lowest common ancestor queries, and bottleneck vertex queries in O(?) worst-case time per operation

    On-board B-ISDN fast packet switching architectures. Phase 2: Development. Proof-of-concept architecture definition report

    Get PDF
    For the next-generation packet switched communications satellite system with onboard processing and spot-beam operation, a reliable onboard fast packet switch is essential to route packets from different uplink beams to different downlink beams. The rapid emergence of point-to-point services such as video distribution, and the large demand for video conference, distributed data processing, and network management makes the multicast function essential to a fast packet switch (FPS). The satellite's inherent broadcast features gives the satellite network an advantage over the terrestrial network in providing multicast services. This report evaluates alternate multicast FPS architectures for onboard baseband switching applications and selects a candidate for subsequent breadboard development. Architecture evaluation and selection will be based on the study performed in phase 1, 'Onboard B-ISDN Fast Packet Switching Architectures', and other switch architectures which have become commercially available as large scale integration (LSI) devices
    corecore