    Deterministic Computations on a PRAM with Static Processor and Memory Faults.

    We consider a Parallel Random Access Machine (PRAM) in which some processors and memory cells are faulty. The faults considered are static, i.e., once the machine starts to operate, the operational/faulty status of PRAM components does not change. We develop a deterministic simulation of a fully operational PRAM on a similar faulty machine that has constant fractions of faults among its processors and memory cells. The simulating PRAM has $n$ processors and $m$ memory cells, and simulates a PRAM with $n$ processors and a constant fraction of $m$ memory cells. The simulation is in two phases: it starts with preprocessing, which is followed by the simulation proper, performed in a step-by-step fashion. Preprocessing is performed in time $O((\frac{m}{n} + \log n)\log n)$. The slowdown of the step-by-step part of the simulation is $O(\log m)$.
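    As a toy illustration of the two-phase structure above (hypothetical names; not the paper's construction), the sketch below performs a "preprocessing" pass that builds an indirection table around statically faulty cells, after which the step-by-step phase only ever touches operational cells:

```python
# Toy sketch of simulating reliable memory on memory with static faults.
# The remapping table stands in for the paper's preprocessing phase; all
# names here are illustrative, not taken from the paper.

def build_remap(m, faulty):
    """Preprocessing: map each logical address to an operational cell."""
    return [c for c in range(m) if c not in faulty]

def sim_write(mem, remap, addr, value):
    mem[remap[addr]] = value          # step-by-step phase: one indirection

def sim_read(mem, remap, addr):
    return mem[remap[addr]]

m = 16
faulty = {2, 5, 11}                   # static: fixed before execution starts
mem = [0] * m
remap = build_remap(m, faulty)        # logical space is a constant fraction of m
sim_write(mem, remap, 3, 42)
assert sim_read(mem, remap, 3) == 42
```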

    Shared memory with hidden latency on a family of mesh-like networks

    Data Oblivious Algorithms for Multicores

    As secure processors such as Intel SGX (with hyperthreading) become widely adopted, there is a growing appetite for private analytics on big data. Most prior works on data-oblivious algorithms adopt the classical PRAM model to capture parallelism. However, it is widely understood that PRAM does not best capture realistic multicore processors, nor does it reflect parallel programming models adopted in practice. In this paper, we initiate the study of parallel data-oblivious algorithms on realistic multicores, best captured by the binary fork-join model of computation. We first show that data-oblivious sorting can be accomplished by a binary fork-join algorithm with optimal total work and optimal (cache-oblivious) cache complexity, and in $O(\log n \log\log n)$ span (i.e., parallel time) that matches the best known insecure algorithm. Using our sorting algorithm as a core primitive, we show how to data-obliviously simulate general PRAM algorithms in the binary fork-join model with non-trivial efficiency. We also present results for several applications including list ranking, Euler tour, tree contraction, connected components, and minimum spanning forest. For a subset of these applications, our data-oblivious algorithms asymptotically outperform the best known insecure algorithms. For other applications, we show data-oblivious algorithms whose performance bounds match the best known insecure algorithms. Complementing these asymptotically efficient results, we present a practical variant of our sorting algorithm that is self-contained and potentially implementable. It has optimal caching cost, and it is only a $\log\log n$ factor off from optimal work and about a $\log n$ factor off in terms of span; moreover, it achieves small constant factors in its bounds.
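    For intuition about data-obliviousness, the sketch below implements bitonic sort, a classic sorting network whose compare-exchange schedule depends only on the input length and never on the data. It is illustrative only, not the paper's work-optimal fork-join algorithm; the two recursive calls marked below are the natural fork points in a binary fork-join execution:

```python
# Bitonic sort: data-oblivious because the sequence of compared index pairs
# is fixed by n alone. Assumes len(a) is a power of two.

def bitonic_sort(a, lo=0, n=None, ascending=True):
    if n is None:
        n = len(a)
    if n <= 1:
        return
    half = n // 2
    bitonic_sort(a, lo, half, True)          # independent subproblems:
    bitonic_sort(a, lo + half, half, False)  # natural binary fork-join points
    bitonic_merge(a, lo, n, ascending)

def bitonic_merge(a, lo, n, ascending):
    if n <= 1:
        return
    half = n // 2
    for i in range(lo, lo + half):           # oblivious compare-exchange pass
        if (a[i] > a[i + half]) == ascending:
            a[i], a[i + half] = a[i + half], a[i]
    bitonic_merge(a, lo, half, ascending)
    bitonic_merge(a, lo + half, half, ascending)

data = [7, 3, 8, 1, 6, 2, 5, 4]
bitonic_sort(data)
assert data == [1, 2, 3, 4, 5, 6, 7, 8]
```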

    A Practical Scalable Shared-Memory Parallel Algorithm for Computing Minimum Spanning Trees

    Oblivious Network RAM and Leveraging Parallelism to Achieve Obliviousness

    Oblivious RAM (ORAM) is a cryptographic primitive that allows a trusted CPU to securely access untrusted memory, such that the access patterns reveal nothing about sensitive data. ORAM is known to have broad applications in secure processor design and secure multi-party computation for big data. Unfortunately, due to a logarithmic lower bound by Goldreich and Ostrovsky (Journal of the ACM, '96), ORAM is bound to incur a moderate cost in practice. In particular, with the latest developments in ORAM constructions, we are quickly approaching this limit, and the room for performance improvement is small. In this paper, we consider new models of computation in which the cost of obliviousness can be fundamentally reduced in comparison with the standard ORAM model. We propose the Oblivious Network RAM model of computation, where a CPU communicates with multiple memory banks, such that the adversary observes only which bank the CPU is communicating with, but not the address offset within each memory bank. In other words, obliviousness within each bank comes for free, either because the architecture prevents a malicious party from observing the address accessed within a bank, or because another solution is used to obfuscate memory accesses within each bank, and hence we only need to obfuscate communication patterns between the CPU and the memory banks. We present new constructions for obliviously simulating general or parallel programs in the Network RAM model. We describe applications of our new model in secure processor design and in distributed storage applications with a network adversary.
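    A minimal sketch of the leakage model just described, with hypothetical names: memory is striped across banks, and the observer's trace records only which bank each access touches, never the offset inside it:

```python
# Network RAM leakage model: the adversary sees bank indices, not offsets.
# Illustrates the model only, not the paper's oblivious simulation.

BANKS, BANK_SIZE = 4, 8
banks = [[0] * BANK_SIZE for _ in range(BANKS)]
observed = []                          # the network adversary's view

def access(addr, value=None):
    bank, offset = divmod(addr, BANK_SIZE)
    observed.append(bank)              # the bank id leaks ...
    if value is not None:              # ... the offset within it does not
        banks[bank][offset] = value
    return banks[bank][offset]

access(13, value=99)
assert access(13) == 99
assert observed == [1, 1]              # two accesses to bank 1, offsets hidden
```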

    Memory Checking for Parallel RAMs

    When outsourcing a database to an untrusted remote server, one might want to verify the integrity of its contents while accessing it. To solve this, Blum et al. [FOCS '91] proposed the notion of memory checking. Memory checking allows a user to run a RAM program on a remote server, with the ability to verify the integrity of the storage using only small local storage. In this work, we define and initiate the formal study of memory checking for Parallel RAMs (PRAMs). The parallel RAM model is very expressive and captures many modern architectures such as multi-core architectures and cloud clusters. When multiple clients run a PRAM algorithm on a shared remote server, concurrency issues may cause inconsistencies; integrity verification is therefore an even more desirable property in this setting. Assuming only the existence of one-way functions, we construct an online memory checker (one that reports faults as soon as they occur) for PRAMs with $O(\log N)$ simulation overhead in both work and depth. In addition, we construct an offline memory checker (one that reports faults only after a long sequence of operations) with amortized $O(1)$ simulation overhead in both work and depth. Our constructions match the best known simulation overhead of memory checkers in the standard single-user RAM setting. As an application of our parallel memory checking constructions, we additionally construct the first maliciously secure oblivious parallel RAM (OPRAM) with polylogarithmic overhead.
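    For background, the sketch below simulates the classic offline-checking idea going back to Blum et al. (not the paper's PRAM construction): every write is logged into a "write set", every read logs the tuple it observed into a "read set", and after a closing scan the two multisets must be equal if the memory behaved honestly. A real checker keeps only small incremental multiset hashes and also verifies that each timestamp read is smaller than the current time; plain Counters stand in for both here:

```python
# Simplified offline memory checker: the reads multiset equals the writes
# multiset after a closing scan iff the (simulated) memory was consistent.

from collections import Counter

class OfflineChecker:
    def __init__(self, n):
        self.reads, self.writes = Counter(), Counter()
        self.time = 0
        self.mem = {a: (0, 0) for a in range(n)}   # simulated untrusted memory
        for a in range(n):
            self.writes[(a, 0, 0)] += 1            # initial write at time 0

    def _touch(self, addr, new_value=None):
        value, t = self.mem[addr]                  # read-before-write discipline
        self.reads[(addr, value, t)] += 1
        if new_value is not None:
            value = new_value
        self.time += 1
        self.mem[addr] = (value, self.time)        # write back, fresh timestamp
        self.writes[(addr, value, self.time)] += 1
        return value

    def write(self, addr, value):
        self._touch(addr, value)

    def read(self, addr):
        return self._touch(addr)

    def audit(self):
        for addr, (value, t) in self.mem.items():  # closing scan reads all cells
            self.reads[(addr, value, t)] += 1
        return self.reads == self.writes

c = OfflineChecker(4)
c.write(1, 7)
assert c.read(1) == 7
assert c.audit()                                   # honest memory passes
```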

    Optimal Oblivious Parallel RAM

    An oblivious RAM (ORAM), introduced by Goldreich and Ostrovsky (STOC '87 and J. ACM '96), is a technique for hiding a RAM's access pattern. That is, for every input, the distribution of the observed locations accessed by the machine is essentially independent of the machine's secret inputs. Recent progress culminated in a work of Asharov et al. (EUROCRYPT '20), obtaining an ORAM with (amortized) logarithmic overhead in total work, which is known to be optimal. Oblivious Parallel RAM (OPRAM) is a natural extension of ORAM to the (more realistic) parallel setting, where several processors make concurrent accesses to a shared memory. It is known that any OPRAM must incur logarithmic work overhead and, for highly parallel RAMs, a logarithmic depth blowup (in the balls-and-bins model). Despite the significant recent advances, there is still a large gap: all existing OPRAM schemes incur a poly-logarithmic overhead either in total work or in depth. Our main result closes the aforementioned gap and provides an essentially optimal OPRAM scheme. Specifically, assuming one-way functions, we show that any Parallel RAM with memory capacity $N$ can be obliviously simulated in space $O(N)$, incurring only $O(\log N)$ blowup in (amortized) total work as well as in depth. Our transformation supports all PRAMs in the CRCW mode, and the resulting simulation is in the CRCW mode as well.
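    As a baseline contrast to the $O(\log N)$ bound above, the trivial linear-scan ORAM below hides the access pattern at $O(N)$ cost per access; the observed trace is identical no matter which address is secretly read:

```python
# Trivial ORAM baseline: scan all of memory on every access, so the
# adversary's trace is independent of the secret address (O(N) overhead,
# versus the optimal O(log N) overhead discussed in the abstract).

def oblivious_read(mem, secret_addr, trace):
    result = None
    for i in range(len(mem)):         # every cell is touched every time
        trace.append(i)
        if i == secret_addr:
            result = mem[i]
    return result

mem, trace = [10, 20, 30, 40], []
assert oblivious_read(mem, 2, trace) == 30
assert trace == [0, 1, 2, 3]          # the same trace for any secret address
```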

    Progress Report: 1991–1994

    Parallel Weighted Random Sampling

    Data structures for efficient sampling from a set of weighted items are an important building block of many applications. However, few parallel solutions are known. We close many of these gaps for both shared-memory and distributed-memory machines. We give efficient, fast, and practical algorithms for sampling single items, $k$ items with/without replacement, permutations, subsets, and reservoirs. We also give improved sequential algorithms for alias table construction and for sampling with replacement. Experiments on shared-memory parallel machines with up to 158 threads show near-linear speedups both for construction and queries.
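    For context, the sketch below gives the textbook sequential alias-table construction (Vose's variant of Walker's method) that the improved and parallelized algorithms build on: $O(n)$ preprocessing, then each weighted sample costs $O(1)$:

```python
# Alias method: O(n) table construction, O(1) weighted sampling per query.
# Textbook sequential version, not the paper's parallel construction.

import random

def build_alias_table(weights):
    n = len(weights)
    total = sum(weights)
    prob = [w * n / total for w in weights]      # scale so the average is 1
    alias = list(range(n))
    small = [i for i, p in enumerate(prob) if p < 1.0]
    large = [i for i, p in enumerate(prob) if p >= 1.0]
    while small and large:
        s, l = small.pop(), large.pop()
        alias[s] = l                             # s tops up its column from l
        prob[l] -= 1.0 - prob[s]
        (small if prob[l] < 1.0 else large).append(l)
    for i in small + large:                      # guard against float round-off
        prob[i] = 1.0
    return prob, alias

def sample(prob, alias):
    i = random.randrange(len(prob))              # uniform column choice
    return i if random.random() < prob[i] else alias[i]

prob, alias = build_alias_table([1, 3, 6])
counts = [0, 0, 0]
for _ in range(100_000):
    counts[sample(prob, alias)] += 1
print(counts)                                    # roughly proportional to 1:3:6
```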

    Improved Parallel Algorithms for Spanners and Hopsets

    We use exponential start time clustering to design faster and more work-efficient parallel graph algorithms involving distances. Previous algorithms usually rely on graph decomposition routines with strict restrictions on the diameters of the decomposed pieces. We weaken these bounds in favor of stronger local probabilistic guarantees. This allows more direct analyses of the overall process, giving:
    * Linear-work parallel algorithms that construct spanners with $O(k)$ stretch and size $O(n^{1+1/k})$ in unweighted graphs, and size $O(n^{1+1/k} \log k)$ in weighted graphs.
    * Hopsets that lead to the first parallel algorithm for approximating shortest paths in undirected graphs with $O(m \,\mathrm{polylog}\, n)$ work.
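    The sketch below illustrates exponential start time clustering, the decomposition primitive named above, on a small unweighted graph: every center $u$ draws an exponential "head start" $\delta_u$, and each vertex $v$ joins the cluster of the center minimizing $\mathrm{dist}(u, v) - \delta_u$. This direct quadratic-time version is for illustration only; the paper computes the same kind of decomposition in parallel with far less work:

```python
# Exponential start time clustering on an unweighted graph, done naively
# with one BFS per vertex. Larger beta gives smaller head starts and hence
# smaller clusters.

import random
from collections import deque

def bfs_dist(adj, src):
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def est_clustering(adj, beta=0.5):
    delta = {u: random.expovariate(beta) for u in adj}   # exponential start times
    dist = {u: bfs_dist(adj, u) for u in adj}
    return {v: min(adj, key=lambda u: dist[u].get(v, float("inf")) - delta[u])
            for v in adj}

adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}             # a path on 4 vertices
print(est_clustering(adj))                                # vertex -> cluster center
```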