
    Zig-zag Sort: A Simple Deterministic Data-Oblivious Sorting Algorithm Running in O(n log n) Time

    We describe and analyze Zig-zag Sort, a deterministic data-oblivious sorting algorithm running in O(n log n) time that is arguably simpler than previously known algorithms with similar properties, which are based on the AKS sorting network. Because it is data-oblivious and deterministic, Zig-zag Sort can be implemented as a simple O(n log n)-size sorting network, thereby providing a solution to an open problem posed by Incerpi and Sedgewick in 1985. In addition, Zig-zag Sort is a variant of Shellsort, and is, in fact, the first deterministic Shellsort variant running in O(n log n) time. The existence of such an algorithm was posed as an open problem by Plaxton et al. in 1992 and also by Sedgewick in 1996. More relevant for today, however, is the fact that the existence of a simple data-oblivious deterministic sorting algorithm running in O(n log n) time simplifies the inner-loop computation in several proposed oblivious-RAM simulation methods (which utilize AKS sorting networks), and this, in turn, implies simplified mechanisms for privacy-preserving data outsourcing in several cloud computing applications. We provide both constructive and non-constructive implementations of Zig-zag Sort, based on the existence of a circuit known as an epsilon-halver, such that the constant factors in our constructive implementations are orders of magnitude smaller than those for constructive variants of the AKS sorting network, which are also based on the use of epsilon-halvers. Comment: Appearing in ACM Symp. on Theory of Computing (STOC) 201

    Efficient parallel computation on multiprocessors with optical interconnection networks

    This dissertation studies optical interconnection networks, their architecture, address schemes, and computation and communication capabilities. We focus on a simple but powerful optical interconnection network model: the Linear Array with Reconfigurable Pipelined Bus System (LARPBS). We extend the LARPBS model to a simplified higher-dimensional LARPBS and provide a set of basic computation operations. We then study the following two groups of parallel computation problems on both one-dimensional and multi-dimensional LARPBS's: parallel comparison problems, including sorting, merging, and selection; and Boolean matrix multiplication, transitive closure, and their applications to connected component problems. We implement an optimal sorting algorithm on an n-processor LARPBS. With this optimal sorting algorithm at our disposal, we study the sorting problem for higher-dimensional LARPBS's and obtain the following results:
    • An optimal basic Columnsort algorithm on a 2D LARPBS.
    • Two optimal two-way merge sort algorithms on a 2D LARPBS.
    • An optimal multi-way merge sorting algorithm on a 2D LARPBS.
    • An optimal generalized column sort algorithm on a 2D LARPBS.
    • An optimal generalized column sort algorithm on a 3D LARPBS.
    • An optimal 5-phase sorting algorithm on a 3D LARPBS.
    Results for selection problems are as follows:
    • A constant time maximum-finding algorithm on an LARPBS.
    • An optimal maximum-finding algorithm on an LARPBS.
    • An O((log log n)^2) time parallel selection algorithm on an LARPBS.
    • An O(k (log log n)^2) time parallel multi-selection algorithm on an LARPBS.
    While studying the computation and communication properties of the LARPBS model, we find that Boolean matrix multiplication and its applications to graphs are another set of problems that can be solved efficiently on the LARPBS. The following is a list of results we have obtained in this area:
    • A constant time Boolean matrix multiplication algorithm.
    • An O(log n)-time transitive closure algorithm.
    • An O(log n)-time connected components algorithm.
    • An O(log n)-time strongly connected components algorithm.
    The results provided in this dissertation show the strong computation and communication power of optical interconnection networks.

    The automated proof of a trace transformation for a bitonic sort

    In his third volume of The Art of Computer Programming, Knuth presents Batcher's bitonic sorting network. With concurrency, this sorting network can be executed in logarithmic time. Knuth suggests a formal argument for the correctness of the bitonic sorting algorithm (as an exercise), but addresses the question of concurrency only informally. We develop a program for the bitonic sort by (1) deriving a stepwise refinement from Knuth's informal description of the algorithm, (2) deriving from the refinement a sequential execution or ‘trace’ of order O(n log n) in the length n of the sequence to be sorted, and (3) transforming the sequential trace into a parallel trace of order O(log n) while preserving its semantics. We shall be informal in Steps 1 and 2, although these steps can be formalized. But we will provide a formal treatment of Step 3 and report on the certification of this treatment in a mechanized logic. This work is a contribution to the optimization of programs (via concurrency) through transformation and the automation of program proofs.
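
    The relation between the sequential and the parallel trace can be illustrated with a minimal Python sketch of Batcher's bitonic sorter applied to an input that is already bitonic. This is not the paper's derivation or its mechanized proof; the function names `bitonic_merge_rounds` and `run_rounds` are hypothetical, and n is assumed to be a power of two. Each round holds comparators on disjoint wires, so a round can run concurrently: the sequential trace has (n/2) log n comparators, the parallel trace log n rounds.

```python
def bitonic_merge_rounds(n):
    """Comparator rounds of a bitonic sorter for n = 2^k wires.

    Round with stride j compares wire i with wire i + j for every i whose
    j-bit is clear. Comparators within one round touch disjoint wires.
    """
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    rounds = []
    j = n // 2
    while j >= 1:
        rounds.append([(i, i | j) for i in range(n) if i & j == 0])
        j //= 2
    return rounds


def run_rounds(values, rounds):
    """Execute the rounds one comparator at a time (the sequential trace)."""
    a = list(values)
    for rnd in rounds:
        for lo, hi in rnd:
            # Compare-exchange: smaller value ends up on the lower wire.
            if a[lo] > a[hi]:
                a[lo], a[hi] = a[hi], a[lo]
    return a


if __name__ == "__main__":
    bitonic_input = [1, 4, 6, 8, 7, 5, 3, 2]  # rises, then falls
    rounds = bitonic_merge_rounds(len(bitonic_input))
    print("parallel depth:", len(rounds))
    print("sequential length:", sum(len(r) for r in rounds))
    print(run_rounds(bitonic_input, rounds))
```

    Because the comparator positions depend only on n and never on the data, reordering the comparators inside a round does not change the result, which is the property a sequential-to-parallel trace transformation relies on.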

    Deterministic Selection on the Mesh and Hypercube

    In this paper we present efficient deterministic algorithms for selection on mesh-connected computers (referred to as the mesh from here on) and the hypercube. Our algorithm on the mesh runs in time O([n/p] log log p + √p log n), where n is the input size and p is the number of processors. The time bound is significantly better than that of the best existing algorithms when n is large. The run time of our algorithm on the hypercube is O([n/p] log log p + Ts/p log n), where Ts/p is the time needed to sort p elements on a p-node hypercube. In fact, the same algorithm runs on any network in time O([n/p] log log p + Ts/p log n), where Ts/p is the time needed for sorting p keys using p processors (assuming that broadcast and prefix computations take time less than or equal to Ts/p).

    Brief Announcement: New Clocks, Fast Line Formation and Self-Replication Population Protocols

    In this paper we consider a known variant of the standard population protocol model in which agents can be connected by edges, referred to as the network constructor model. During an interaction between two agents the relevant connecting edge can be formed, maintained or eliminated by the transition function. The state space of agents is fixed (constant size) and the size n of the population is not known, i.e., not hard-coded in the transition function. Since pairs of agents are chosen uniformly at random, the status of each edge is updated every Θ(n^2) interactions in expectation, which coincides with Θ(n) parallel time. This phenomenon provides a natural lower bound on the time complexity for any non-trivial network construction designed for this variant. This is in contrast with the standard population protocol model, in which efficient protocols operate in O(poly log n) parallel time. The main focus in this paper is on efficient manipulation of linear structures including formation, self-replication and distribution (including pipelining) of complex information in the adopted model. We propose and analyse a novel edge-based phase clock counting parallel time Θ(n log n) in the network constructor model, showing also that its leader-based counterpart provides the same time guarantees in the standard population protocol model. Note that all currently known phase clocks can count parallel time not exceeding O(poly log n). The new clock enables a nearly optimal O(n log n) parallel time spanning line construction (a key component of universal network construction), which improves dramatically on the best currently known O(n^2) parallel time protocol, solving the main open problem in the considered model [9]. We propose a new probabilistic bubble-sort algorithm in which random comparisons and transfers are allowed only between the adjacent positions in the sequence. Utilising a novel potential function reasoning we show that, rather surprisingly, this probabilistic sorting (via conditional pipelining) procedure requires O(n^2) comparisons in expectation and whp, and is on par with its deterministic counterpart. We propose the first population protocol allowing self-replication of a strand of an arbitrary length k (carrying a k-bit message of size independent of the state space) in parallel time O(n(k + log n)). The pipelining mechanism and the time complexity analysis of the strand self-replication protocol mimic those used in the probabilistic bubble-sort. The new protocol also permits simultaneous self-replication, where l copies of the strand can be created in time O(n(k + log n) log l). Finally, we discuss application of the strand self-replication protocol to pattern matching. Our protocols are always correct and provide time guarantees with high probability defined as 1 - n^{-η}, for a constant η > 0.
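
    The probabilistic bubble-sort idea, random compare-and-swap steps restricted to adjacent positions, can be simulated sequentially. The sketch below is a simplification made for illustration and is not the paper's protocol: it assumes a uniformly random adjacent pair at each step and uses a global inversion counter as the stopping test, which an actual population protocol would not have. The names `count_inversions` and `random_adjacent_sort` are hypothetical.

```python
import random


def count_inversions(a):
    """Number of pairs (i, j) with i < j and a[i] > a[j] (quadratic scan)."""
    n = len(a)
    return sum(1 for i in range(n) for j in range(i + 1, n) if a[i] > a[j])


def random_adjacent_sort(values, seed=0):
    """Sort by comparing uniformly random adjacent pairs.

    Returns the sorted list and the number of comparisons performed.
    """
    rng = random.Random(seed)
    a = list(values)
    inversions = count_inversions(a)
    comparisons = 0
    while inversions > 0:
        i = rng.randrange(len(a) - 1)  # pick adjacent positions i, i + 1
        comparisons += 1
        if a[i] > a[i + 1]:
            a[i], a[i + 1] = a[i + 1], a[i]
            inversions -= 1  # an adjacent swap removes exactly one inversion
    return a, comparisons


if __name__ == "__main__":
    data = [5, 3, 1, 4, 2]
    result, used = random_adjacent_sort(data, seed=1)
    print(result, "after", used, "comparisons")
```

    Since every successful swap removes exactly one inversion, the number of comparisons is at least the initial inversion count; how many wasted comparisons occur in between is what the potential-function analysis in the abstract bounds.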

    Towards Simpler Sorting Networks and Monotone Circuits for Majority

    In this paper, we study the problem of computing the majority function by low-depth monotone circuits and a related problem of constructing low-depth sorting networks. We consider both the classical setting with elementary operations of arity 2 and the generalized setting with operations of arity k, where k is a parameter. For both problems and both settings, there are various constructions known, the minimal known depth being logarithmic. However, there is currently no known construction that simultaneously achieves sub-log-squared depth, effective constructability, simplicity, and has a potential to be used in practice. In this paper we make progress towards resolution of this problem. For computing majority by standard monotone circuits (gates of arity 2) we provide an explicit monotone circuit of depth O(log_2^{5/3} n). The construction is a combination of several known and not too complicated ideas. For arbitrary arity of gates k we provide a new sorting network architecture inspired by representation of inputs as a high-dimensional cube. As a result we provide a simple construction that improves the previous upper bound of 4 log_k^2 n to 2 log_k^2 n. We prove a similar bound for the depth of the circuit computing majority of n bits consisting of gates computing majority of k bits. Note that for both problems there is an explicit construction of depth O(log_k n) known, but the construction is complicated and the constant hidden in the O-notation is huge.
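
    The link between sorting networks and monotone circuits for majority can be made concrete: on 0/1 inputs a comparator is an AND gate (minimum) paired with an OR gate (maximum), and after sorting, one fixed output wire carries the majority bit. The sketch below assumes the odd-even transposition network, which has depth n and was chosen only because it is short to write down; it is not the low-depth construction of the paper, and the function names are hypothetical.

```python
def odd_even_transposition_network(n):
    """n layers of adjacent comparators; layer r starts at wire r mod 2."""
    return [[(i, i + 1) for i in range(r % 2, n - 1, 2)] for r in range(n)]


def majority_via_sorting(bits):
    """Strict majority of the input bits, using only AND and OR gates."""
    wires = [bool(b) for b in bits]
    n = len(wires)
    if n == 0:
        raise ValueError("need at least one input bit")
    for layer in odd_even_transposition_network(n):
        for lo, hi in layer:
            # Monotone comparator: AND gives the minimum, OR the maximum.
            wires[lo], wires[hi] = (wires[lo] and wires[hi],
                                    wires[lo] or wires[hi])
    # After ascending sort, wire (n - 1) // 2 is 1 exactly when
    # more than half of the inputs are 1.
    return wires[(n - 1) // 2]


if __name__ == "__main__":
    print(majority_via_sorting([1, 0, 1, 1, 0]))
    print(majority_via_sorting([1, 0, 0, 1]))
```

    Any sorting network can be substituted for the transposition network here, so a sorting network of smaller depth directly yields a monotone majority circuit of the same depth.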

    Tolerating Faults in Counting Networks

    Counting networks were proposed by Aspnes, Herlihy and Shavit [4] as a technique for solving multiprocessor coordination problems. We describe a method for tolerating an arbitrary number of faults in counting networks. In our fault model, the following errors can occur dynamically in the counting network data structure: 1) a balancer's state is spuriously altered, 2) a balancer's state can no longer be accessed. We propose two approaches for tolerating faults. The first is based on a construction for a fault-tolerant balancer. We substitute a fault-tolerant balancer for every balancer in a counting network. Thus, we transform a counting network with depth O(log^2 n), where n is the width, into a k-fault-tolerant counting network with depth O(k log^2 n). The second approach is to append a correction network, built with fault-tolerant balancers, to a counting network that may experience faults. We present a bound on the error in the output token distribution of counting networks with faulty balancers (a generalization of the error bound for sorting networks with faulty comparators presented by Yao & Yao [21]). Given a token distribution with a bounded error, the correction network produces a token distribution that is smooth, i.e., the number of tokens on each output wire differs by at most one (a weaker condition than the step property). In order to tolerate k faults, the correction network has depth O(k^2 log n) for a network of width n.
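
    The difference between the step property and smoothness, and the effect of a spuriously altered balancer state, can be seen on a single balancer. The sketch below is an assumption-laden toy model, not the paper's fault-tolerant construction: a balancer is modelled as a one-bit toggle that sends tokens alternately to output 0 and output 1, and the class and function names are hypothetical.

```python
class Balancer:
    """Two-output toggle: tokens leave alternately on output 0 and output 1."""

    def __init__(self):
        self.toggle = 0

    def traverse(self):
        out = self.toggle
        self.toggle ^= 1
        return out


def has_step_property(counts):
    """0 <= counts[i] - counts[j] <= 1 for every pair of wires i < j."""
    n = len(counts)
    return all(0 <= counts[i] - counts[j] <= 1
               for i in range(n) for j in range(i + 1, n))


def is_smooth(counts):
    """Any two output wires differ by at most one token."""
    return not counts or max(counts) - min(counts) <= 1


def run_tokens(num_tokens, fault_after=None):
    """Send tokens through one balancer; optionally flip its state once."""
    balancer = Balancer()
    counts = [0, 0]
    for t in range(num_tokens):
        if fault_after is not None and t == fault_after:
            balancer.toggle ^= 1  # fault type 1: state spuriously altered
        counts[balancer.traverse()] += 1
    return counts


if __name__ == "__main__":
    print(run_tokens(5))                 # fault-free run
    print(run_tokens(3, fault_after=2))  # state flipped before the third token
```

    In the faulty run the bottom wire ends up with more tokens than the top wire, so the output is still smooth but no longer has the step property.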