Zig-zag Sort: A Simple Deterministic Data-Oblivious Sorting Algorithm Running in O(n log n) Time
We describe and analyze Zig-zag Sort--a deterministic data-oblivious sorting
algorithm running in O(n log n) time that is arguably simpler than previously
known algorithms with similar properties, which are based on the AKS sorting
network. Because it is data-oblivious and deterministic, Zig-zag Sort can be
implemented as a simple O(n log n)-size sorting network, thereby providing a
solution to an open problem posed by Incerpi and Sedgewick in 1985. In
addition, Zig-zag Sort is a variant of Shellsort, and is, in fact, the first
deterministic Shellsort variant running in O(n log n) time. The existence of
such an algorithm was posed as an open problem by Plaxton et al. in 1992 and
also by Sedgewick in 1996. More relevant for today, however, is the fact that
the existence of a simple data-oblivious deterministic sorting algorithm
running in O(n log n) time simplifies the inner-loop computation in several
proposed oblivious-RAM simulation methods (which utilize AKS sorting networks),
and this, in turn, implies simplified mechanisms for privacy-preserving data
outsourcing in several cloud computing applications. We provide both
constructive and non-constructive implementations of Zig-zag Sort, based on the
existence of a circuit known as an epsilon-halver, such that the constant
factors in our constructive implementations are orders of magnitude smaller
than those for constructive variants of the AKS sorting network, which are also
based on the use of epsilon-halvers. Comment: Appearing in ACM Symp. on Theory of Computing (STOC) 201
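To illustrate what "data-oblivious" means in the abstract above, here is a minimal sketch of a sort driven by a fixed comparator schedule. It is not Zig-zag Sort: it uses the much simpler odd-even transposition network, which needs O(n^2) comparators, and the function names are hypothetical. The point is only that the sequence of compared positions depends on n and never on the data.

```python
def transposition_schedule(n):
    """Fixed list of (i, i + 1) comparators for n wires; depends only on n."""
    return [(i, i + 1) for r in range(n) for i in range(r % 2, n - 1, 2)]


def oblivious_sort(values):
    """Sort by applying the fixed schedule; no branch depends on the data order."""
    out = list(values)
    for i, j in transposition_schedule(len(out)):
        # Compare-exchange: smaller value to the lower wire, larger to the higher.
        lo, hi = min(out[i], out[j]), max(out[i], out[j])
        out[i], out[j] = lo, hi
    return out
```

Because the schedule is fixed in advance, the same list of comparators can be read directly as a sorting network.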
Efficient parallel computation on multiprocessors with optical interconnection networks
This dissertation studies optical interconnection networks, their architecture, address schemes, and computation and communication capabilities. We focus on a simple but powerful optical interconnection network model - the Linear Array with Reconfigurable pipelined Bus System (LARPBS). We extend the LARPBS model to a simplified higher dimensional LARPBS and provide a set of basic computation operations. We then study the following two groups of parallel computation problems on both one dimensional LARPBS's as well as multi-dimensional LARPBS's: parallel comparison problems, including sorting, merging, and selection; Boolean matrix multiplication, transitive closure and their applications to connected component problems. We implement an optimal sorting algorithm on an n-processor LARPBS. With this optimal sorting algorithm at our disposal, we study the sorting problem for higher dimensional LARPBS's and obtain the following results:
• An optimal basic Columnsort algorithm on a 2D LARPBS.
• Two optimal two-way merge sort algorithms on a 2D LARPBS.
• An optimal multi-way merge sorting algorithm on a 2D LARPBS.
• An optimal generalized column sort algorithm on a 2D LARPBS.
• An optimal generalized column sort algorithm on a 3D LARPBS.
• An optimal 5-phase sorting algorithm on a 3D LARPBS.
Results for selection problems are as follows:
• A constant time maximum-finding algorithm on an LARPBS.
• An optimal maximum-finding algorithm on an LARPBS.
• An O((log log n)^2) time parallel selection algorithm on an LARPBS.
• An O(k(log log n)^2) time parallel multi-selection algorithm on an LARPBS.
While studying the computation and communication properties of the LARPBS model, we find Boolean matrix multiplication and its applications to the graph are another set of problems that can be solved efficiently on the LARPBS. Following is a list of results we have obtained in this area.
• A constant time Boolean matrix multiplication algorithm.
• An O(log n)-time transitive closure algorithm.
• An O(log n)-time connected components algorithm.
• An O(log n)-time strongly connected components algorithm.
The results provided in this dissertation show the strong computation and communication power of optical interconnection networks
The automated proof of a trace transformation for a bitonic sort
In his third volume of The Art of Computer Programming, Knuth presents Batcher's bitonic sorting network. With concurrency, this sorting network can be executed in logarithmic time. Knuth suggests a formal argument for the correctness of the bitonic sorting algorithm (as an exercise), but addresses the question of concurrency only informally. We develop a program for the bitonic sort by (1) deriving a stepwise refinement from Knuth's informal description of the algorithm, (2) deriving from the refinement a sequential execution or ‘trace’ of order O(n log n) in the length n of the sequence to be sorted, and (3) transforming the sequential trace into a parallel trace of order O(log n) while preserving its semantics. We shall be informal in Steps 1 and 2, although these steps can be formalized. But we will provide a formal treatment of Step 3 and report on the certification of this treatment in a mechanized logic. This work is a contribution to the optimization of programs (via concurrency) through transformation and the automation of program proofs
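The relation between the sequential and the parallel trace can be sketched as follows. This is an illustrative reading, not the paper's formal derivation: it assumes the input is already a bitonic sequence whose length n is a power of two, and all function names are hypothetical. The comparators are grouped into log2(n) stages of n/2 independent comparators each, so the flattened (sequential) trace has (n/2) log2(n) steps and the staged (parallel) trace has log2(n) steps.

```python
def bitonic_merge_stages(n):
    """Comparator stages that sort a bitonic sequence of length n (power of two)."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    stages = []
    stride = n // 2
    while stride >= 1:
        # Wires i and i + stride are paired when bit `stride` of i is clear.
        stages.append([(i, i + stride) for i in range(n) if (i & stride) == 0])
        stride //= 2
    return stages


def run_sequential(seq, stages):
    """Sequential trace: one comparator at a time, in stage order."""
    out = list(seq)
    for stage in stages:
        for i, j in stage:
            if out[i] > out[j]:
                out[i], out[j] = out[j], out[i]
    return out


def run_parallel(seq, stages):
    """Parallel trace: all comparators of a stage read the same snapshot."""
    out = list(seq)
    for stage in stages:
        snapshot = list(out)
        for i, j in stage:
            out[i] = min(snapshot[i], snapshot[j])
            out[j] = max(snapshot[i], snapshot[j])
    return out
```

Both functions give the same result because the comparators within one stage touch disjoint wires, which is the property that makes the parallel trace semantics-preserving in this sketch.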
Deterministic Selection on the Mesh and Hypercube
In this paper we present efficient deterministic algorithms for selection on the mesh connected computers (referred to as the mesh from here on) and the hypercube. Our algorithm on the mesh runs in time O([n/p] log log p + √p log n) where n is the input size and p is the number of processors. The time bound is significantly better than that of the best existing algorithms when n is large. The run time of our algorithm on the hypercube is O([n/p] log log p + Ts/p log n), where Ts/p is the time needed to sort p elements on a p-node hypercube. In fact, the same algorithm runs on any network in time O([n/p] log log p + Ts/p log n), where Ts/p is the time needed for sorting p keys using p processors (assuming that broadcast and prefix computations take time less than or equal to Ts/p)
Brief Announcement: New Clocks, Fast Line Formation and Self-Replication Population Protocols
In this paper we consider a known variant of the standard population protocol model in which agents can be connected by edges, referred to as the network constructor model. During an interaction between two agents the relevant connecting edge can be formed, maintained or eliminated by the transition function. The state space of agents is fixed (constant size) and the size n of the population is not known, i.e., not hard-coded in the transition function. Since pairs of agents are chosen uniformly at random, the status of each edge is updated every Θ(n^2) interactions in expectation, which coincides with Θ(n) parallel time. This phenomenon provides a natural lower bound on the time complexity for any non-trivial network construction designed for this variant. This is in contrast with the standard population protocol model in which efficient protocols operate in O(poly log n) parallel time. The main focus in this paper is on efficient manipulation of linear structures including formation, self-replication and distribution (including pipelining) of complex information in the adopted model. We propose and analyse a novel edge based phase clock counting parallel time Θ(n log n) in the network constructor model, showing also that its leader based counterpart provides the same time guarantees in the standard population protocol model. Note that all currently known phase clocks can count parallel time not exceeding O(poly log n). The new clock enables a nearly optimal O(n log n) parallel time spanning line construction (a key component of universal network construction), which improves dramatically on the best currently known O(n^2) parallel time protocol, solving the main open problem in the considered model [9]. We propose a new probabilistic bubble-sort algorithm in which random comparisons and transfers are allowed only between the adjacent positions in the sequence.
Utilising a novel potential function reasoning, we show that, rather surprisingly, this probabilistic sorting (via conditional pipelining) procedure requires O(n^2) comparisons in expectation and whp, and is on par with its deterministic counterpart. We propose the first population protocol allowing self-replication of a strand of an arbitrary length k (carrying a k-bit message of size independent of the state space) in parallel time O(n(k + log n)). The pipelining mechanism and the time complexity analysis of the strand self-replication protocol mimic those used in the probabilistic bubble-sort. The new protocol also permits simultaneous self-replication, where l copies of the strand can be created in time O(n(k + log n) log l). Finally, we discuss application of the strand self-replication protocol to pattern matching. Our protocols are always correct and provide time guarantees with high probability defined as 1 - n^(-η), for a constant η > 0
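The idea of sorting with random comparisons restricted to adjacent positions can be sketched as below. This is a simplified reading, not the paper's protocol: it assumes one adjacent position is picked uniformly at random per step, omits the conditional pipelining and the population-protocol setting, and uses hypothetical names throughout. It only shows the basic mechanism and lets one count comparisons empirically.

```python
import random


def count_inversions(seq):
    """Number of pairs (i, j) with i < j and seq[i] > seq[j]."""
    return sum(
        1
        for i in range(len(seq))
        for j in range(i + 1, len(seq))
        if seq[i] > seq[j]
    )


def random_adjacent_sort(values, seed=0):
    """Sort by comparing randomly chosen adjacent pairs; returns (sorted, comparisons)."""
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    out = list(values)
    inversions = count_inversions(out)
    comparisons = 0
    while inversions > 0:
        i = rng.randrange(len(out) - 1)  # random adjacent pair (i, i + 1)
        comparisons += 1
        if out[i] > out[i + 1]:
            out[i], out[i + 1] = out[i + 1], out[i]
            inversions -= 1  # an adjacent swap removes exactly one inversion
    return out, comparisons
```

Tracking the inversion count avoids rescanning the sequence to detect completion; each successful swap lowers it by one, so the number of comparisons is never smaller than the initial number of inversions.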
Towards Simpler Sorting Networks and Monotone Circuits for Majority
In this paper, we study the problem of computing the majority function by
low-depth monotone circuits and a related problem of constructing low-depth
sorting networks. We consider both the classical setting with elementary
operations of arity 2 and the generalized setting with operations of arity
, where is a parameter. For both problems and both settings, there are
various constructions known, the minimal known depth being logarithmic.
However, there is currently no known construction that simultaneously achieves
sub-log-squared depth, effective constructability, simplicity, and has a
potential to be used in practice. In this paper we make progress towards
resolution of this problem.
For computing majority by standard monotone circuits (gates of arity 2) we
provide an explicit monotone circuit of depth . The
construction is a combination of several known and not too complicated ideas.
For arbitrary arity of gates we provide a new sorting network
architecture inspired by representation of inputs as a high-dimensional cube.
As a result we provide a simple construction that improves previous upper bound
of to . We prove the similar bound for the depth
of the circuit computing majority of bits consisting of gates computing
majority of bits. Note that for both problems there is an explicit
construction of depth known, but the construction is complicated
and the constant hidden in the O-notation is huge
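The link between sorting networks and monotone circuits for majority can be sketched as follows. This is an illustration of the general connection only, not the low-depth construction from the abstract: it assumes arity-2 gates, an odd number of input bits, and a plain bubble-sort network of quadratic size; the function names are hypothetical. On bits, a comparator is one AND gate (the minimum) and one OR gate (the maximum), so a sorting network is a monotone circuit, and the middle output wire is the majority.

```python
def bubble_network(n):
    """Comparators (j, j + 1) of a bubble-sort network on n wires."""
    return [(j, j + 1) for i in range(n - 1) for j in range(n - 1 - i)]


def majority_via_sorting(bits):
    """Majority of an odd number of bits, computed with AND/OR gates only."""
    if len(bits) % 2 == 0:
        raise ValueError("expects an odd number of bits")
    wires = [bool(b) for b in bits]
    for i, j in bubble_network(len(wires)):
        # Monotone comparator: AND yields the minimum, OR yields the maximum.
        wires[i], wires[j] = wires[i] and wires[j], wires[i] or wires[j]
    # After sorting, zeros come first; the middle wire is 1 iff ones dominate.
    return wires[len(wires) // 2]
```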
Tolerating Faults in Counting Networks
Counting networks were proposed by Aspnes, Herlihy and Shavit [4] as a technique
for solving multiprocessor coordination problems. We describe a method for tolerating an
arbitrary number of faults in counting networks. In our fault model, the following errors can occur
dynamically in the counting network data structure: 1) a balancer's state is spuriously altered, 2)
a balancer's state can no longer be accessed.
We propose two approaches for tolerating faults. The first is based on a construction for a
fault-tolerant balancer. We substitute a fault-tolerant balancer for every balancer in a counting
network. Thus, we transform a counting network with depth O(log^2 n), where n is the
width, into a k-fault-tolerant counting network with depth O(k log^2 n).
The second approach is to append a correction network, built with fault-tolerant balancers, to a
counting network that may experience faults. We present a bound on the error in the output token
distribution of counting networks with faulty balancers (a generalization of the error bound for
sorting networks with faulty comparators presented by Yao & Yao [21]). Given a token distribution
with a bounded error, the correction network produces a token distribution that is smooth, i.e.,
the number of tokens on each output wire differs by at most one (a weaker condition than the
step property). In order to tolerate k faults, the correction network has depth O(k^2
log n) for a network of width n
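The terms used in the abstract above can be made concrete with a minimal sketch. It assumes the usual definition of a balancer as a toggle that sends successive tokens alternately to its two output wires, and the usual step property (for output counts y, 0 <= y[i] - y[j] <= 1 whenever i < j). The class and function names are hypothetical, and no fault model or correction network is implemented here.

```python
class Balancer:
    """Two-output balancer: tokens leave alternately on wire 0 and wire 1."""

    def __init__(self):
        self.toggle = 0       # the state that a fault could spuriously alter
        self.counts = [0, 0]  # tokens emitted on each output wire

    def traverse(self):
        wire = self.toggle
        self.counts[wire] += 1
        self.toggle ^= 1
        return wire


def has_step_property(counts):
    """True if 0 <= counts[i] - counts[j] <= 1 for all i < j."""
    return all(
        0 <= counts[i] - counts[j] <= 1
        for i in range(len(counts))
        for j in range(i + 1, len(counts))
    )


def is_smooth(counts):
    """True if any two output wires differ by at most one token."""
    return not counts or max(counts) - min(counts) <= 1
```

For example, the output distribution [2, 3] is smooth but does not have the step property, which is why smoothness is described as the weaker condition.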