Building Segment Trees in Parallel
The segment tree is a simple and important data structure in computational geometry [7,11]. We present an experimental study of parallel algorithms for building segment trees. We analyze the algorithms in the context of both the PRAM (Parallel Random Access Machine) and hypercube architectures. In addition, we present performance data for implementations developed on the Connection Machine. We compare two different parallel algorithms, and we also compare our parallel algorithms to a good sequential algorithm for the same task. In this way, we evaluate the overall efficiency of our parallel methods. Our performance results illustrate the problems involved in using popular machine models (PRAM) and analysis techniques (asymptotic efficiency) to predict the performance of parallel algorithms on real machines. We present two different analyses of our algorithms and show that neither is effective in predicting the actual performance numbers that we obtained.
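For readers unfamiliar with the data structure, the following is a minimal sequential sketch of building a segment tree and answering a stabbing query. It is not either of the parallel algorithms studied in the paper. The function names `build_segment_tree` and `stab`, the half-open segment convention `[a, b)`, and the array-based layout are all assumptions made for illustration.

```python
from bisect import bisect_left, bisect_right


def build_segment_tree(segments):
    """Build a segment tree over half-open segments (a, b) with a < b.

    Returns (xs, size, tree): the sorted distinct endpoints, the number of
    leaf slots (a power of two), and a list where tree[node] holds the
    segments stored at that node.
    """
    xs = sorted({x for seg in segments for x in seg})
    slabs = max(len(xs) - 1, 0)  # elementary slabs [xs[i], xs[i+1])
    size = 1
    while size < slabs:
        size *= 2
    tree = [[] for _ in range(2 * size)]
    for seg in segments:
        # Range of elementary slabs covered by this segment.
        lo = bisect_left(xs, seg[0]) + size
        hi = bisect_left(xs, seg[1]) + size
        # Store the segment at O(log n) canonical nodes covering [lo, hi).
        while lo < hi:
            if lo & 1:
                tree[lo].append(seg)
                lo += 1
            if hi & 1:
                hi -= 1
                tree[hi].append(seg)
            lo //= 2
            hi //= 2
    return xs, size, tree


def stab(xs, size, tree, x):
    """Return all stored segments [a, b) that contain the point x."""
    slab = bisect_right(xs, x) - 1
    if slab < 0 or slab >= len(xs) - 1:
        return []
    node = slab + size
    found = []
    while node >= 1:  # walk from the leaf up to the root
        found.extend(tree[node])
        node //= 2
    return sorted(found)
```

Each segment is stored at a logarithmic number of nodes, and a query only visits the nodes on one leaf-to-root path.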
Parallel RAM from Cyclic Circuits
Known simulations of random access machines (RAMs) or parallel RAMs (PRAMs)
by Boolean circuits incur significant polynomial blowup, due to the need to
repeatedly simulate accesses to a large main memory.
Consider two modifications to Boolean circuits: (1) remove the restriction
that circuit graphs are acyclic and (2) enhance AND gates such that they output
zero eagerly. If an AND gate has a zero input, it 'short circuits' and outputs
zero without waiting for its second input. We call this the cyclic circuit
model. Note, circuits in this model remain combinational, as they do not allow
wire values to change over time.
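The eager AND gate described above can be illustrated with a small sketch. This is only a toy evaluator under stated assumptions, not the paper's construction: it supports AND gates only, the function name `eval_cyclic` and the dictionary encoding of gates are hypothetical, and each wire is assigned at most once, mirroring the statement that wire values do not change over time.

```python
def eval_cyclic(gates, inputs):
    """Evaluate a circuit of eager AND gates that may contain cycles.

    gates:  dict mapping an output wire name to a pair of input wire names.
    inputs: dict mapping input wire names to 0 or 1.
    Returns a dict of every wire whose value could be determined.
    """
    wires = dict(inputs)
    changed = True
    while changed:
        changed = False
        for out, (a, b) in gates.items():
            if out in wires:
                continue  # a wire is set once and never changes
            va, vb = wires.get(a), wires.get(b)
            if va == 0 or vb == 0:
                value = 0  # eager: one zero input is enough
            elif va == 1 and vb == 1:
                value = 1
            else:
                continue  # not enough information yet
            wires[out] = value
            changed = True
    return wires
```

With the hypothetical two-gate cycle `g = AND(a, h)` and `h = AND(b, g)`, setting `a = 0` resolves `g` eagerly and then `h`, even though the gates depend on each other; with `a = b = 1` neither wire is ever determined.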
We simulate a bounded-word-size PRAM via a cyclic circuit, and the blowup
from the simulation is only polylogarithmic. Consider a PRAM program that,
on an input of a given length, uses an arbitrary number of processors to
manipulate words of bounded size and then halts within a given amount of work.
We construct a cyclic circuit that simulates this program and whose size is
within a polylogarithmic factor of that work. Suppose that on a particular
input the program halts within a given time; our circuit computes the same
output with gate delay within a polylogarithmic factor of that time.
This implies theoretical feasibility of powerful parallel machines. Cyclic
circuits can be implemented in hardware, and our circuit achieves performance
within polylog factors of PRAM. Our simulated PRAM synchronizes processors by
simply leveraging logical dependencies between wires.
Parallel Algorithm and Dynamic Exponent for Diffusion-limited Aggregation
A parallel algorithm for "diffusion-limited aggregation" (DLA) is described
and analyzed from the perspective of computational complexity. The dynamic
exponent z of the algorithm is defined with respect to the probabilistic
parallel random-access machine (PRAM) model of parallel computation according
to T ~ L^z, where L is the cluster size, T is the running time, and the
algorithm uses a number of processors polynomial in L. It is argued that
z=D-D_2/2, where D is the fractal dimension and D_2 is the second generalized
dimension. Simulations of DLA are carried out to measure D_2 and to test
scaling assumptions employed in the complexity analysis of the parallel
algorithm. It is plausible that the parallel algorithm attains the minimum
possible value of the dynamic exponent in which case z characterizes the
intrinsic history dependence of DLA.
Comment: 24 pages Revtex and 2 figures. A major improvement to the algorithm
and smaller dynamic exponent in this version.
Complexity, parallel computation and statistical physics
The intuition that a long history is required for the emergence of complexity
in natural systems is formalized using the notion of depth. The depth of a
system is defined in terms of the number of parallel computational steps needed
to simulate it. Depth provides an objective, irreducible measure of history
applicable to systems of the kind studied in statistical physics. It is argued
that physical complexity cannot occur in the absence of substantial depth and
that depth is a useful proxy for physical complexity. The ideas are illustrated
for a variety of systems in statistical physics.
Comment: 21 pages, 7 figures
Parallel Weighted Random Sampling
Data structures for efficient sampling from a set of weighted items are an important building block of many applications. However, few parallel solutions are known. We close many of these gaps both for shared-memory and distributed-memory machines. We give efficient, fast, and practicable algorithms for sampling single items, k items with/without replacement, permutations, subsets, and reservoirs. We also give improved sequential algorithms for alias table construction and for sampling with replacement. Experiments on shared-memory parallel machines with up to 158 threads show near-linear speedups both for construction and queries.
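As background for the alias tables mentioned above, here is a minimal sketch of the classical sequential alias method (Vose's construction). It is not the improved or parallel algorithm from the paper; the function names `build_alias_table` and `sample` are assumptions chosen for illustration.

```python
import random


def build_alias_table(weights):
    """Classical sequential alias-table construction (Vose's method).

    Returns (prob, alias): bucket i yields item i with probability prob[i]
    and item alias[i] otherwise.
    """
    n = len(weights)
    total = sum(weights)
    # Scale so that an average item fills exactly one bucket.
    scaled = [w * n / total for w in weights]
    prob = [1.0] * n
    alias = list(range(n))
    small = [i for i, s in enumerate(scaled) if s < 1.0]
    large = [i for i, s in enumerate(scaled) if s >= 1.0]
    while small and large:
        s = small.pop()
        l = large.pop()
        prob[s] = scaled[s]  # bucket s keeps its own mass ...
        alias[s] = l         # ... and is topped up by item l
        scaled[l] -= 1.0 - scaled[s]
        (small if scaled[l] < 1.0 else large).append(l)
    # Any leftover items keep prob 1.0 (they fill their own bucket).
    return prob, alias


def sample(prob, alias, rng=random):
    """Draw one item in O(1): pick a bucket, then the item or its alias."""
    i = rng.randrange(len(prob))
    return i if rng.random() < prob[i] else alias[i]
```

Construction takes linear time and each query takes constant time, which is why alias tables are a common building block for weighted sampling.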
Internal Diffusion-Limited Aggregation: Parallel Algorithms and Complexity
The computational complexity of internal diffusion-limited aggregation (DLA)
is examined from both a theoretical and a practical point of view. We show that
for two or more dimensions, the problem of predicting the cluster from a given
set of paths is complete for the complexity class CC, the subset of P
characterized by circuits composed of comparator gates. CC-completeness is
believed to imply that, in the worst case, growing a cluster of size n requires
polynomial time in n even on a parallel computer.
A parallel relaxation algorithm is presented that uses the fact that clusters
are nearly spherical to guess the cluster from a given set of paths, and then
corrects defects in the guessed cluster through a non-local annihilation
process. The parallel running time of the relaxation algorithm for
two-dimensional internal DLA is studied by simulating it on a serial computer.
The numerical results are compatible with a running time that is either
polylogarithmic in n or a small power of n. Thus the computational resources
needed to grow large clusters are significantly less on average than the
worst-case analysis would suggest.
For a parallel machine with k processors, we show that random clusters in d
dimensions can be generated in O((n/k + log k) n^{2/d}) steps. This is a
significant speedup over explicit sequential simulation, which takes
O(n^{1+2/d}) time on average.
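The explicit sequential simulation referred to above can be sketched as follows for two dimensions. This is a minimal illustration assuming the usual definition of internal DLA on the square lattice (each particle starts at the origin and walks until it reaches an unoccupied site); the function name `internal_dla` is hypothetical and the code is not taken from the paper.

```python
import random


def internal_dla(n, rng):
    """Grow a 2D internal DLA cluster of n sites by explicit simulation.

    Each particle starts at the origin and performs a simple random walk
    until it first reaches an unoccupied site, which it then occupies.
    """
    steps = [(1, 0), (-1, 0), (0, 1), (0, -1)]
    occupied = set()
    for _ in range(n):
        x, y = 0, 0
        while (x, y) in occupied:
            dx, dy = rng.choice(steps)
            x += dx
            y += dy
        occupied.add((x, y))
    return occupied
```

Calling `internal_dla(1000, random.Random(0))` returns the set of occupied lattice sites; particles are processed strictly one after another, which is the sequential baseline the parallel algorithms are compared against.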
Finally, we show that in one dimension internal DLA can be predicted in O(log
n) parallel time, and so is in the complexity class NC.
Round Compression for Parallel Matching Algorithms
For over a decade now we have been witnessing the success of massive
parallel computation (MPC) frameworks, such as MapReduce, Hadoop, Dryad, or
Spark. One of the reasons for their success is the fact that these frameworks
are able to accurately capture the nature of large-scale computation. In
particular, compared to the classic distributed algorithms or PRAM models,
these frameworks allow for much more local computation. The fundamental
question that arises in this context is: can we leverage this additional
power to obtain even faster parallel algorithms?
A prominent example here is the maximum matching problem, one of the
most classic graph problems. It is well known that in the PRAM model one can
compute a 2-approximate maximum matching in O(log n) rounds. However, the
exact complexity of this problem in the MPC framework is still far from
understood. Lattanzi et al. showed that if each machine has n^{1+Ω(1)}
memory, this problem can also be solved 2-approximately in a constant number
of rounds. These techniques, as well as the approaches developed in the
follow-up work, nevertheless seem to get stuck in a fundamental way at roughly
O(log n) rounds once we enter the near-linear memory regime. It is thus entirely
possible that in this regime, which captures in particular the case of sparse
graph computations, the best MPC round complexity matches what one can already
get in the PRAM model, without the need to take advantage of the extra local
computation power.
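For context on the 2-approximation mentioned above: any maximal matching contains at least half as many edges as a maximum matching, so a simple greedy pass already gives a 2-approximation sequentially. The sketch below shows that sequential baseline only; it is not the PRAM or MPC algorithm, and the function name `greedy_maximal_matching` is an assumption.

```python
def greedy_maximal_matching(edges):
    """Scan edges once, keeping every edge whose endpoints are both free.

    The result is a maximal matching, hence a 2-approximate maximum matching.
    """
    matched = set()
    matching = []
    for u, v in edges:
        if u != v and u not in matched and v not in matched:
            matching.append((u, v))
            matched.update((u, v))
    return matching
```

On the path 0-1-2-3, scanning the middle edge first yields a matching of size 1 while the maximum has size 2, which shows the factor of 2 is tight for this greedy approach.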
In this paper, we finally refute that perplexing possibility. That is, we
break the above O(log n) round complexity bound even in the case of
slightly sublinear memory per machine. In fact, our improvement here is
almost exponential: we are able to deliver a (2+ε)-approximation to
maximum matching, for any fixed constant ε > 0, in O((log log n)^2)
rounds.