2,032 research outputs found

Building Segment Trees in Parallel

Author: Drysdale Scot
Su Peter
Publication venue: Dartmouth Digital Commons
Publication date: 22/10/1992

The segment tree is a simple and important data structure in computational geometry [7,11]. We present an experimental study of parallel algorithms for building segment trees. We analyze the algorithms in the context of both the PRAM (Parallel Random Access Machine) and hypercube architectures. In addition, we present performance data for implementations developed on the Connection Machine. We compare two different parallel algorithms, and we also compare our parallel algorithms to a good sequential algorithm for doing the same job. In this way, we evaluate the overall efficiency of our parallel methods. Our performance results illustrate the problems involved in using popular machine models (PRAM) and analysis techniques (asymptotic efficiency) to predict the performance of parallel algorithms on real machines. We present two different analyses of our algorithms and show that neither is effective in predicting the actual performance numbers that we obtained.
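For readers unfamiliar with the structure, the following is a minimal sequential sketch of a computational-geometry segment tree in Python. It is not the parallel algorithm studied in the paper; the names `build_segment_tree` and `stab`, the half-open interval convention `[lo, hi)`, and the heap-style array layout are all assumptions made for illustration.

```python
from bisect import bisect_left, bisect_right


def build_segment_tree(segments):
    """Build a segment tree over half-open intervals [lo, hi).

    Illustrative sequential sketch; the layout is a heap-style array
    with the root at index 1 (an assumption, not the paper's layout).
    """
    # Sorted distinct endpoints delimit the elementary intervals (leaves).
    xs = sorted({x for seg in segments for x in seg})
    n_leaves = max(len(xs) - 1, 1)
    size = 1
    while size < n_leaves:
        size *= 2
    # cover[v] holds the ids of segments stored at node v.
    cover = [[] for _ in range(2 * size)]
    for seg_id, (lo, hi) in enumerate(segments):
        left = bisect_left(xs, lo) + size
        right = bisect_left(xs, hi) + size  # leaf range is [left, right)
        # Store the segment at O(log n) canonical nodes covering the range.
        while left < right:
            if left & 1:
                cover[left].append(seg_id)
                left += 1
            if right & 1:
                right -= 1
                cover[right].append(seg_id)
            left //= 2
            right //= 2
    return xs, size, cover


def stab(tree, x):
    """Return sorted ids of all segments [lo, hi) that contain x."""
    xs, size, cover = tree
    i = bisect_right(xs, x) - 1  # index of the elementary interval holding x
    if i < 0 or i >= len(xs) - 1:
        return []
    found = []
    v = i + size
    while v >= 1:  # walk from the leaf up to the root
        found.extend(cover[v])
        v //= 2
    return sorted(found)
```

A stabbing query such as `stab(build_segment_tree([(1, 5), (3, 8), (6, 9)]), 4)` collects the segments stored along one leaf-to-root path, which is why each segment is stored only at its canonical nodes during construction.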

Dartmouth Digital Commons (Dartmouth College)

Parallel RAM from Cyclic Circuits

Author: Heath David
Publication date: 10/09/2023

Known simulations of random access machines (RAMs) or parallel RAMs (PRAMs) by Boolean circuits incur significant polynomial blowup, due to the need to repeatedly simulate accesses to a large main memory. Consider two modifications to Boolean circuits: (1) remove the restriction that circuit graphs are acyclic and (2) enhance AND gates such that they output zero eagerly. If an AND gate has a zero input, it 'short circuits' and outputs zero without waiting for its second input. We call this the cyclic circuit model. Note, circuits in this model remain combinational, as they do not allow wire values to change over time. We simulate a bounded-word-size PRAM via a cyclic circuit, and the blowup from the simulation is only polylogarithmic. Consider a PRAM program $P$ that on a length-$n$ input uses an arbitrary number of processors to manipulate words of size $\Theta(\log n)$ bits and then halts within $W(n)$ work. We construct a size-$O(W(n) \cdot \log^4 n)$ cyclic circuit that simulates $P$. Suppose that on a particular input, $P$ halts in time $T$; our circuit computes the same output within $T \cdot O(\log^3 n)$ gate delay. This implies theoretical feasibility of powerful parallel machines. Cyclic circuits can be implemented in hardware, and our circuit achieves performance within polylog factors of PRAM. Our simulated PRAM synchronizes processors by simply leveraging logical dependencies between wires.
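The eager AND gate described in this abstract can be illustrated with a small Python sketch. This is a toy model built only from the abstract's description, not the paper's construction: the value `None` standing for "not yet determined", the function names `eager_and` and `settle`, and the dictionary encoding of the circuit are all assumptions.

```python
def eager_and(a, b):
    """AND over {0, 1, None}, where None means 'not yet determined'.

    A single 0 input forces the output to 0 without waiting for the other.
    """
    if a == 0 or b == 0:
        return 0
    if a is None or b is None:
        return None
    return 1


def settle(gates, inputs):
    """Propagate values through a possibly cyclic circuit of eager AND gates.

    gates:  dict mapping an output wire to its pair of input wires
    inputs: dict mapping each primary input wire to 0 or 1
    Each wire is assigned at most once, so values never change over time.
    """
    values = {wire: None for wire in gates}
    values.update(inputs)
    changed = True
    while changed:
        changed = False
        for wire, (a, b) in gates.items():
            if values[wire] is None:
                out = eager_and(values[a], values[b])
                if out is not None:
                    values[wire] = out
                    changed = True
    return values
```

In the two-gate cycle `{"g": ("x", "h"), "h": ("g", "y")}`, setting `x` to 0 lets `g` resolve immediately and then `h` follows, whereas with `x` and `y` both 1 neither gate can determine its output and both wires stay `None` in this toy model.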

arXiv.org e-Print Archive

Parallel Algorithm and Dynamic Exponent for Diffusion-limited Aggregation

Author: A. Gibbons
C. Amitrano
C. H. Bennett
C. H. Papadimitriou
H. Kaufman
J. Machta
P. Ossadnik
R. C. Ball
R. F. Voss
R. Greenlaw
R. J. Anderson
S. Tolman
T. A. Witten
T. C. Halsey
T. Vicsek
Publication venue: American Physical Society (APS)
Publication date: 17/12/1996

A parallel algorithm for "diffusion-limited aggregation" (DLA) is described and analyzed from the perspective of computational complexity. The dynamic exponent $z$ of the algorithm is defined with respect to the probabilistic parallel random-access machine (PRAM) model of parallel computation according to $T \sim L^{z}$, where $L$ is the cluster size, $T$ is the running time, and the algorithm uses a number of processors polynomial in $L$. It is argued that $z = D - D_2/2$, where $D$ is the fractal dimension and $D_2$ is the second generalized dimension. Simulations of DLA are carried out to measure $D_2$ and to test scaling assumptions employed in the complexity analysis of the parallel algorithm. It is plausible that the parallel algorithm attains the minimum possible value of the dynamic exponent, in which case $z$ characterizes the intrinsic history dependence of DLA. Comment: 24 pages Revtex and 2 figures. A major improvement to the algorithm and smaller dynamic exponent in this version.

arXiv.org e-Print Archive

Crossref

Complexity, parallel computation and statistical physics

Author: Machta J.
Publication date: 29/10/2005

The intuition that a long history is required for the emergence of complexity in natural systems is formalized using the notion of depth. The depth of a system is defined in terms of the number of parallel computational steps needed to simulate it. Depth provides an objective, irreducible measure of history applicable to systems of the kind studied in statistical physics. It is argued that physical complexity cannot occur in the absence of substantial depth and that depth is a useful proxy for physical complexity. The ideas are illustrated for a variety of systems in statistical physics. Comment: 21 pages, 7 figures.

arXiv.org e-Print Archive

ScholarWorks@UMass Amherst

Parallel Weighted Random Sampling

Author: Sanders Peter
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 27th Annual European Symposium on Algorithms (ESA 2019)
Publication date: 01/01/2019

Data structures for efficient sampling from a set of weighted items are an important building block of many applications. However, few parallel solutions are known. We close many of these gaps both for shared-memory and distributed-memory machines. We give efficient, fast, and practicable algorithms for sampling single items, k items with/without replacement, permutations, subsets, and reservoirs. We also give improved sequential algorithms for alias table construction and for sampling with replacement. Experiments on shared-memory parallel machines with up to 158 threads show near linear speedups both for construction and queries.
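As background for the alias tables mentioned in this abstract, here is a minimal Python sketch of the classic sequential alias method (in the style usually attributed to Walker and Vose). It is not the improved or parallel construction the paper contributes; the function names `build_alias_table` and `sample` are assumptions chosen for illustration.

```python
import random


def build_alias_table(weights):
    """Classic sequential alias-table construction in O(n) time.

    Returns (prob, alias): bucket i yields item i with probability
    prob[i] and item alias[i] otherwise.
    """
    n = len(weights)
    total = sum(weights)
    # Rescale so that the average bucket weight is exactly 1.
    scaled = [w * n / total for w in weights]
    prob = [1.0] * n
    alias = list(range(n))
    small = [i for i, p in enumerate(scaled) if p < 1.0]
    large = [i for i, p in enumerate(scaled) if p >= 1.0]
    while small and large:
        s = small.pop()
        g = large.pop()
        prob[s] = scaled[s]   # bucket s keeps its own mass ...
        alias[s] = g          # ... and is topped up from item g
        scaled[g] -= 1.0 - scaled[s]
        (small if scaled[g] < 1.0 else large).append(g)
    # Leftover buckets keep prob 1.0 (absorbs floating-point residue).
    return prob, alias


def sample(prob, alias, rng=random):
    """Draw one item index in O(1) time from a built alias table."""
    i = rng.randrange(len(prob))
    return i if rng.random() < prob[i] else alias[i]
```

After the linear-time build, each query costs one uniform bucket choice and one biased coin flip, which is what makes alias tables attractive when many samples are drawn from a fixed weight vector.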

Dagstuhl Research Online Publication Server

Internal Diffusion-Limited Aggregation: Parallel Algorithms and Complexity

Author: Machta Jonathan
Moore Cristopher
Publication date: 15/09/1999

The computational complexity of internal diffusion-limited aggregation (DLA) is examined from both a theoretical and a practical point of view. We show that for two or more dimensions, the problem of predicting the cluster from a given set of paths is complete for the complexity class CC, the subset of P characterized by circuits composed of comparator gates. CC-completeness is believed to imply that, in the worst case, growing a cluster of size $n$ requires polynomial time in $n$ even on a parallel computer. A parallel relaxation algorithm is presented that uses the fact that clusters are nearly spherical to guess the cluster from a given set of paths, and then corrects defects in the guessed cluster through a non-local annihilation process. The parallel running time of the relaxation algorithm for two-dimensional internal DLA is studied by simulating it on a serial computer. The numerical results are compatible with a running time that is either polylogarithmic in $n$ or a small power of $n$. Thus the computational resources needed to grow large clusters are significantly less on average than the worst-case analysis would suggest. For a parallel machine with $k$ processors, we show that random clusters in $d$ dimensions can be generated in $O((n/k + \log k)\, n^{2/d})$ steps. This is a significant speedup over explicit sequential simulation, which takes $O(n^{1+2/d})$ time on average. Finally, we show that in one dimension internal DLA can be predicted in $O(\log n)$ parallel time, and so is in the complexity class NC.
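The explicit sequential simulation that this abstract uses as its baseline can be sketched in a few lines of Python. This is an illustrative sketch of internal DLA on the two-dimensional square lattice, not the paper's parallel relaxation algorithm; the function name `internal_dla` and the `seed` parameter are assumptions.

```python
import random


def internal_dla(n, seed=0):
    """Grow an internal DLA cluster of n sites on the 2D square lattice.

    Each particle starts at the origin and performs a simple random walk
    until it first reaches an unoccupied site, which it then occupies.
    """
    rng = random.Random(seed)
    steps = [(1, 0), (-1, 0), (0, 1), (0, -1)]
    occupied = set()
    for _ in range(n):
        x, y = 0, 0
        # Walk inside the cluster until stepping onto an empty site.
        while (x, y) in occupied:
            dx, dy = rng.choice(steps)
            x, y = x + dx, y + dy
        occupied.add((x, y))
    return occupied
```

Each particle must wander out through the existing cluster before it can settle, which is the source of the extra walk time per particle in the sequential cost quoted in the abstract.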

arXiv.org e-Print Archive

ScholarWorks@UMass Amherst

Round Compression for Parallel Matching Algorithms

Author: Czumaj Artur
Mitrović Slobodan
Mądry Aleksander
Onak Krzysztof
Sankowski Piotr
Łącki Jakub
Publication date: 01/01/2018

For over a decade now we have been witnessing the success of massive parallel computation (MPC) frameworks, such as MapReduce, Hadoop, Dryad, or Spark. One of the reasons for their success is the fact that these frameworks are able to accurately capture the nature of large-scale computation. In particular, compared to the classic distributed algorithms or PRAM models, these frameworks allow for much more local computation. The fundamental question that arises in this context is: can we leverage this additional power to obtain even faster parallel algorithms? A prominent example here is the maximum matching problem, one of the most classic graph problems. It is well known that in the PRAM model one can compute a 2-approximate maximum matching in $O(\log n)$ rounds. However, the exact complexity of this problem in the MPC framework is still far from understood. Lattanzi et al. showed that if each machine has $n^{1+\Omega(1)}$ memory, this problem can also be solved $2$-approximately in a constant number of rounds. These techniques, as well as the approaches developed in the follow-up work, seem, though, to get stuck in a fundamental way at roughly $O(\log n)$ rounds once we enter the near-linear memory regime. It is thus entirely possible that in this regime, which captures in particular the case of sparse graph computations, the best MPC round complexity matches what one can already get in the PRAM model, without the need to take advantage of the extra local computation power. In this paper, we finally refute that perplexing possibility. That is, we break the above $O(\log n)$ round complexity bound even in the case of slightly sublinear memory per machine. In fact, our improvement here is almost exponential: we are able to deliver a $(2+\epsilon)$-approximation to maximum matching, for any fixed constant $\epsilon > 0$, in $O((\log \log n)^2)$ rounds.
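The 2-approximation benchmark mentioned in this abstract comes from a standard fact: any maximal matching has at least half as many edges as a maximum matching. A minimal sequential Python sketch of a greedy maximal matching is given below; it is neither the PRAM algorithm nor the paper's MPC algorithm, and the function name `greedy_maximal_matching` is an assumption.

```python
def greedy_maximal_matching(edges):
    """Scan edges once, keeping an edge when both endpoints are still free.

    The result is a maximal matching, hence a 2-approximation of a
    maximum matching. Self-loops are skipped.
    """
    matched = set()
    matching = []
    for u, v in edges:
        if u != v and u not in matched and v not in matched:
            matching.append((u, v))
            matched.update((u, v))
    return matching
```

On the path 0-1-2-3, scanning the middle edge first, as in `greedy_maximal_matching([(1, 2), (0, 1), (2, 3)])`, keeps only one edge while the maximum matching has two, so the factor of 2 can be attained.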

arXiv.org e-Print Archive

Infoscience - École polytechnique fédérale de Lausanne

DSpace@MIT

Warwick Research Archives Portal Repository