Towards Work-Efficient Parallel Parameterized Algorithms
Parallel parameterized complexity theory studies how fixed-parameter
tractable (fpt) problems can be solved in parallel. Previous theoretical work
focused on parallel algorithms that are very fast in principle, but did not
take into account that when we only have a small number of processors (between
2 and, say, 1024), it is more important that the parallel algorithms are
work-efficient. In the present paper we investigate how work-efficient fpt
algorithms can be designed. We review standard methods from fpt theory, like
kernelization, search trees, and interleaving, and prove trade-offs for them
between work efficiency and runtime improvements. This results in a toolbox for
developing work-efficient parallel fpt algorithms.
Comment: Prior full version of the paper that will appear in Proceedings of
the 13th International Conference and Workshops on Algorithms and Computation
(WALCOM 2019), February 27 - March 02, 2019, Guwahati, India. The final
authenticated version is available online at
https://doi.org/10.1007/978-3-030-10564-8_2
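As a rough illustration of two of the standard methods named in this abstract (kernelization followed by a bounded search tree), here is a minimal sequential sketch. Vertex Cover is used purely as an assumed example problem; the abstract does not name it, and the function names `buss_kernel`, `search_tree`, and `vertex_cover` are hypothetical. The two branches of the search tree are independent, which is where a parallel variant could split the work across processors; this sketch does not do so.

```python
from typing import Optional, Set, Tuple

Edge = Tuple[int, int]


def buss_kernel(edges: Set[Edge], k: int) -> Optional[Tuple[Set[Edge], int, Set[int]]]:
    """Kernelization step (Buss-style rule, assumed example).

    A vertex of degree > k must belong to every cover of size <= k,
    so it is taken into the cover and the budget k is reduced.
    Returns (reduced edges, remaining budget, forced vertices) or None.
    """
    edges = set(edges)
    forced: Set[int] = set()
    while True:
        degree = {}
        for u, v in edges:
            degree[u] = degree.get(u, 0) + 1
            degree[v] = degree.get(v, 0) + 1
        high = [v for v, d in degree.items() if d > k]
        if not high:
            break
        v = high[0]
        forced.add(v)
        edges = {e for e in edges if v not in e}
        k -= 1
        if k < 0:
            return None
    # With maximum degree <= k, a cover of size k covers at most k*k edges.
    if len(edges) > k * k:
        return None
    return edges, k, forced


def search_tree(edges: Set[Edge], k: int) -> Optional[Set[int]]:
    """Bounded search tree: one endpoint of any uncovered edge must be chosen."""
    if not edges:
        return set()
    if k == 0:
        return None
    u, v = next(iter(edges))
    for pick in (u, v):  # the two branches are independent subproblems
        rest = {e for e in edges if pick not in e}
        sub = search_tree(rest, k - 1)
        if sub is not None:
            return sub | {pick}
    return None


def vertex_cover(edges: Set[Edge], k: int) -> Optional[Set[int]]:
    """Kernelize first, then run the search tree on the reduced instance."""
    kernel = buss_kernel(edges, k)
    if kernel is None:
        return None
    reduced, budget, forced = kernel
    sub = search_tree(reduced, budget)
    return None if sub is None else forced | sub
```

Calling `vertex_cover({(0, 1), (1, 2), (0, 2)}, 2)` on a triangle returns some cover of two vertices, while a budget of 1 returns `None`.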
The Parallel Complexity of Growth Models
This paper investigates the parallel complexity of several non-equilibrium
growth models. Invasion percolation, Eden growth, ballistic deposition and
solid-on-solid growth are all seemingly highly sequential processes that yield
self-similar or self-affine random clusters. Nonetheless, we present fast
parallel randomized algorithms for generating these clusters. The running times
of the algorithms scale as $O(\log^2 N)$, where $N$ is the system size, and the
number of processors required scales as a polynomial in $N$. The algorithms are
based on fast parallel procedures for finding minimum weight paths; they
illuminate the close connection between growth models and self-avoiding paths
in random environments. In addition to their potential practical value, our
algorithms serve to classify these growth models as less complex than other
growth models, such as diffusion-limited aggregation, for which fast parallel
algorithms probably do not exist.
Comment: 20 pages, LaTeX, submitted to J. Stat. Phys., UNH-TR94-0
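To make concrete why these growth models look "highly sequential", here is a minimal sketch of the ordinary sequential form of site invasion percolation: at each step the cluster grows by the frontier site with the smallest random weight. This is an assumed baseline for illustration only, not the parallel algorithm the abstract describes; the function name `invasion_percolation`, the square lattice, and the central seed site are all hypothetical choices.

```python
import heapq
import random
from typing import List, Tuple

Site = Tuple[int, int]


def invasion_percolation(n: int, steps: int, seed: int = 0) -> List[Site]:
    """Grow a cluster on an n x n lattice, one site per step.

    Each site gets an independent uniform random weight. Starting from
    the central site, the cluster repeatedly absorbs the neighboring
    (frontier) site of minimum weight. Returns sites in invasion order.
    """
    rng = random.Random(seed)
    weight = [[rng.random() for _ in range(n)] for _ in range(n)]
    start = (n // 2, n // 2)
    cluster = {start}
    order = [start]
    frontier: List[Tuple[float, int, int]] = []

    def push_neighbors(x: int, y: int) -> None:
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nx, ny = x + dx, y + dy
            if 0 <= nx < n and 0 <= ny < n and (nx, ny) not in cluster:
                heapq.heappush(frontier, (weight[nx][ny], nx, ny))

    push_neighbors(*start)
    while frontier and len(order) <= steps:
        _, x, y = heapq.heappop(frontier)
        if (x, y) in cluster:
            continue  # stale heap entry: site was already invaded
        cluster.add((x, y))
        order.append((x, y))
        push_neighbors(x, y)
    return order
```

Each step depends on the cluster built so far, which is the apparent sequential bottleneck; the abstract's point is that the same clusters can nonetheless be generated quickly in parallel via minimum weight paths.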
Theoretically Efficient Parallel Graph Algorithms Can Be Fast and Scalable
There has been significant recent interest in parallel graph processing due
to the need to quickly analyze the large graphs available today. Many graph
codes have been designed for distributed memory or external memory. However,
today even the largest publicly-available real-world graph (the Hyperlink Web
graph with over 3.5 billion vertices and 128 billion edges) can fit in the
memory of a single commodity multicore server. Nevertheless, most experimental
work in the literature reports results on much smaller graphs, and the results for
the Hyperlink graph use distributed or external memory. Therefore, it is
natural to ask whether we can efficiently solve a broad class of graph problems
on this graph in memory.
This paper shows that theoretically-efficient parallel graph algorithms can
scale to the largest publicly-available graphs using a single machine with a
terabyte of RAM, processing them in minutes. We give implementations of
theoretically-efficient parallel algorithms for 20 important graph problems. We
also present the optimizations and techniques that we used in our
implementations, which were crucial in enabling us to process these large
graphs quickly. We show that the running times of our implementations
outperform existing state-of-the-art implementations on the largest real-world
graphs. For many of the problems that we consider, this is the first time they
have been solved on graphs at this scale. We have made the implementations
developed in this work publicly-available as the Graph-Based Benchmark Suite
(GBBS).
Comment: This is the full version of the paper appearing in the ACM Symposium
on Parallelism in Algorithms and Architectures (SPAA), 201
On Characterizing the Data Movement Complexity of Computational DAGs for Parallel Execution
Technology trends are making the cost of data movement increasingly dominant,
both in terms of energy and time, over the cost of performing arithmetic
operations in computer systems. The fundamental ratio of aggregate data
movement bandwidth to the total computational power (also referred to as the
machine balance parameter) in parallel computer systems is decreasing. It is
therefore of considerable importance to characterize the inherent data
movement requirements of parallel algorithms, so that the minimal architectural
balance parameters required to support them on future systems can be well
understood. In this paper, we develop an extension of the well-known red-blue
pebble game to develop lower bounds on the data movement complexity for the
parallel execution of computational directed acyclic graphs (CDAGs) on parallel
systems. We model multi-node multi-core parallel systems, with the total
physical memory distributed across the nodes (that are connected through some
interconnection network) and in a multi-level shared cache hierarchy for
processors within a node. We also develop new techniques for lower bound
characterization of non-homogeneous CDAGs. We demonstrate the use of the
methodology by analyzing the CDAGs of several numerical algorithms, to develop
lower bounds on data movement for their parallel execution.