Main Memory Adaptive Indexing for Multi-core Systems
Adaptive indexing is a concept that considers index creation in databases as
a by-product of query processing, as opposed to traditional full index creation,
where the indexing effort is performed up front, before any queries are answered.
Adaptive indexing has received a considerable amount of attention, and several
algorithms have been proposed over the past few years, including a recent
experimental study comparing a large number of existing methods. Until now,
however, most adaptive indexing algorithms have been designed for
single-threaded execution; yet with multi-core systems already well
established, the idea of designing parallel algorithms for adaptive indexing is
very natural. In this regard, only one parallel algorithm for adaptive indexing
has recently appeared in the literature: the parallel version of standard
cracking. In this paper we describe three alternative parallel algorithms for
adaptive indexing, including a second variant of a parallel standard cracking
algorithm. Additionally, we describe a hybrid parallel sorting algorithm and a
NUMA-aware method based on sorting. We then thoroughly compare all these
algorithms experimentally, alongside a variant of a recently published parallel
version of radix sort. Parallel
sorting algorithms serve as a realistic baseline for multi-threaded adaptive
indexing techniques. In total we experimentally compare seven parallel
algorithms. Additionally, we extensively profile all considered algorithms. The
initial set of experiments considered in this paper indicates that our parallel
algorithms significantly improve over previously known ones. Our results
suggest that, although adaptive indexing algorithms are a good design choice in
single-threaded environments, the rules change considerably in the parallel
case. That is, in future highly-parallel environments, sorting algorithms could
be serious alternatives to adaptive indexing.
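As a point of reference for the cracking-based methods discussed above, the following is a minimal single-threaded sketch of standard cracking in Python; the class and method names are our own illustration, not code from the paper. Each range query partitions only the piece of the column it touches, so the index is refined as a by-product of answering queries:

```python
from bisect import bisect_left, insort

class CrackerColumn:
    def __init__(self, values):
        self.values = list(values)  # the column; reordered in place over time
        self.bounds = []            # sorted list of (pivot, split_index)

    def _crack(self, lo, hi, pivot):
        """Partition values[lo:hi] in place; return first index >= pivot."""
        data, i, j = self.values, lo, hi - 1
        while i <= j:
            if data[i] < pivot:
                i += 1
            else:
                data[i], data[j] = data[j], data[i]
                j -= 1
        return i

    def range_query(self, low, high):
        """Answer `low <= v < high`, refining the index as a side effect."""
        for pivot in (low, high):
            keys = [p for p, _ in self.bounds]
            k = bisect_left(keys, pivot)
            if k < len(keys) and keys[k] == pivot:
                continue                      # already cracked on this pivot
            lo = self.bounds[k - 1][1] if k > 0 else 0
            hi = self.bounds[k][1] if k < len(self.bounds) else len(self.values)
            insort(self.bounds, (pivot, self._crack(lo, hi, pivot)))
        a = next(i for p, i in self.bounds if p == low)
        b = next(i for p, i in self.bounds if p == high)
        return self.values[a:b]               # qualifying values are contiguous
```

For example, `CrackerColumn([7, 3, 9, 1, 5]).range_query(3, 8)` returns `[7, 3, 5]` and leaves behind two new partition boundaries that speed up later queries touching the same key range.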
Round Compression for Parallel Matching Algorithms
For over a decade now we have been witnessing the success of {\em massive
parallel computation} (MPC) frameworks, such as MapReduce, Hadoop, Dryad, or
Spark. One of the reasons for their success is the fact that these frameworks
are able to accurately capture the nature of large-scale computation. In
particular, compared to the classic distributed algorithms or PRAM models,
these frameworks allow for much more local computation. The fundamental
question that arises in this context is: can we leverage this additional
power to obtain even faster parallel algorithms?
A prominent example here is the {\em maximum matching} problem---one of the
most classic graph problems. It is well known that in the PRAM model one can
compute a 2-approximate maximum matching in $O(\log n)$ rounds. However, the
exact complexity of this problem in the MPC framework is still far from
understood. Lattanzi et al. showed that if each machine has $n^{1+\Omega(1)}$
memory, this problem can also be solved 2-approximately in a constant number
of rounds. These techniques, as well as the approaches developed in the
follow-up work, seem, though, to get stuck in a fundamental way at roughly
$O(\log n)$ rounds once we enter the near-linear memory regime. It is thus entirely
possible that in this regime, which captures in particular the case of sparse
graph computations, the best MPC round complexity matches what one can already
get in the PRAM model, without the need to take advantage of the extra local
computation power.
In this paper, we finally refute that perplexing possibility. That is, we
break the above $O(\log n)$ round complexity bound even in the case of {\em
slightly sublinear} memory per machine. In fact, our improvement here is {\em
almost exponential}: we are able to deliver a $(2+\epsilon)$-approximation to
maximum matching, for any fixed constant $\epsilon > 0$, in
$O\big((\log \log n)^2\big)$ rounds.
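For context on the 2-approximation benchmark that both the PRAM and MPC results above target, here is a minimal sequential sketch (ours, purely illustrative): any maximal matching, e.g. one built greedily, is a 2-approximation to maximum matching; the algorithms in the abstract compute such matchings in parallel rounds.

```python
def greedy_maximal_matching(edges):
    """Return a maximal matching of an undirected graph given as an edge list."""
    matched = set()
    matching = []
    for u, v in edges:
        if u not in matched and v not in matched:
            matching.append((u, v))
            matched.update((u, v))
    return matching

# Why this is a 2-approximation: every edge of a maximum matching M* shares
# an endpoint with some greedy edge (otherwise greedy would have added it),
# and one greedy edge has only two endpoints, so it can block at most two
# edges of M*; hence |M*| <= 2 |M|.
edges = [(1, 2), (2, 3), (3, 4), (4, 5)]
print(greedy_maximal_matching(edges))  # [(1, 2), (3, 4)]
```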
Totally parallel multilevel algorithms
Four totally parallel algorithms for the solution of a sparse linear system share common characteristics which become quite apparent when they are implemented on a highly parallel hypercube such as the CM2. These four algorithms are the Parallel Superconvergent Multigrid (PSMG) of Frederickson and McBryan, the Robust Multigrid (RMG) of Hackbusch, the FFT-based Spectral Algorithm, and Parallel Cyclic Reduction. In fact, all four can be formulated as particular cases of the same totally parallel multilevel algorithm, referred to as TPMA. In certain cases the spectral radius of TPMA is zero, and it is then recognized to be a direct algorithm. In many other cases the spectral radius, although not zero, is small enough that a single iteration per timestep keeps the local error within the required tolerance.
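For reference, the spectral-radius remark can be stated in standard stationary-iteration notation (the notation is ours, not the paper's): one sweep of the method acts on the error $e^{(m)}$ through an iteration matrix $M$,
\[
  e^{(m+1)} = M \, e^{(m)}, \qquad
  \|e^{(m)}\| \le \|M^m\| \, \|e^{(0)}\|, \qquad
  \lim_{m \to \infty} \|M^m\|^{1/m} = \rho(M).
\]
If $\rho(M) = 0$, then $M$ is nilpotent and the error is annihilated after finitely many sweeps, so the method is direct; if $0 < \rho(M) \ll 1$, a single sweep per timestep already damps the error by roughly a factor of $\rho(M)$.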
Parallel Algorithms for Geometric Graph Problems
We give algorithms for geometric graph problems in the modern parallel models
inspired by MapReduce. For example, for the Minimum Spanning Tree (MST) problem
over a set of points in the two-dimensional space, our algorithm computes a
$(1+\epsilon)$-approximate MST. Our algorithms work in a constant number of
rounds of communication, while using total space and communication proportional
to the size of the data (linear space and near linear time algorithms). In
contrast, for general graphs, achieving the same result for MST (or even
connectivity) remains a challenging open problem, despite drawing significant
attention in recent years.
We develop a general algorithmic framework that, besides MST, also applies to
Earth-Mover Distance (EMD) and the transportation cost problem. Our algorithmic
framework has implications beyond the MapReduce model. For example, it yields a
new algorithm for computing EMD cost in the plane in near-linear time,
$n^{1+o_\epsilon(1)}$. We note that while recently Sharathkumar and Agarwal
developed a near-linear time algorithm for $\epsilon$-approximating EMD,
our algorithm is fundamentally different, and, for example, also solves the
transportation (cost) problem, raised as an open question in their work.
Furthermore, our algorithm immediately gives a $(1+\epsilon)$-approximation
algorithm with $n^{\delta}$ space in the streaming-with-sorting model with
$1/\delta^{O(1)}$ passes. As such, it is tempting to conjecture that the
parallel models may also constitute a concrete playground in the quest for
efficient algorithms for EMD (and other similar problems) in the vanilla
streaming model, a well-known open problem.
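For concreteness, the two objectives named above have the following standard definitions (textbook notation, not the paper's). For multisets $A, B \subset \mathbb{R}^2$ with $|A| = |B| = n$, the Earth-Mover Distance is the cost of a cheapest perfect matching:
\[
  \mathrm{EMD}(A, B) \;=\; \min_{\pi : A \to B \text{ bijective}}
      \; \sum_{a \in A} \|a - \pi(a)\|_2 .
\]
The transportation cost problem generalizes this to nonnegative supplies $\mu(a)$ and demands $\nu(b)$ with equal totals, shipped at per-unit cost equal to the distance:
\[
  \mathrm{cost}(\mu, \nu) \;=\; \min_{\tau \ge 0}
      \Big\{ \sum_{a, b} \tau(a, b)\,\|a - b\|_2 \;:\;
      \textstyle\sum_b \tau(a, b) = \mu(a), \;
      \sum_a \tau(a, b) = \nu(b) \Big\},
\]
so EMD is the special case of unit supplies and demands.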
Parallel Peeling Algorithms
The analysis of several algorithms and data structures can be framed as a
peeling process on a random hypergraph: vertices with degree less than k are
removed until there are no vertices of degree less than k left. The remaining
hypergraph is known as the k-core. In this paper, we analyze parallel peeling
processes, where in each round, all vertices of degree less than k are removed.
It is known that, below a specific edge density threshold, the k-core is empty
with high probability. We show that, with high probability, below this
threshold, only $\frac{\log \log n}{\log\left((k-1)(r-1)\right)} + O(1)$ rounds of peeling are needed
to obtain the empty k-core for r-uniform hypergraphs. Interestingly, we show
that above this threshold, $\Omega(\log n)$ rounds of peeling are required to find
the non-empty k-core. Since most algorithms and data structures aim to peel to
an empty k-core, this asymmetry appears fortunate. We verify the theoretical
results both with simulation and with a parallel implementation using graphics
processing units (GPUs). Our implementation provides insights into how to
structure parallel peeling algorithms for efficiency in practice.
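To make the round count concrete, here is a minimal sketch of the synchronous peeling process analyzed above (our illustration, not the paper's GPU code): each round removes all vertices of degree less than k at once, together with their incident hyperedges, and what survives is the k-core.

```python
def parallel_peel(num_vertices, hyperedges, k):
    """Peel a hypergraph round by round; return (k-core edges, rounds used)."""
    edges = [set(e) for e in hyperedges]
    alive = set(range(num_vertices))
    rounds = 0
    while True:
        degree = {v: 0 for v in alive}
        for e in edges:
            for v in e:
                degree[v] += 1
        # one synchronous round: every low-degree vertex is removed at once
        doomed = {v for v in alive if degree[v] < k}
        if not doomed:
            return edges, rounds
        alive -= doomed
        edges = [e for e in edges if not (e & doomed)]
        rounds += 1

# A 3-uniform example with k = 2: every vertex has degree 1 < 2, so the
# whole hypergraph peels away in a single round (empty 2-core).
print(parallel_peel(6, [{0, 1, 2}, {3, 4, 5}], 2))  # ([], 1)
```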
