Parallel Peeling Algorithms
The analysis of several algorithms and data structures can be framed as a
peeling process on a random hypergraph: vertices with degree less than k are
removed until there are no vertices of degree less than k left. The remaining
hypergraph is known as the k-core. In this paper, we analyze parallel peeling
processes, where in each round, all vertices of degree less than k are removed.
It is known that, below a specific edge density threshold, the k-core is empty
with high probability. We show that, with high probability, below this
threshold, only (log log n)/log((k-1)(r-1)) + O(1) rounds of peeling are needed
to obtain the empty k-core for r-uniform hypergraphs. Interestingly, we show
that above this threshold, Omega(log n) rounds of peeling are required to find
the non-empty k-core. Since most algorithms and data structures aim to peel to
an empty k-core, this asymmetry appears fortunate. We verify the theoretical
results both with simulation and with a parallel implementation using graphics
processing units (GPUs). Our implementation provides insights into how to
structure parallel peeling algorithms for efficiency in practice.
Comment: Appears in SPAA 2014. Minor typo corrections relative to the previous version.
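The peeling process described above is easy to simulate directly. The following is a minimal sketch (with our own function and variable names, not the paper's implementation) that simulates the parallel rounds sequentially, one loop iteration per round: every vertex whose degree (number of live incident hyperedges) is below k is removed simultaneously, together with its hyperedges.

```python
from collections import defaultdict

def parallel_peel(num_vertices, edges, k):
    """Simulate parallel peeling on a hypergraph.

    edges: list of hyperedges, each a tuple of vertex ids (r-uniform
    if all tuples have length r). Each round removes ALL vertices of
    degree < k at once. Returns (k-core vertex set, number of rounds).
    A sketch of the process analyzed in the paper, not its GPU code.
    """
    incident = defaultdict(set)            # vertex -> ids of live hyperedges
    for i, e in enumerate(edges):
        for v in e:
            incident[v].add(i)
    alive = set(range(num_vertices))
    rounds = 0
    while True:
        # One parallel round: collect every under-degree vertex first...
        peel = {v for v in alive if len(incident[v]) < k}
        if not peel:
            break                          # remaining vertices form the k-core
        rounds += 1
        # ...then remove them and their hyperedges together.
        for v in peel:
            for i in list(incident[v]):
                for u in edges[i]:         # drop edge i from every endpoint
                    incident[u].discard(i)
            alive.discard(v)
    return alive, rounds
```

For example, a triangle with a pendant vertex peels to its 2-core (the triangle) in one round, while a path peels to an empty 2-core in two rounds.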
Unsupervised Bump Hunting Using Principal Components
Principal Components Analysis is a widely used technique for dimension
reduction and characterization of variability in multivariate populations. Our
interest lies in studying when and why the rotation to principal components can
be used effectively within a response-predictor set relationship in the context
of mode hunting. Specifically focusing on the Patient Rule Induction Method
(PRIM), we first develop a fast version of this algorithm (fastPRIM) under
normality which facilitates the theoretical studies to follow. Using basic
geometrical arguments, we then demonstrate how the PC rotation of the predictor
space alone can in fact generate improved mode estimators. Simulation results
are used to illustrate our findings.
Comment: 24 pages, 9 figures.
Theoretically Efficient Parallel Graph Algorithms Can Be Fast and Scalable
There has been significant recent interest in parallel graph processing due
to the need to quickly analyze the large graphs available today. Many graph
codes have been designed for distributed memory or external memory. However,
today even the largest publicly-available real-world graph (the Hyperlink Web
graph with over 3.5 billion vertices and 128 billion edges) can fit in the
memory of a single commodity multicore server. Nevertheless, most experimental
work in the literature reports results on much smaller graphs, and the ones for
the Hyperlink graph use distributed or external memory. Therefore, it is
natural to ask whether we can efficiently solve a broad class of graph problems
on this graph in memory.
This paper shows that theoretically-efficient parallel graph algorithms can
scale to the largest publicly-available graphs using a single machine with a
terabyte of RAM, processing them in minutes. We give implementations of
theoretically-efficient parallel algorithms for 20 important graph problems. We
also present the optimizations and techniques that we used in our
implementations, which were crucial in enabling us to process these large
graphs quickly. We show that the running times of our implementations
outperform existing state-of-the-art implementations on the largest real-world
graphs. For many of the problems that we consider, this is the first time they
have been solved on graphs at this scale. We have made the implementations
developed in this work publicly-available as the Graph-Based Benchmark Suite
(GBBS).
Comment: This is the full version of the paper appearing in the ACM Symposium on Parallelism in Algorithms and Architectures (SPAA), 201
Sympiler: Transforming Sparse Matrix Codes by Decoupling Symbolic Analysis
Sympiler is a domain-specific code generator that optimizes sparse matrix
computations by decoupling the symbolic analysis phase from the numerical
manipulation stage in sparse codes. The computation patterns in sparse
numerical methods are guided by the input sparsity structure and the sparse
algorithm itself. In many real-world simulations, the sparsity pattern changes
little or not at all. Sympiler takes advantage of these properties to
symbolically analyze sparse codes at compile time and to apply inspector-guided
transformations that enable low-level optimizations of sparse codes.
As a result, the Sympiler-generated code outperforms highly-optimized matrix
factorization codes from commonly-used specialized libraries, obtaining average
speedups over Eigen and CHOLMOD of 3.8X and 1.5X, respectively.
Comment: 12 pages.
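The symbolic/numeric decoupling can be sketched for one classic kernel, a sparse lower-triangular solve Lx = b: the symbolic phase computes, once, which entries of x can be nonzero (the reachable set in the dependency graph of L), and the numeric phase then touches only those entries for every new right-hand side with the same sparsity pattern. The data layout and function names below are our own illustration of the inspector-executor idea, not Sympiler's generated code or API.

```python
def symbolic_reach(L_cols, b_nonzeros):
    """Symbolic phase: which entries of x in L x = b can be nonzero.

    L_cols[j] lists the row indices i > j with L[i][j] != 0.
    DFS from the nonzeros of b; ascending order is a valid solve
    order for a lower-triangular matrix.
    """
    seen = set()
    def dfs(j):
        if j in seen:
            return
        seen.add(j)
        for i in L_cols.get(j, []):
            dfs(i)
    for j in b_nonzeros:
        dfs(j)
    return sorted(seen)

def numeric_solve(L_vals, L_cols, diag, b, reach):
    """Numeric phase: loop only over the precomputed reach set.

    When the sparsity pattern is fixed across many solves, `reach`
    is computed once and this loop carries no pattern logic at all.
    """
    x = dict(b)                            # sparse right-hand side
    for j in reach:
        x[j] = x.get(j, 0.0) / diag[j]
        for i in L_cols.get(j, []):        # scatter column j's update
            x[i] = x.get(i, 0.0) - L_vals[(i, j)] * x[j]
    return x
```

Separating the two phases is exactly what lets a compiler specialize the numeric loop for a known pattern, which is the source of the speedups the abstract reports.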