DPP-PMRF: Rethinking Optimization for a Probabilistic Graphical Model Using Data-Parallel Primitives
We present a new parallel algorithm for probabilistic graphical model
optimization. The algorithm relies on data-parallel primitives (DPPs), which
provide portable performance across hardware architectures. We evaluate results on
CPUs and GPUs for an image segmentation problem. Compared to a serial baseline,
we observe runtime speedups of up to 13X (CPU) and 44X (GPU). We also compare
our performance to a reference OpenMP-based algorithm and find speedups of up
to 7X (CPU). Comment: LDAV 2018, October 201
Shared-Memory Parallel Maximal Clique Enumeration
We present shared-memory parallel methods for Maximal Clique Enumeration
(MCE) from a graph. MCE is a fundamental and well-studied graph analytics task,
and is a widely used primitive for identifying dense structures in a graph. Due
to its computationally intensive nature, parallel methods are imperative for
dealing with large graphs. However, surprisingly, there do not yet exist
scalable and parallel methods for MCE on a shared-memory parallel machine. In
this work, we present efficient shared-memory parallel algorithms for MCE, with
the following properties: (1) the parallel algorithms are provably
work-efficient relative to a state-of-the-art sequential algorithm, (2) the
algorithms have a provably small parallel depth, showing that they can scale to
a large number of processors, and (3) our implementations on a multicore
machine show good speedup and scaling behavior with an increasing number of
cores, and are substantially faster than prior shared-memory parallel
algorithms for MCE. Comment: 10 pages, 3 figures, proceedings of the 25th IEEE International
Conference on High Performance Computing, Data, and Analytics (HiPC), 201
Optimizing Geometry Compression using Quantum Annealing
The compression of geometry data is an important aspect of
bandwidth-efficient data transfer for distributed 3d computer vision
applications. We propose a quantum-enabled lossy 3d point cloud compression
pipeline based on the constructive solid geometry (CSG) model representation.
Key parts of the pipeline are mapped to NP-complete problems for which an
efficient Ising formulation suitable for the execution on a Quantum Annealer
exists. We describe existing Ising formulations for the maximum clique search
problem and the smallest exact cover problem, both of which are important
building blocks of the proposed compression pipeline. Additionally, we discuss
the properties of the overall pipeline with regard to result optimality and
the described Ising formulations. Comment: 6 pages, 3 figures
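The abstract does not spell out the Ising formulations it uses. As a minimal sketch of the general idea, the block below builds one common QUBO encoding of maximum clique: reward each selected vertex and penalize every selected pair that is not joined by an edge. The function names (`max_clique_qubo`, `energy`, `brute_force_minimum`) and the penalty value are assumptions made for illustration, and exhaustive search stands in for the annealer. A QUBO becomes an Ising model through the substitution x_i = (1 + s_i)/2 with s_i in {-1, +1}.

```python
from itertools import combinations, product


def max_clique_qubo(n, edges, penalty=2.0):
    """Return QUBO weights {(i, j): w} whose minimum encodes a maximum clique.

    Assumed formulation (not taken from the paper): each chosen vertex
    contributes -1, each chosen non-adjacent pair contributes +penalty.
    Any penalty > 1 makes violating a non-edge unprofitable.
    """
    edge_set = {frozenset(e) for e in edges}
    weights = {(i, i): -1.0 for i in range(n)}
    for i, j in combinations(range(n), 2):
        if frozenset((i, j)) not in edge_set:
            weights[(i, j)] = penalty
    return weights


def energy(weights, x):
    """Evaluate the QUBO objective for a 0/1 assignment x."""
    return sum(w * x[i] * x[j] for (i, j), w in weights.items())


def brute_force_minimum(weights, n):
    """Exhaustive minimizer; a stand-in for the annealer, viable only for tiny n."""
    return min(product((0, 1), repeat=n), key=lambda x: energy(weights, x))


# Triangle 0-1-2 with a tail 2-3-4 attached.
example_edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4)]
example_weights = max_clique_qubo(5, example_edges)
best = brute_force_minimum(example_weights, 5)
```

In this example the minimizing assignment selects exactly the triangle, because every other vertex subset of equal or larger size contains at least one penalized non-adjacent pair.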
Sublinear-Space Bounded-Delay Enumeration for Massive Network Analytics: Maximal Cliques
Due to the sheer size of real-world networks, delay and space become quite relevant measures for the cost of enumeration in network analytics. This paper presents efficient algorithms for listing maximal cliques in networks, providing the first sublinear-space bounds with guaranteed delay per enumerated clique, thus comparing favorably with the known literature.
Mining dense substructures from large deterministic and probabilistic graphs
Graphs represent relationships. Some relationships can be represented as a deterministic graph, while others can only be represented using probabilities. Mining dense structures from graphs helps us find useful patterns in these relationships, with applications in areas such as social network analysis and bioinformatics. Arguably the two most fundamental dense substructures are Maximal Cliques and Maximal Bicliques. The enumeration of both these structures is central to many data mining problems. With the advent of “big data”, real-world graphs have become massive. Recently, systems like MapReduce have evolved to process such large data. However, using these systems to mine dense substructures in massive graphs is an open question. In this thesis, we present novel parallel algorithms using MapReduce for the enumeration of Maximal Cliques / Bicliques in large graphs. We show that our algorithms are work optimal and load balanced. Further, we present a detailed evaluation which shows that the algorithm scales to large graphs with millions of edges and tens of millions of output structures. Finally, we consider the problem of Maximal Clique Enumeration in an Uncertain Graph, which is a probability distribution on a set of deterministic graphs. We define the notion of a maximal clique for an uncertain graph, give matching upper and lower bounds on the number of such structures, and present a near-optimal algorithm to mine all maximal cliques.
Parallelizing Maximal Clique Enumeration on GPUs
We present a GPU solution for exact maximal clique enumeration (MCE) that
performs a search tree traversal following the Bron-Kerbosch algorithm. Prior
works on parallelizing MCE on GPUs perform a breadth-first traversal of the
tree, which has limited scalability because of the explosion in the number of
tree nodes at deep levels. We propose to parallelize MCE on GPUs by performing
depth-first traversal of independent subtrees in parallel. Since MCE suffers
from high load imbalance and memory capacity requirements, we propose a worker
list for dynamic load balancing, as well as partial induced subgraphs and a
compact representation of excluded vertex sets to regulate memory consumption.
Our evaluation shows that our GPU implementation on a single GPU outperforms
the state-of-the-art parallel CPU implementation by a geometric mean of 4.9x
(up to 16.7x), and scales efficiently to multiple GPUs. Our code has been
open-sourced to enable further research on accelerating MCE.
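The idea of traversing independent subtrees depth-first can be illustrated with a small sequential sketch. It splits the Bron-Kerbosch search tree at the first level, one subtree per vertex, and runs a pivoting depth-first search inside each. This is not the paper's GPU implementation: the worker list, partial induced subgraphs, and compact excluded-set representation are omitted, the plain sorted vertex order is an assumption, and all function names are hypothetical.

```python
def bron_kerbosch(adj, clique, candidates, excluded, out):
    """Depth-first Bron-Kerbosch with pivoting; appends maximal cliques to out."""
    if not candidates and not excluded:
        out.append(sorted(clique))
        return
    # Pivot on the vertex covering the most candidates to prune branches.
    pivot = max(candidates | excluded, key=lambda u: len(adj[u] & candidates))
    for v in list(candidates - adj[pivot]):
        bron_kerbosch(adj, clique | {v}, candidates & adj[v], excluded & adj[v], out)
        candidates = candidates - {v}
        excluded = excluded | {v}


def top_level_subtrees(adj):
    """Yield one independent (clique, candidates, excluded) task per vertex.

    Assumed order: plain sorted vertex ids. Later neighbours become
    candidates and earlier neighbours become excluded vertices, so each
    maximal clique is reported only from its smallest vertex.
    """
    for v in sorted(adj):
        later = {u for u in adj[v] if u > v}
        yield {v}, later, adj[v] - later


def enumerate_maximal_cliques(adj):
    out = []
    # Each task touches only its own sets, so tasks could run in parallel.
    for clique, candidates, excluded in top_level_subtrees(adj):
        bron_kerbosch(adj, clique, candidates, excluded, out)
    return sorted(out)


example_adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4}, 4: {3}}
cliques = enumerate_maximal_cliques(example_adj)
```

Because each top-level task carries its own candidate and excluded sets, the tasks share no mutable state, which is what makes distributing them across workers possible. Subtree sizes can differ widely, which is the load imbalance the abstract refers to.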
On Approximating the Number of $k$-cliques in Sublinear Time
We study the problem of approximating the number of $k$-cliques in a graph
when given query access to the graph.
We consider the standard query model for general graphs via (1) degree
queries, (2) neighbor queries and (3) pair queries. Let $n$ denote the number
of vertices in the graph, $m$ the number of edges, and $C_k$ the number of
$k$-cliques. We design an algorithm that outputs a
$(1+\varepsilon)$-approximation (with high probability) for $C_k$, whose
expected query complexity and running time are
$O\left(\frac{n}{C_k^{1/k}}+\frac{m^{k/2}}{C_k}\right)\mathrm{poly}(\log n, 1/\varepsilon, k)$.
Hence, the complexity of the algorithm is sublinear in the size of the graph
for $C_k = \omega(m^{k/2-1})$. Furthermore, we prove a lower bound showing that
the query complexity of our algorithm is essentially optimal (up to the
dependence on $\log n$, $1/\varepsilon$ and $k$).
The previous results in this vein are by Feige (SICOMP 06) and by Goldreich
and Ron (RSA 08) for edge counting ($k=2$) and by Eden et al. (FOCS 2015) for
triangle counting ($k=3$). Our result matches the complexities of these
results.
The previous result by Eden et al. hinges on a certain amortization technique
that works only for triangle counting, and does not generalize for larger
cliques. We obtain a general algorithm that works for any $k \geq 3$ by
designing a procedure that samples each $k$-clique incident to a given set $S$
of vertices with approximately equal probability. The primary difficulty is in
finding cliques incident to purely high-degree vertices, since random sampling
within neighbors has a low success probability. This is achieved by an
algorithm that samples uniform random high degree vertices and a careful
tradeoff between estimating cliques incident purely to high-degree vertices and
those that include a low-degree vertex.
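A short worked calculation shows where the sublinearity threshold comes from and how the bound specializes to edges and triangles. Here $n$ is the number of vertices, $m$ the number of edges, and $C_k$ the number of $k$-cliques, as in the displayed bound. The shorthand $T$ is introduced only for this sketch, and polylogarithmic factors are dropped.

```latex
% Stated bound without the poly(log n, 1/eps, k) factor (T is our shorthand):
\[
  T(n, m, C_k) \;=\; \frac{n}{C_k^{1/k}} + \frac{m^{k/2}}{C_k}.
\]
% The first term is at most n whenever C_k >= 1.
% The second term is o(m) exactly when C_k grows faster than m^{k/2 - 1}:
\[
  \frac{m^{k/2}}{C_k} = o(m)
  \quad\Longleftrightarrow\quad
  C_k = \omega\!\left(m^{k/2 - 1}\right).
\]
% Edge counting: k = 2 and C_2 = m.
\[
  T = \frac{n}{\sqrt{m}} + \frac{m}{m} = \frac{n}{\sqrt{m}} + 1.
\]
% Triangle counting: k = 3.
\[
  T = \frac{n}{C_3^{1/3}} + \frac{m^{3/2}}{C_3}.
\]
```

The two special cases have the same form as the edge-counting and triangle-counting complexities that the abstract says the general result matches.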
Scalable Kernelization for Maximum Independent Sets
The most efficient algorithms for finding maximum independent sets in both
theory and practice use reduction rules to obtain a much smaller problem
instance called a kernel. The kernel can then be solved quickly using exact or
heuristic algorithms, or by kernelizing recursively in the
branch-and-reduce paradigm. It is of critical importance for these algorithms
that kernelization is fast and returns a small kernel. Current algorithms are
either slow but produce a small kernel, or fast and give a large kernel. We
attempt to accomplish both of these goals simultaneously, by giving an
efficient parallel kernelization algorithm based on graph partitioning and
parallel bipartite maximum matching. We combine our parallelization techniques
with two techniques to accelerate kernelization further: dependency checking
that prunes reductions that cannot be applied, and reduction tracking that
allows us to stop kernelization when reductions become less fruitful. Our
algorithm produces kernels that are orders of magnitude smaller than the
fastest kernelization methods, while having a similar execution time.
Furthermore, our algorithm is able to compute kernels with size comparable to
the smallest known kernels, but up to two orders of magnitude faster than
previously possible. Finally, we show that our kernelization algorithm can be
used to accelerate existing state-of-the-art heuristic algorithms, allowing us
to find larger independent sets faster on large real-world networks and
synthetic instances. Comment: Extended version
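To make the notion of reduction rules concrete, here is a minimal sequential sketch using only the two simplest textbook rules for maximum independent set: a degree-0 vertex can always be taken, and a degree-1 vertex can be taken while its single neighbour is discarded. The paper's parallel scheme, graph partitioning, matching-based reductions, dependency checking and reduction tracking are not reproduced. The function name `kernelize` and the worklist design are assumptions made for illustration.

```python
def kernelize(adj):
    """Exhaustively apply degree-0 and degree-1 reductions.

    Returns (forced, kernel): vertices that can safely be placed in some
    maximum independent set, and the remaining graph as an adjacency dict.
    """
    adj = {v: set(ns) for v, ns in adj.items()}  # work on a copy
    forced = set()
    worklist = [v for v in adj if len(adj[v]) <= 1]
    while worklist:
        v = worklist.pop()
        if v not in adj or len(adj[v]) > 1:
            continue  # already removed by an earlier reduction
        forced.add(v)
        # Remove v and, for a degree-1 vertex, its only neighbour.
        for u in [v] + list(adj[v]):
            for w in adj.pop(u):
                if w in adj:
                    adj[w].discard(u)
                    if len(adj[w]) <= 1:
                        worklist.append(w)  # w may now be reducible
    return forced, adj


# A path 0-1-2-3 (fully reducible) next to a 4-cycle 4-5-6-7 (irreducible here).
example_graph = {
    0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2},
    4: {5, 7}, 5: {4, 6}, 6: {5, 7}, 7: {4, 6},
}
forced, kernel = kernelize(example_graph)
```

The worklist revisits only vertices whose degree just dropped. That is a simple form of the idea of skipping reductions that cannot apply, though it is not the paper's dependency-checking mechanism.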
GraphMineSuite: Enabling High-Performance and Programmable Graph Mining Algorithms with Set Algebra
We propose GraphMineSuite (GMS): the first benchmarking suite for graph
mining that facilitates evaluating and constructing high-performance graph
mining algorithms. First, GMS comes with a benchmark specification based on
extensive literature review, prescribing representative problems, algorithms,
and datasets. Second, GMS offers a carefully designed software platform for
seamless testing of different fine-grained elements of graph mining algorithms,
such as graph representations or algorithm subroutines. The platform includes
parallel implementations of more than 40 considered baselines, and it
facilitates developing complex and fast mining algorithms. High modularity is
possible by harnessing set algebra operations such as set intersection and
difference, which enables breaking complex graph mining algorithms into simple
building blocks that can be separately experimented with. GMS is supported with
a broad concurrency analysis for portability in performance insights, and a
novel performance metric to assess the throughput of graph mining algorithms,
enabling more insightful evaluation. As use cases, we harness GMS to rapidly
redesign and accelerate state-of-the-art baselines of core graph mining
problems: degeneracy reordering (by up to >2x), maximal clique listing (by up
to >9x), k-clique listing (by 1.1x), and subgraph isomorphism (by up to 2.5x),
also obtaining better theoretical performance bounds.
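As a rough illustration of how a mining algorithm decomposes into set-algebra building blocks, the sketch below counts k-cliques using nothing but set intersection on neighbour sets. It does not use or imitate the GMS interface. The function name `count_k_cliques` and the orientation by vertex id are assumptions chosen for brevity.

```python
def count_k_cliques(adj, k):
    """Count k-cliques (k >= 1) by repeatedly intersecting candidate sets."""
    # Orient each edge from lower to higher id so every clique is counted once.
    higher = {v: {u for u in adj[v] if u > v} for v in adj}

    def extend(candidates, remaining):
        # candidates: vertices adjacent to every vertex chosen so far.
        if remaining == 0:
            return 1
        return sum(extend(candidates & higher[v], remaining - 1) for v in candidates)

    return sum(extend(higher[v], k - 1) for v in adj)


# Complete graph on four vertices.
k4_adj = {v: {u for u in range(4) if u != v} for v in range(4)}
edge_count = count_k_cliques(k4_adj, 2)
triangle_count = count_k_cliques(k4_adj, 3)
four_clique_count = count_k_cliques(k4_adj, 4)
```

Because the only graph operation is `candidates & higher[v]`, changing the set representation (for example, sorted arrays or bitsets) changes performance without touching the algorithm's structure. This is the kind of modularity the abstract attributes to set algebra.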