The Power of Pivoting for Exact Clique Counting
Clique counting is a fundamental task in network analysis, and even the
simplest setting of 3-cliques (triangles) has been the center of much recent
research. Getting the count of k-cliques for larger k is algorithmically
challenging, due to the exponential blowup in the search space of large
cliques. But a number of recent applications (especially for community
detection or clustering) use larger clique counts. Moreover, one often desires
local counts, the number of k-cliques per vertex/edge.
Our main result is Pivoter, an algorithm that exactly counts the number of
k-cliques, for all values of k. It is surprisingly effective in
practice, and is able to get clique counts of graphs that were beyond the reach
of previous work. For example, Pivoter gets all clique counts in a social
network with 100M edges within two hours on a commodity machine. Previous
parallel algorithms do not terminate in days. Pivoter can also feasibly get
local per-vertex and per-edge k-clique counts (for all k) for many public
data sets with tens of millions of edges. To the best of our knowledge, this is
the first algorithm that achieves such results.
The main insight is the construction of a Succinct Clique Tree (SCT) that
stores a compressed unique representation of all cliques in an input graph. It
is built using a technique called pivoting, a classic approach introduced by
Bron and Kerbosch to reduce the recursion tree of backtracking algorithms for
maximal cliques. Remarkably, the SCT can be built without actually enumerating
all cliques, and provides a succinct data structure from which exact clique
statistics (k-clique counts, local counts) can be read off efficiently.
Comment: 10 pages, WSDM 202
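To make the SCT idea concrete, here is a minimal Python sketch of pivot-based k-clique counting, under our reading of the construction: every root-to-leaf path of the pivoting recursion carries some held vertices (present in every clique on that path) and some pivot vertices (each optional), so a path with h held and p pivot vertices accounts for C(p, k - h) distinct k-cliques. The function name `count_cliques`, the adjacency-dict input, and the plain vertex-id ordering (instead of a degeneracy ordering) are assumptions for illustration; the tree itself is never stored, only its leaf statistics are accumulated.

```python
from collections import defaultdict
from math import comb


def count_cliques(adj):
    """Count k-cliques for every k >= 1 via pivoting (hypothetical sketch).

    adj maps each vertex to the set of its neighbours (simple, undirected graph).
    Returns {k: number of k-cliques}.
    """
    order = sorted(adj)  # assumption: plain id order, not a degeneracy order
    rank = {v: i for i, v in enumerate(order)}
    counts = defaultdict(int)

    def recurse(cand, held, pivots):
        if not cand:
            # Leaf: every clique on this path is all held vertices
            # plus any subset of the pivot vertices.
            for j in range(pivots + 1):
                counts[held + j] += comb(pivots, j)
            return
        # Pivot: the candidate with the most neighbours inside the candidate set.
        pivot = max(cand, key=lambda u: len(adj[u] & cand))
        remaining = set(cand)
        # Branch only on the pivot itself and on candidates not adjacent to it.
        for v in [u for u in cand if u not in adj[pivot]]:
            sub = adj[v] & remaining
            if v == pivot:
                recurse(sub, held, pivots + 1)  # pivot is optional below
            else:
                recurse(sub, held + 1, pivots)  # v is in every clique below
            remaining.discard(v)

    for v in order:
        # Each clique is attributed to its lowest-ranked vertex.
        later = {u for u in adj[v] if rank[u] > rank[v]}
        recurse(later, 1, 0)
    return dict(counts)
```

For the complete graph on four vertices this yields {1: 4, 2: 6, 3: 4, 4: 1}. Because the pivot branch and the non-neighbour branches cover disjoint sets of cliques, each clique is produced by exactly one leaf, which is what makes counting without enumeration possible.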
Beyond Triangles: A Distributed Framework for Estimating 3-profiles of Large Graphs
We study the problem of approximating the 3-profile of a large graph.
3-profiles are generalizations of triangle counts that specify how many
times each three-vertex graph appears as an induced subgraph of a large graph. Our
algorithm uses the novel concept of 3-profile sparsifiers: sparse graphs that
can be used to approximate the full 3-profile counts for a given large graph.
Further, we study the problem of estimating local and ego 3-profiles, two
graph quantities that characterize the local neighborhood of each vertex of a
graph.
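As a point of reference for what a 3-profile contains, write (n0, n1, n2, n3) for the number of vertex triples whose induced subgraph has 0, 1, 2 or 3 edges. The exact global profile then follows from the triangle count T, the wedge count W (the sum over vertices v of C(d(v), 2)), and the edge count m: n3 = T, n2 = W - 3T, n1 = m(n - 2) - 2 n2 - 3 n3, and n0 = C(n, 3) - n1 - n2 - n3. The n1 identity holds because each edge lies in n - 2 triples and a triple with j edges is counted j times. The sketch below is an exact, centralized baseline with a hypothetical function name, not the sparsifier-based estimator described above.

```python
from math import comb


def three_profile(n, edges):
    """Exact global 3-profile (n0, n1, n2, n3) of a simple graph on vertices 0..n-1."""
    adj = [set() for _ in range(n)]
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    m = sum(len(a) for a in adj) // 2
    # Each triangle is seen once from each of its three edges.
    triangles = sum(len(adj[u] & adj[v]) for u in range(n) for v in adj[u] if u < v) // 3
    # Wedges: pairs of neighbours around each centre vertex.
    wedges = sum(comb(len(a), 2) for a in adj)
    n3 = triangles
    n2 = wedges - 3 * n3                 # a triangle contains three wedges
    n1 = m * (n - 2) - 2 * n2 - 3 * n3   # edge-in-triple incidences
    n0 = comb(n, 3) - n1 - n2 - n3       # all remaining triples
    return (n0, n1, n2, n3)
```

For a triangle with one pendant edge, three_profile(4, [(0, 1), (0, 2), (1, 2), (2, 3)]) gives (0, 1, 2, 1).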
Our algorithm is distributed and operates as a vertex program over the
GraphLab PowerGraph framework. We introduce the concept of edge pivoting, which
allows us to collect 2-hop information without maintaining an explicit
2-hop neighborhood list at each vertex. This enables the computation of all
the local 3-profiles in parallel with minimal communication.
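The idea that local 3-profiles can be assembled from edge-local information can be illustrated sequentially. The local 3-profile of v counts pairs {a, b} by the number of edges in the subgraph induced on {v, a, b}; each count below uses only v's degree, each neighbour's degree, the number of common neighbours across each incident edge, and the global n and m. This is a hypothetical, exact, single-machine sketch (`local_three_profiles` is our name), not the GraphLab vertex program or the edge pivoting mechanism of the paper.

```python
from math import comb


def local_three_profiles(n, edges):
    """For each vertex v return (n0, n1, n2, n3): the number of pairs {a, b}
    such that the subgraph induced on {v, a, b} has 0, 1, 2 or 3 edges.

    Vertices are 0..n-1; edges is a list of (u, v) pairs of a simple graph.
    """
    adj = [set() for _ in range(n)]
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    deg = [len(a) for a in adj]
    m = sum(deg) // 2
    profiles = []
    for v in range(n):
        # Per-edge quantity for each edge (v, a): number of common neighbours.
        common = {a: len(adj[v] & adj[a]) for a in adj[v]}
        n3 = sum(common.values()) // 2  # triangles through v
        # Two edges: v is the centre (a, b non-adjacent neighbours of v),
        # or v is an end (b adjacent to a but not to v).
        n2 = (comb(deg[v], 2) - n3) + sum(deg[a] - 1 - c for a, c in common.items())
        # One edge: edge (v, a) with b adjacent to neither v nor a ...
        n1 = sum(n - deg[v] - deg[a] + c for a, c in common.items())
        # ... or an edge (a, b) with both ends outside N(v) and distinct from v.
        n1 += m - deg[v] - sum(deg[a] - 1 for a in adj[v]) + n3
        n0 = comb(n - 1, 2) - n1 - n2 - n3  # remaining pairs
        profiles.append((n0, n1, n2, n3))
    return profiles
```

A useful sanity check is that summing component i over all vertices gives three times the global count n_i, since every triple is seen from each of its three vertices.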
We test our implementation in several experiments scaling up to cores
on Amazon EC2. We find that our algorithm can estimate the 3-profile of a
graph in approximately the same time as triangle counting. For the harder
problem of ego 3-profiles, we introduce an algorithm that can estimate
profiles of hundreds of thousands of vertices in parallel, in the timescale of
minutes.
Comment: To appear in part at KDD'1
Parallelizing Maximal Clique Enumeration on GPUs
We present a GPU solution for exact maximal clique enumeration (MCE) that
performs a search tree traversal following the Bron-Kerbosch algorithm. Prior
works on parallelizing MCE on GPUs perform a breadth-first traversal of the
tree, which has limited scalability because of the explosion in the number of
tree nodes at deep levels. We propose to parallelize MCE on GPUs by performing
depth-first traversal of independent subtrees in parallel. Since MCE suffers
from high load imbalance and memory capacity requirements, we propose a worker
list for dynamic load balancing, as well as partial induced subgraphs and a
compact representation of excluded vertex sets to regulate memory consumption.
Our evaluation shows that our GPU implementation on a single GPU outperforms
the state-of-the-art parallel CPU implementation by a geometric mean of 4.9x
(up to 16.7x), and scales efficiently to multiple GPUs. Our code has been
open-sourced to enable further research on accelerating MCE.
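The depth-first strategy can be sketched on the CPU. Each top-level vertex roots an independent Bron-Kerbosch subtree (candidates P are its later neighbours, excluded vertices X its earlier ones), and an explicit stack replaces recursion; such subtrees are the unit of work that the approach above traverses in parallel. This is a hypothetical single-threaded Python illustration: the degree-based vertex order and the pivot rule (maximise the number of neighbours in P) are our assumptions, and there is no worker list, partial induced subgraph, or compact excluded-set representation.

```python
def maximal_cliques(adj):
    """Enumerate maximal cliques with Bron-Kerbosch + pivoting, depth first.

    adj maps each vertex to the set of its neighbours. The search is split into
    one independent subtree per vertex; an explicit stack replaces recursion.
    """
    order = sorted(adj, key=lambda v: len(adj[v]))  # assumption: degree order
    rank = {v: i for i, v in enumerate(order)}
    cliques = []
    for v in order:
        # Subtree rooted at v: later neighbours are candidates (P), earlier
        # neighbours are excluded (X) because their own subtrees cover them.
        stack = [([v],
                  {u for u in adj[v] if rank[u] > rank[v]},
                  {u for u in adj[v] if rank[u] < rank[v]})]
        while stack:
            R, P, X = stack.pop()  # depth first: newest node is expanded next
            if not P:
                if not X:
                    cliques.append(sorted(R))  # R cannot be extended: maximal
                continue
            pivot = max(P | X, key=lambda u: len(adj[u] & P))
            for u in list(P - adj[pivot]):
                stack.append((R + [u], P & adj[u], X & adj[u]))
                P = P - {u}  # later siblings must not revisit u ...
                X = X | {u}  # ... but must know it was already covered
    return cliques
```

For two triangles sharing an edge, {0, 1, 2} and {1, 2, 3}, it reports exactly those two cliques. The stack keeps memory proportional to the depth of one subtree times the branching factor, rather than to a whole tree level as in a breadth-first traversal.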
Exact Algorithms for Maximum Clique: a computational study
We investigate a number of recently reported exact algorithms for the maximum
clique problem (MCQ, MCR, MCS, BBMC). The program code used is presented and
critiqued showing how small changes in implementation can have a drastic effect
on performance. The computational study demonstrates how problem features and
hardware platforms influence algorithm behaviour. The minimum width order
(smallest-last) is investigated, and MCS is broken into its constituent parts,
revealing that one of these parts degrades performance. It is shown that
the standard procedure used for rescaling published results is unsafe.
Comment: 40 pages, 14 figures, 10 tables, 12 short Java program listings, code
available to download at
http://www.dcs.gla.ac.uk/~pat/maxClique/distribution
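A compact illustration of the kind of algorithm being compared is a branch-and-bound search that orders vertices smallest-last and prunes with a greedy colouring bound, in the spirit of MCQ. It is a Python sketch under our own assumptions (function names, tie-breaking, colouring details), not the Java listings from the study, and it includes none of the MCR, MCS or BBMC refinements.

```python
def smallest_last_order(adj):
    """Repeatedly remove a minimum-degree vertex; return the reverse removal order."""
    deg = {v: len(adj[v]) for v in adj}
    remaining = set(adj)
    removed = []
    while remaining:
        v = min(remaining, key=lambda u: deg[u])
        removed.append(v)
        remaining.remove(v)
        for u in adj[v]:
            if u in remaining:
                deg[u] -= 1
    return removed[::-1]  # the last vertex removed comes first


def max_clique(adj):
    """Return one maximum clique (as a list) via colour-bounded branch and bound."""
    best = []

    def colour_bound(P):
        # Greedy sequential colouring; a clique uses at most one vertex per colour.
        classes, coloured = [], []
        for v in P:
            for k, cls in enumerate(classes):
                if not (adj[v] & cls):
                    cls.add(v)
                    coloured.append((v, k + 1))
                    break
            else:
                classes.append({v})
                coloured.append((v, len(classes)))
        coloured.sort(key=lambda t: t[1])
        return coloured

    def expand(C, P):
        nonlocal best
        for v, c in reversed(colour_bound(P)):
            if len(C) + c <= len(best):
                return  # remaining vertices all have colour <= c: prune
            newC = C + [v]
            newP = [u for u in P if u in adj[v]]
            if newP:
                expand(newC, newP)
            elif len(newC) > len(best):
                best = newC
            P = [u for u in P if u != v]  # v has been fully explored

    expand([], smallest_last_order(adj))
    return best
```

The bound is valid because vertices are visited in decreasing colour order and removed once explored, so every vertex still in P has colour at most c.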
Efficient and Scalable Listing of Four-Vertex Subgraphs
Identifying four-vertex subgraphs has long been recognized as a fundamental technique in bioinformatics and social networks. However, listing these structures is a challenging task, especially for graphs that do not fit in RAM. To address this problem, we build a set of algorithms, models, and implementations that can handle massive graphs on commodity hardware. Our technique achieves 4 – 5 orders of magnitude speedup compared to the best prior methods on graphs with billions of edges, with external-memory operation equally efficient
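The abstract does not spell out which four-vertex patterns are listed. As one hedged illustration of in-memory listing, the sketch below enumerates every (not necessarily induced) 4-cycle exactly once by making the smallest vertex u one end of a diagonal and pairing the common neighbours of u and the opposite vertex w. The function name and input format are assumptions, and nothing here models the external-memory setting the paper targets.

```python
def list_four_cycles(adj):
    """List each 4-cycle once as (u, a, w, b), meaning edges u-a, a-w, w-b, b-u,
    where u is the smallest vertex of the cycle. Vertices must be orderable."""
    cycles = []
    for u in adj:
        for w in adj:
            if w <= u:
                continue
            # Common neighbours of the diagonal pair (u, w), all larger than u.
            common = sorted(x for x in adj[u] & adj[w] if x > u)
            for i in range(len(common)):
                for j in range(i + 1, len(common)):
                    cycles.append((u, common[i], w, common[j]))
    return cycles
```

Each 4-cycle has a unique vertex opposite its minimum vertex, so requiring u to be the minimum and a < b removes all duplicates; the complete graph on four vertices yields its three distinct 4-cycles.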
GraphMineSuite: Enabling High-Performance and Programmable Graph Mining Algorithms with Set Algebra
We propose GraphMineSuite (GMS): the first benchmarking suite for graph
mining that facilitates evaluating and constructing high-performance graph
mining algorithms. First, GMS comes with a benchmark specification based on
extensive literature review, prescribing representative problems, algorithms,
and datasets. Second, GMS offers a carefully designed software platform for
seamless testing of different fine-grained elements of graph mining algorithms,
such as graph representations or algorithm subroutines. The platform includes
parallel implementations of more than 40 considered baselines, and it
facilitates developing complex and fast mining algorithms. High modularity is
possible by harnessing set algebra operations such as set intersection and
difference, which enables breaking complex graph mining algorithms into simple
building blocks that can be separately experimented with. GMS is supported with
a broad concurrency analysis for portability in performance insights, and a
novel performance metric to assess the throughput of graph mining algorithms,
enabling more insightful evaluation. As use cases, we harness GMS to rapidly
redesign and accelerate state-of-the-art baselines of core graph mining
problems: degeneracy reordering (by up to >2x), maximal clique listing (by up
to >9x), k-clique listing (by 1.1x), and subgraph isomorphism (by up to 2.5x),
also obtaining better theoretical performance bounds
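The set-algebra decomposition can be illustrated with k-clique listing, where the only non-trivial operation is intersecting candidate sets. This Python sketch is a stand-in under stated assumptions, not GMS code: the name `list_k_cliques` is hypothetical, edges are oriented by (degree, id) rather than by a degeneracy order, and plain Python sets replace GMS's set representations.

```python
def list_k_cliques(adj, k):
    """List every k-clique (k >= 1) exactly once using set intersections.

    adj maps each vertex to the set of its neighbours (simple, undirected graph).
    """
    # Orient each edge towards the endpoint with larger (degree, id); the result
    # is acyclic, so each clique is found in exactly one increasing vertex order.
    rank = {v: (len(adj[v]), v) for v in adj}
    out = {v: {u for u in adj[v] if rank[u] > rank[v]} for v in adj}
    found = []

    def extend(clique, cand):
        if len(clique) == k:
            found.append(tuple(clique))
            return
        for v in cand:
            # Set intersection: new candidates follow v and are adjacent to it.
            extend(clique + [v], cand & out[v])

    for v in adj:
        extend([v], out[v])
    return found
```

Swapping the ordering (for example, degeneracy instead of degree) or the set representation changes performance but not the output, which is the kind of building-block experimentation the suite is designed for.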
Graph Sketching Against Adaptive Adversaries Applied to the Minimum Degree Algorithm
Motivated by the study of matrix elimination orderings in combinatorial
scientific computing, we utilize graph sketching and local sampling to give a
data structure that provides access to approximate fill degrees of a matrix
undergoing elimination in time per elimination and
query. We then study the problem of using this data structure in the minimum
degree algorithm, which is a widely-used heuristic for producing elimination
orderings for sparse matrices by repeatedly eliminating the vertex with
(approximate) minimum fill degree. This leads to a nearly-linear time algorithm
for generating approximate greedy minimum degree orderings. Despite extensive
studies of algorithms for elimination orderings in combinatorial scientific
computing, our result is the first rigorous incorporation of randomized tools
in this setting, as well as the first nearly-linear time algorithm for
producing elimination orderings with provable approximation guarantees.
While our sketching data structure readily works in the oblivious adversary
model, by repeatedly querying and greedily updating itself, it enters the
adaptive adversarial model where the underlying sketches become prone to
failure due to dependency issues with their internal randomness. We show how to
use an additional sampling procedure to circumvent this problem and to create
an independent access sequence. Our technique for decorrelating the interleaved
queries and updates to this randomized data structure may be of independent
interest.
Comment: 58 pages, 3 figures. This is a substantially revised version of
arXiv:1711.08446 with an emphasis on the underlying theoretical problem
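For reference, the exact greedy minimum degree ordering that the data structure approximates can be written directly: eliminate a vertex of minimum current (fill) degree, then connect its remaining neighbours into a clique. This naive sketch, with the hypothetical name `minimum_degree_ordering` and ties broken by vertex id, maintains fill explicitly and can take far more than nearly-linear time; it is a baseline, not the sketching method described above.

```python
def minimum_degree_ordering(adj):
    """Exact greedy minimum degree elimination order for a symmetric sparsity graph.

    adj maps each vertex to the set of its neighbours; it is not modified.
    """
    g = {v: set(nbrs) for v, nbrs in adj.items()}  # working elimination graph
    order = []
    while g:
        # Vertex with the smallest current fill degree; ties broken by id.
        v = min(g, key=lambda u: (len(g[u]), u))
        nbrs = g.pop(v)
        for u in nbrs:
            g[u].discard(v)
            g[u] |= nbrs - {u}  # fill-in: v's neighbours become a clique
        order.append(v)
    return order
```

On a star with centre 0 and leaves 1, 2, 3 it returns [1, 2, 0, 3]: leaves are eliminated first, and once only one leaf remains the centre and that leaf tie on degree 1.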
Exact and approximate route set generation for resilient partial observability in sensor location problems
Sensor positioning is a fundamental problem in transportation networks, as the location of sensors strongly determines how traffic flows are observable and hence manageable. This paper aims to develop a methodology to determine sensor locations on a network such that an optimal trade-off solution is found between the number of sensors installed and the resilience of the sensor set. In particular, we propose exact and heuristic solutions for identifying the optimal route sets such that no other route would include any additional information for finding optimal full and partial observability solutions. This is an important contribution to sensor location problems, as route-based link flow inference problems have non-unique solutions that depend strongly on the link-route information used. The properties of the new methodology are analyzed and illustrated through different case studies, and the advantages of the algorithms are quantified both for full and for partial observability solutions. Using the route sets found by our approach, we are able to find full observability solutions that require a small number of sensors while also being efficient in terms of partial observability. We perform validation tests on both small and real-life-sized network instances. © 2017 Elsevier Lt