189 research outputs found

    The Power of Pivoting for Exact Clique Counting

    Clique counting is a fundamental task in network analysis, and even the simplest setting of 3-cliques (triangles) has been the center of much recent research. Getting the count of k-cliques for larger k is algorithmically challenging, due to the exponential blowup in the search space of large cliques. But a number of recent applications (especially for community detection or clustering) use larger clique counts. Moreover, one often desires local counts, the number of k-cliques per vertex/edge. Our main result is Pivoter, an algorithm that exactly counts the number of k-cliques, for all values of k. It is surprisingly effective in practice, and is able to get clique counts of graphs that were beyond the reach of previous work. For example, Pivoter gets all clique counts in a social network with 100M edges within two hours on a commodity machine. Previous parallel algorithms do not terminate in days. Pivoter can also feasibly get local per-vertex and per-edge k-clique counts (for all k) for many public data sets with tens of millions of edges. To the best of our knowledge, this is the first algorithm that achieves such results. The main insight is the construction of a Succinct Clique Tree (SCT) that stores a compressed unique representation of all cliques in an input graph. It is built using a technique called pivoting, a classic approach by Bron-Kerbosch to reduce the recursion tree of backtracking algorithms for maximal cliques. Remarkably, the SCT can be built without actually enumerating all cliques, and provides a succinct data structure from which exact clique statistics (k-clique counts, local counts) can be read off efficiently. Comment: 10 pages, WSDM 2020
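    The pivoting idea can be made concrete with a small sketch. The code below is a simplified, single-threaded reconstruction in the spirit of the Succinct Clique Tree recursion (the adjacency-set representation, the function name pivoter_counts, and the absence of the paper's degeneracy-ordering preprocessing are illustrative assumptions, not the authors' optimized implementation): each recursion node either "holds" the pivot or "picks" a non-neighbour of the pivot, and a leaf reached with h held and p picked vertices contributes C(h, k - p) cliques of size k.

```python
from collections import defaultdict
from math import comb

def pivoter_counts(adj):
    """Count k-cliques for every k via a pivoting recursion.

    adj: dict mapping each vertex to a set of its neighbours.
    Returns counts[k] = number of k-cliques (counts[0] is the empty
    clique and can be ignored).
    """
    counts = defaultdict(int)

    def rec(cand, n_pick, n_hold):
        if not cand:
            # Every clique at this leaf is the picked vertices plus any
            # subset of the held pivots: C(n_hold, t) cliques of size
            # n_pick + t, for each t.
            for t in range(n_hold + 1):
                counts[n_pick + t] += comb(n_hold, t)
            return
        # Classic Bron-Kerbosch pivot choice: maximise |cand ∩ N(p)|.
        pivot = max(cand, key=lambda u: len(cand & adj[u]))
        # Branch 1: hold the pivot (its membership stays optional).
        rec(cand & adj[pivot], n_pick, n_hold + 1)
        # Remaining branches: pick each non-neighbour of the pivot once.
        remaining = set(cand)
        for v in cand - adj[pivot] - {pivot}:
            remaining.discard(v)
            rec(remaining & adj[v], n_pick + 1, n_hold)

    rec(set(adj), 0, 0)
    return counts

# Example: a 4-clique {0,1,2,3} with a pendant vertex 4 attached to 3.
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (3, 4)]
adj = defaultdict(set)
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)
print({k: c for k, c in sorted(pivoter_counts(adj).items()) if k >= 1})
# {1: 5, 2: 7, 3: 4, 4: 1}
```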

    Beyond Triangles: A Distributed Framework for Estimating 3-profiles of Large Graphs

    We study the problem of approximating the 3-profile of a large graph. 3-profiles are generalizations of triangle counts that specify the number of times a small graph appears as an induced subgraph of a large graph. Our algorithm uses the novel concept of 3-profile sparsifiers: sparse graphs that can be used to approximate the full 3-profile counts for a given large graph. Further, we study the problem of estimating local and ego 3-profiles, two graph quantities that characterize the local neighborhood of each vertex of a graph. Our algorithm is distributed and operates as a vertex program over the GraphLab PowerGraph framework. We introduce the concept of edge pivoting, which allows us to collect 2-hop information without maintaining an explicit 2-hop neighborhood list at each vertex. This enables the computation of all the local 3-profiles in parallel with minimal communication. We test our implementation in several experiments scaling up to 640 cores on Amazon EC2. We find that our algorithm can estimate the 3-profile of a graph in approximately the same time as triangle counting. For the harder problem of ego 3-profiles, we introduce an algorithm that can estimate profiles of hundreds of thousands of vertices in parallel, on the timescale of minutes. Comment: To appear in part at KDD'15
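    For reference, the (global) 3-profile of a graph is the vector of induced counts over all 3-vertex subsets: independent triple, single edge, wedge (induced 2-path), and triangle. The exact profile follows from edge, wedge, and triangle counts by inclusion counting, as in the small sketch below (an exact sequential baseline for illustration, not the paper's distributed, sparsifier-based estimator; the function name is an assumption).

```python
from math import comb

def three_profile(adj):
    """Exact global 3-profile of a graph given as adj: vertex -> set of
    neighbours (every vertex must appear as a key, even if isolated).

    Returns (#independent triples, #single-edge triples, #induced wedges,
    #triangles), derived by inclusion counting.
    """
    n = len(adj)
    m = sum(len(nbrs) for nbrs in adj.values()) // 2
    # Paths of length 2 (not necessarily induced), centred at each vertex.
    wedges = sum(comb(len(nbrs), 2) for nbrs in adj.values())
    # Each triangle is seen once per edge (u < v), i.e. three times in total.
    triangles = sum(len(adj[u] & adj[v]) for u in adj for v in adj[u] if u < v) // 3
    induced_wedges = wedges - 3 * triangles
    # A triple with exactly 1, 2, or 3 edges is hit 1, 2, or 3 times by
    # (edge, outside-vertex) pairs, of which there are m * (n - 2) overall.
    one_edge = m * (n - 2) - 2 * induced_wedges - 3 * triangles
    independent = comb(n, 3) - one_edge - induced_wedges - triangles
    return independent, one_edge, induced_wedges, triangles

# Example: a triangle {0,1,2} plus an isolated vertex 3.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}, 3: set()}
print(three_profile(adj))  # (0, 3, 0, 1)
```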

    Parallelizing Maximal Clique Enumeration on GPUs

    We present a GPU solution for exact maximal clique enumeration (MCE) that performs a search tree traversal following the Bron-Kerbosch algorithm. Prior works on parallelizing MCE on GPUs perform a breadth-first traversal of the tree, which has limited scalability because of the explosion in the number of tree nodes at deep levels. We propose to parallelize MCE on GPUs by performing depth-first traversal of independent subtrees in parallel. Since MCE suffers from high load imbalance and memory capacity requirements, we propose a worker list for dynamic load balancing, as well as partial induced subgraphs and a compact representation of excluded vertex sets to regulate memory consumption. Our evaluation shows that our GPU implementation on a single GPU outperforms the state-of-the-art parallel CPU implementation by a geometric mean of 4.9x (up to 16.7x), and scales efficiently to multiple GPUs. Our code has been open-sourced to enable further research on accelerating MCE.
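    The Bron-Kerbosch recursion that the GPU traversal follows can be summarized in a short sequential reference (a hedged Python sketch; the worker-list load balancing, partial induced subgraphs, and compact excluded-set representation from the paper are not reproduced here).

```python
def bron_kerbosch_pivot(adj):
    """Sequential reference: enumerate all maximal cliques with pivoting.

    adj: dict mapping each vertex to a set of its neighbours.
    Yields each maximal clique as a frozenset. A GPU approach like the
    paper's assigns independent subtrees of this recursion to parallel workers.
    """
    def rec(R, P, X):
        if not P and not X:
            yield frozenset(R)  # nothing can extend R, so it is maximal
            return
        # Pivot on the vertex covering most of P to prune branches.
        pivot = max(P | X, key=lambda u: len(P & adj[u]))
        for v in list(P - adj[pivot]):
            yield from rec(R | {v}, P & adj[v], X & adj[v])
            P = P - {v}
            X = X | {v}

    yield from rec(set(), set(adj), set())

# Example: a triangle {0,1,2} plus edge (2,3) has maximal cliques {0,1,2}, {2,3}.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(sorted(sorted(c) for c in bron_kerbosch_pivot(adj)))  # [[0, 1, 2], [2, 3]]
```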

    Exact Algorithms for Maximum Clique: a computational study

    We investigate a number of recently reported exact algorithms for the maximum clique problem (MCQ, MCR, MCS, BBMC). The program code used is presented and critiqued, showing how small changes in implementation can have a drastic effect on performance. The computational study demonstrates how problem features and hardware platforms influence algorithm behaviour. The minimum width order (smallest-last) is investigated, MCS is broken into its constituent parts, and we discover that one of these parts degrades performance. It is shown that the standard procedure used for rescaling published results is unsafe. Comment: 40 pages, 14 figures, 10 tables, 12 short java program listings, code available to download at http://www.dcs.gla.ac.uk/~pat/maxClique/distribution
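    The algorithms studied (MCQ, MCR, MCS, BBMC) share a branch-and-bound skeleton in which a greedy colouring of the candidate set bounds the largest clique still reachable. The sketch below shows that common skeleton in plain Python (an illustration only, not the paper's Java listings or the bitset encoding used by BBMC).

```python
def max_clique(adj):
    """Branch-and-bound maximum clique with a greedy-colouring bound.

    adj: dict mapping each vertex to a set of its neighbours.
    """
    best = []

    def colour_order(P):
        # Greedy colouring: vertices of the same colour are pairwise
        # non-adjacent, so a clique uses each colour at most once.
        order, colours, uncoloured = [], [], set(P)
        c = 0
        while uncoloured:
            c += 1
            available = set(uncoloured)
            while available:
                v = available.pop()
                uncoloured.remove(v)
                available -= adj[v]
                order.append(v)
                colours.append(c)
        return order, colours

    def expand(C, P):
        nonlocal best
        order, colours = colour_order(P)
        # Try candidates in reverse order: highest colour bound first.
        for i in range(len(order) - 1, -1, -1):
            if len(C) + colours[i] <= len(best):
                return  # bound: this subtree cannot beat the incumbent
            v = order[i]
            newC = C + [v]
            newP = P & adj[v]
            if newP:
                expand(newC, newP)
            elif len(newC) > len(best):
                best = newC
            P = P - {v}

    expand([], set(adj))
    return best

# Example: a 5-cycle 0-1-2-3-4 with chord (0,2) has maximum clique {0,1,2}.
adj = {0: {1, 2, 4}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4}, 4: {0, 3}}
print(sorted(max_clique(adj)))  # [0, 1, 2]
```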

    Efficient and Scalable Listing of Four-Vertex Subgraphs

    Identifying four-vertex subgraphs has long been recognized as a fundamental technique in bioinformatics and social networks. However, listing these structures is a challenging task, especially for graphs that do not fit in RAM. To address this problem, we build a set of algorithms, models, and implementations that can handle massive graphs on commodity hardware. Our technique achieves 4–5 orders of magnitude speedup compared to the best prior methods on graphs with billions of edges, while remaining equally efficient when operating in external memory.
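    The abstract does not spell out the algorithm, so purely as an illustration of the task, the sketch below lists one of the four-vertex patterns (the 4-clique) in memory, using a vertex ordering to avoid duplicates; the paper's external-memory machinery for billion-edge graphs goes far beyond this baseline.

```python
from itertools import combinations

def list_four_cliques(adj):
    """Baseline in-memory listing of one four-vertex pattern (the 4-clique).

    adj: dict vertex -> set of neighbours (vertices must be comparable).
    Yields each 4-clique exactly once, as a sorted tuple.
    """
    for u in sorted(adj):
        # Only look "forward" to higher-numbered neighbours so that each
        # clique is emitted exactly once, from its smallest vertex.
        higher = {v for v in adj[u] if v > u}
        for v, w in combinations(sorted(higher), 2):
            if w not in adj[v]:
                continue
            for x in sorted(higher & adj[v] & adj[w]):
                if x > w:
                    yield (u, v, w, x)

# Example: K5 on {0..4} contains C(5, 4) = 5 four-cliques.
adj = {u: {v for v in range(5) if v != u} for u in range(5)}
print(sum(1 for _ in list_four_cliques(adj)))  # 5
```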

    GraphMineSuite: Enabling High-Performance and Programmable Graph Mining Algorithms with Set Algebra

    We propose GraphMineSuite (GMS): the first benchmarking suite for graph mining that facilitates evaluating and constructing high-performance graph mining algorithms. First, GMS comes with a benchmark specification based on an extensive literature review, prescribing representative problems, algorithms, and datasets. Second, GMS offers a carefully designed software platform for seamless testing of different fine-grained elements of graph mining algorithms, such as graph representations or algorithm subroutines. The platform includes parallel implementations of more than 40 considered baselines, and it facilitates developing complex and fast mining algorithms. High modularity is possible by harnessing set algebra operations such as set intersection and difference, which enables breaking complex graph mining algorithms into simple building blocks that can be experimented with separately. GMS is supported by a broad concurrency analysis for portability of performance insights, and by a novel performance metric to assess the throughput of graph mining algorithms, enabling more insightful evaluation. As use cases, we harness GMS to rapidly redesign and accelerate state-of-the-art baselines of core graph mining problems: degeneracy reordering (by up to >2x), maximal clique listing (by up to >9x), k-clique listing (by 1.1x), and subgraph isomorphism (by up to 2.5x), while also obtaining better theoretical performance bounds.
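    The set-algebra decomposition can be illustrated with triangle counting written entirely as set intersections over neighbourhoods; in a suite like GMS the set representation (hash set, sorted array, bitmap) is the building block one swaps and benchmarks, while the kernel keeps this shape (a hedged sketch; the names below are not GMS APIs).

```python
def triangle_count(adj):
    """Triangle counting expressed through set algebra: one set intersection
    per (oriented) edge. Swapping the set representation changes performance,
    not the shape of the kernel.
    """
    count = 0
    for u in adj:
        for v in adj[u]:
            if u < v:
                # |N(u) ∩ N(v)| common neighbours close a triangle over (u, v);
                # each triangle is counted once per edge, i.e. three times.
                count += len(adj[u] & adj[v])
    return count // 3

adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1, 3}, 3: {0, 2}}
print(triangle_count(adj))  # 2 triangles: {0,1,2} and {0,2,3}
```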

    Graph Sketching Against Adaptive Adversaries Applied to the Minimum Degree Algorithm

    Motivated by the study of matrix elimination orderings in combinatorial scientific computing, we utilize graph sketching and local sampling to give a data structure that provides access to approximate fill degrees of a matrix undergoing elimination in O(polylog(n)) time per elimination and query. We then study the problem of using this data structure in the minimum degree algorithm, which is a widely-used heuristic for producing elimination orderings for sparse matrices by repeatedly eliminating the vertex with (approximate) minimum fill degree. This leads to a nearly-linear time algorithm for generating approximate greedy minimum degree orderings. Despite extensive studies of algorithms for elimination orderings in combinatorial scientific computing, our result is the first rigorous incorporation of randomized tools in this setting, as well as the first nearly-linear time algorithm for producing elimination orderings with provable approximation guarantees. While our sketching data structure readily works in the oblivious adversary model, by repeatedly querying and greedily updating itself, it enters the adaptive adversarial model where the underlying sketches become prone to failure due to dependency issues with their internal randomness. We show how to use an additional sampling procedure to circumvent this problem and to create an independent access sequence. Our technique for decorrelating the interleaved queries and updates to this randomized data structure may be of independent interest. Comment: 58 pages, 3 figures. This is a substantially revised version of arXiv:1711.08446 with an emphasis on the underlying theoretical problem.
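    For context, the classical minimum degree heuristic that the sketching data structure accelerates is sketched below with exact degrees: repeatedly eliminate the vertex of minimum degree and add fill edges that turn its remaining neighbourhood into a clique. This exact version is far from polylog time per elimination, which is precisely the gap the paper's approximate, sketch-based degree queries address (plain illustrative Python, not the paper's data structure).

```python
import heapq

def min_degree_ordering(adj):
    """Classical minimum degree heuristic with exact degrees.

    adj: dict vertex -> set of neighbours (modified in place).
    Returns an elimination ordering. Eliminating v adds fill edges that
    turn N(v) into a clique, which is exactly the update pattern the
    paper's sketching structure must survive.
    """
    heap = [(len(nbrs), v) for v, nbrs in adj.items()]
    heapq.heapify(heap)
    eliminated, order = set(), []
    while heap:
        d, v = heapq.heappop(heap)
        if v in eliminated or d != len(adj[v]):
            continue  # stale heap entry, skip it
        order.append(v)
        eliminated.add(v)
        nbrs = [u for u in adj.pop(v) if u not in eliminated]
        for u in nbrs:
            adj[u].discard(v)
        # Fill: the remaining neighbours of v become pairwise adjacent.
        for i, u in enumerate(nbrs):
            for w in nbrs[i + 1:]:
                if w not in adj[u]:
                    adj[u].add(w)
                    adj[w].add(u)
        for u in nbrs:
            heapq.heappush(heap, (len(adj[u]), u))
    return order

# Example: a path 0-1-2-3.
adj = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
print(min_degree_ordering(adj))  # [0, 1, 2, 3]
```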

    Exact and approximate route set generation for resilient partial observability in sensor location problems

    Sensor positioning is a fundamental problem in transportation networks, as the location of sensors strongly determines how observable, and hence manageable, traffic flows are. This paper aims to develop a methodology to determine sensor locations on a network such that an optimal trade-off is found between the number of sensors installed and the resilience of the sensor set. In particular, we propose exact and heuristic solutions for identifying optimal route sets such that no other route would add any information for finding optimal full and partial observability solutions. This is an important contribution to sensor location problems, as route-based link flow inference problems have non-unique solutions, strongly depending on the link-route information used. The properties of the new methodology are analyzed and illustrated through different case studies, and the advantages of the algorithms are quantified both for full and for partial observability solutions. With the route sets found by our approach, we are able to find full observability solutions characterized by a small number of sensors, while remaining efficient in terms of partial observability. We perform validation tests on both small and real-life sized network instances. © 2017 Elsevier Ltd.
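    The abstract describes the objective rather than the algorithm, so as a generic illustration of the underlying full-observability question, the sketch below greedily places sensors on links (rows of a link-route incidence matrix) until the observed rows reach full column rank, i.e. all route flows become inferable. The matrix A, the rank-based greedy criterion, and all names are assumptions for illustration, not the paper's exact or heuristic methods, which additionally trade off sensor count, resilience, and partial observability.

```python
import numpy as np

def greedy_sensor_selection(A):
    """Greedy full-observability baseline on a link-route incidence matrix.

    A[i, j] = 1 if route j traverses link i. Repeatedly adds the sensor
    (link) whose row increases the rank of the observed submatrix the most,
    until route flows are fully determined (rank equals the number of routes)
    or no further sensor helps (only partial observability is reachable).
    """
    n_links, n_routes = A.shape
    chosen, rank = [], 0
    while rank < n_routes:
        best_link, best_rank = None, rank
        for i in range(n_links):
            if i in chosen:
                continue
            r = np.linalg.matrix_rank(A[chosen + [i], :])
            if r > best_rank:
                best_link, best_rank = i, r
        if best_link is None:
            break  # no remaining sensor adds information
        chosen.append(best_link)
        rank = best_rank
    return chosen, rank

# Toy example: 4 candidate links, 3 routes.
A = np.array([[1, 0, 1],
              [0, 1, 1],
              [1, 1, 0],
              [0, 0, 1]])
links, rank = greedy_sensor_selection(A)
print(links, rank)  # [0, 1, 2] 3  (full observability with three sensors)
```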