Listing k-cliques in Sparse Real-World Graphs
Motivated by recent studies in the data mining community that require efficiently listing all k-cliques, we revisit the iconic algorithm of Chiba and Nishizeki and develop the most efficient parallel algorithm for this problem. Our theoretical analysis provides the best asymptotic upper bound on the running time of our algorithm for the case when the input graph is sparse. Our experimental evaluation on large real-world graphs shows that our parallel algorithm is faster than state-of-the-art algorithms, while boasting an excellent degree of parallelism. In particular, we are able to list all k-cliques (for any k) in graphs containing up to tens of millions of edges, as well as all 10-cliques in graphs containing billions of edges, within a few minutes and a few hours respectively. Finally, we show how our algorithm can be employed as an effective subroutine for finding the k-clique core decomposition and an approximate k-clique densest subgraph in very large real-world graphs.
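The core idea of listing k-cliques along a vertex ordering can be illustrated with a minimal sequential sketch. This is not the authors' parallel algorithm: the function names `degeneracy_order` and `list_k_cliques` are hypothetical, and the ordering is computed with a simple quadratic selection for readability. The sketch orients every edge from the lower-ranked to the higher-ranked endpoint, so each clique is reported exactly once.

```python
from collections import defaultdict


def degeneracy_order(adj):
    """Repeatedly remove a vertex of minimum remaining degree."""
    degree = {v: len(ns) for v, ns in adj.items()}
    remaining = set(adj)
    order = []
    while remaining:
        # Simple O(n) selection per step; a bucket queue would be faster.
        v = min(remaining, key=lambda u: (degree[u], u))
        order.append(v)
        remaining.remove(v)
        for u in adj[v]:
            if u in remaining:
                degree[u] -= 1
    return order


def list_k_cliques(edges, k):
    """Return all k-cliques of an undirected graph given as an edge list."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)

    rank = {v: i for i, v in enumerate(degeneracy_order(adj))}
    # Out-neighbours: neighbours that come later in the ordering.
    out = {v: {u for u in adj[v] if rank[u] > rank[v]} for v in adj}

    cliques = []

    def extend(clique, candidates):
        if len(clique) == k:
            cliques.append(tuple(clique))
            return
        for v in sorted(candidates, key=rank.get):
            # Keep only candidates adjacent to every vertex chosen so far.
            extend(clique + [v], candidates & out[v])

    extend([], set(adj))
    return cliques


if __name__ == "__main__":
    # A 4-clique on {0, 1, 2, 3} plus a pendant vertex 4.
    edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (3, 4)]
    for clique in list_k_cliques(edges, 3):
        print(clique)
```

Because every vertex has few out-neighbours under a degeneracy ordering, the candidate sets stay small on sparse graphs, which is the property such analyses rely on.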
EviDense: a Graph-based Method for Finding Unique High-impact Events with Succinct Keyword-based Descriptions
Despite the significant efforts made by the research community in recent years, automatically acquiring valuable information about high-impact events from social media remains challenging. We present EVIDENSE, a graph-based approach for finding high-impact events (such as disaster events) in social media. Our evaluation shows that our method outperforms state-of-the-art approaches for the same problem, achieving higher precision and fewer duplicates while providing a keyword-based description that is succinct and informative.
Retrieving Top-N Weighted Spatial k-cliques
Spatial data analysis is a classic yet important topic because of its wide range of applications. Recently, as a spatial data analysis approach, a neighbor graph of a set P of spatial points has often been employed. This paper also considers a spatial neighbor graph and addresses a new problem, namely top-N weighted spatial k-clique retrieval. This problem searches for the N minimum weighted cliques consisting of k points in P, and it has important applications, such as community detection and co-location pattern mining. Recent spatial datasets have many points, and efficiently dealing with such big datasets is one of the main requirements of applications. A straightforward approach to solving our problem is to try to enumerate all k-cliques, which incurs O(n^k k^2) time. Since k ≥ 3, this approach cannot achieve the main requirement, so computing the result without enumerating unnecessary k-cliques is required. This paper achieves this challenging task and proposes a simple practically-efficient algorithm that returns the exact answer. We conduct experiments using two real spatial datasets consisting of million points, and the results show the efficiency of our algorithm, e.g., it can return the exact top-N result within 1 second when N ≤ 1000 and k ≤ 7.
Taniguchi R., Amagata D., Hara T. Retrieving Top-N Weighted Spatial k-cliques. Proceedings - 2022 IEEE International Conference on Big Data, Big Data 2022, 4952 (2022); https://doi.org/10.1109/BigData55660.2022.10021071
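The straightforward baseline mentioned in the abstract can be sketched as follows. This is only the brute-force approach the paper argues against, not its proposed algorithm. The function name `top_n_k_cliques_bruteforce` is hypothetical, and the clique weight is assumed here to be the sum of its edge weights, which the abstract does not specify. Each of the O(n^k) vertex subsets is checked with O(k^2) edge lookups.

```python
import heapq
from itertools import combinations


def top_n_k_cliques_bruteforce(weights, k, n_top):
    """Return the n_top lightest k-cliques as (weight, vertices) pairs.

    weights maps frozenset({u, v}) to the weight of edge (u, v).
    Assumption: a clique's weight is the sum of its edge weights.
    """
    vertices = sorted({v for edge in weights for v in edge})
    found = []
    for combo in combinations(vertices, k):      # O(n^k) subsets
        total = 0.0
        is_clique = True
        for u, v in combinations(combo, 2):      # O(k^2) pairs per subset
            w = weights.get(frozenset((u, v)))
            if w is None:                        # missing edge: not a clique
                is_clique = False
                break
            total += w
        if is_clique:
            found.append((total, combo))
    return heapq.nsmallest(n_top, found)


if __name__ == "__main__":
    weights = {
        frozenset((0, 1)): 1.0,
        frozenset((0, 2)): 1.0,
        frozenset((1, 2)): 1.0,
        frozenset((1, 3)): 2.0,
        frozenset((2, 3)): 5.0,
    }
    print(top_n_k_cliques_bruteforce(weights, k=3, n_top=2))
```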
The Power of Pivoting for Exact Clique Counting
Clique counting is a fundamental task in network analysis, and even the
simplest setting of 3-cliques (triangles) has been the center of much recent
research. Getting the count of k-cliques for larger k is algorithmically
challenging, due to the exponential blowup in the search space of large
cliques. But a number of recent applications (especially for community
detection or clustering) use larger clique counts. Moreover, one often desires
local counts, the number of k-cliques per vertex/edge.
Our main result is Pivoter, an algorithm that exactly counts the number of
k-cliques, for all values of k. It is surprisingly effective in
practice, and is able to get clique counts of graphs that were beyond the reach
of previous work. For example, Pivoter gets all clique counts in a social
network with 100M edges within two hours on a commodity machine. Previous
parallel algorithms do not terminate in days. Pivoter can also feasibly get
local per-vertex and per-edge k-clique counts (for all k) for many public
data sets with tens of millions of edges. To the best of our knowledge, this is
the first algorithm that achieves such results.
The main insight is the construction of a Succinct Clique Tree (SCT) that
stores a compressed unique representation of all cliques in an input graph. It
is built using a technique called pivoting, a classic approach by
Bron-Kerbosch to reduce the recursion tree of backtracking algorithms for
maximal cliques. Remarkably, the SCT can be built without actually enumerating
all cliques, and provides a succinct data structure from which exact clique
statistics (k-clique counts, local counts) can be read off efficiently.
Comment: 10 pages, WSDM 202
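The pivoting technique referred to above can be illustrated with a minimal sketch of the classic Bron-Kerbosch procedure for maximal cliques. This is not Pivoter and does not build the SCT; the function name `bron_kerbosch_pivot` and the pivot rule (the vertex with the most neighbours among the remaining candidates) are choices made for this illustration. Pivoting prunes the recursion because every maximal clique must contain either the pivot or one of its non-neighbours.

```python
def bron_kerbosch_pivot(adj):
    """Enumerate maximal cliques; adj maps each vertex to its neighbour set."""
    cliques = []

    def expand(r, p, x):
        # r: current clique, p: candidates to add, x: already-explored vertices.
        if not p and not x:
            cliques.append(frozenset(r))
            return
        # Pick the pivot with the most neighbours among the candidates.
        pivot = max(p | x, key=lambda u: len(p & adj[u]))
        # Only branch on candidates that are NOT adjacent to the pivot.
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


if __name__ == "__main__":
    # A 4-clique on {0, 1, 2, 3} plus a pendant vertex 4 attached to 3.
    adj = {
        0: {1, 2, 3},
        1: {0, 2, 3},
        2: {0, 1, 3},
        3: {0, 1, 2, 4},
        4: {3},
    }
    for clique in bron_kerbosch_pivot(adj):
        print(sorted(clique))
```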
Provably and Efficiently Approximating Near-cliques using the Turán Shadow: PEANUTS
Clique and near-clique counts are important graph properties with
applications in graph generation, graph modeling, graph analytics, and community
detection, among others. They are the archetypal examples of dense subgraphs.
While there are several different definitions of near-cliques, most of them
share the attribute that they are cliques that are missing a small number of
edges. Clique counting is itself considered a challenging problem. Counting
near-cliques is significantly harder, all the more so since the search space for
near-cliques is orders of magnitude larger than that of cliques.
We give a formulation of a near-clique as a clique that is missing a constant
number of edges. We exploit the fact that a near-clique contains a smaller
clique, and use techniques for clique sampling to count near-cliques. This
method allows us to count near-cliques with 1 or 2 missing edges, in graphs
with tens of millions of edges. To the best of our knowledge, there was no
known efficient method for this problem, and we obtain a 10x - 100x speedup
over existing algorithms for counting near-cliques.
Our main technique is a space-efficient adaptation of the Turán Shadow
sampling approach, recently introduced by Jain and Seshadhri (WWW 2017). This
approach constructs a large recursion tree (called the Turán Shadow) that
represents cliques in a graph. We design a novel algorithm that builds an
estimator for near-cliques, using an online, compact construction of the
Turán Shadow.
Comment: The Web Conference, 2020 (WWW
Parallelizing Maximal Clique Enumeration on GPUs
We present a GPU solution for exact maximal clique enumeration (MCE) that
performs a search tree traversal following the Bron-Kerbosch algorithm. Prior
works on parallelizing MCE on GPUs perform a breadth-first traversal of the
tree, which has limited scalability because of the explosion in the number of
tree nodes at deep levels. We propose to parallelize MCE on GPUs by performing
depth-first traversal of independent subtrees in parallel. Since MCE suffers
from high load imbalance and memory capacity requirements, we propose a worker
list for dynamic load balancing, as well as partial induced subgraphs and a
compact representation of excluded vertex sets to regulate memory consumption.
Our evaluation shows that our GPU implementation on a single GPU outperforms
the state-of-the-art parallel CPU implementation by a geometric mean of 4.9x
(up to 16.7x), and scales efficiently to multiple GPUs. Our code has been
open-sourced to enable further research on accelerating MCE.