A Novel Approach to Finding Near-Cliques: The Triangle-Densest Subgraph Problem
Many graph mining applications rely on detecting subgraphs which are
near-cliques. There exists a dichotomy between the results in the existing work
related to this problem: on the one hand the densest subgraph problem (DSP)
which maximizes the average degree over all subgraphs is solvable in polynomial
time but for many networks fails to find subgraphs which are near-cliques. On
the other hand, formulations that are geared towards finding near-cliques are
NP-hard and frequently inapproximable due to connections with the Maximum
Clique problem.
In this work, we propose a formulation which combines the best of both
worlds: it is solvable in polynomial time and finds near-cliques when the DSP
fails. Surprisingly, our formulation is a simple variation of the DSP.
Specifically, we define the triangle densest subgraph problem (TDSP): given
a graph G(V,E), find a subset of vertices S* such that τ(S*) = max_{S⊆V} t(S)/|S|,
where t(S) is the number of triangles induced by the set S. We provide various
exact and approximation algorithms which solve the TDSP efficiently.
Furthermore, we show how our algorithms adapt to
the more general problem of maximizing the k-clique average density. Finally,
we provide empirical evidence that the TDSP should be used whenever the output
of the DSP fails to output a near-clique.
Comment: 42 pages
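The triangle density objective t(S)/|S| described in this abstract can be illustrated with a small sketch. The code below is a brute-force illustration with a greedy peeling heuristic (repeatedly drop the vertex that participates in the fewest triangles and keep the best set seen); it is not the authors' implementation, and the helper names `build_adj`, `triangle_density`, and `peel_triangle_densest` are hypothetical.

```python
from itertools import combinations


def build_adj(edges):
    """Build an undirected adjacency map {vertex: set(neighbors)}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def triangle_count(adj, nodes):
    """Number of triangles induced by `nodes` (cubic-time, for illustration only)."""
    return sum(
        1
        for u, v, w in combinations(sorted(nodes), 3)
        if v in adj[u] and w in adj[u] and w in adj[v]
    )


def triangle_density(adj, nodes):
    """Triangle density t(S)/|S| of the vertex set `nodes`."""
    nodes = set(nodes)
    return triangle_count(adj, nodes) / len(nodes) if nodes else 0.0


def peel_triangle_densest(adj):
    """Greedy peeling: remove the vertex in the fewest triangles, track the best set."""
    remaining = set(adj)

    def triangles_at(v):
        # Triangles through v inside the current remaining set.
        nbrs = [u for u in adj[v] if u in remaining]
        return sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])

    best_set = set(remaining)
    best_density = triangle_density(adj, remaining)
    while len(remaining) > 1:
        victim = min(sorted(remaining), key=triangles_at)
        remaining.remove(victim)
        density = triangle_density(adj, remaining)
        if density > best_density:
            best_set, best_density = set(remaining), density
    return best_set, best_density


# Example graph (an assumption for illustration): a 4-clique with a 2-edge tail.
example_edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (3, 4), (4, 5)]
example_adj = build_adj(example_edges)
best, density = peel_triangle_densest(example_adj)
print(sorted(best), density)
```

On this example the whole graph has triangle density 4/6, while the 4-clique alone reaches 4/4, so peeling away the tail improves the objective.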
FLEET: Butterfly Estimation from a Bipartite Graph Stream
We consider space-efficient single-pass estimation of the number of
butterflies, a fundamental bipartite graph motif, from a massive bipartite
graph stream where each edge represents a connection between entities in two
different partitions. We present a space lower bound for any streaming
algorithm that can estimate the number of butterflies accurately, as well as
FLEET, a suite of algorithms for accurately estimating the number of
butterflies in the graph stream. Estimates returned by the algorithms come with
provable guarantees on the approximation error, and experiments show good
tradeoffs between the space used and the accuracy of approximation. We also
present space-efficient algorithms for estimating the number of butterflies
within a sliding window of the most recent elements in the stream. While there
is a significant body of work on counting subgraphs such as triangles in a
unipartite graph stream, our work seems to be one of the few to tackle the case
of bipartite graph streams.
Comment: This is the author's version of the work. It is posted here by
permission of ACM for your personal use. Not for redistribution. The
definitive version was published in Seyed-Vahid Sanei-Mehri, Yu Zhang, Ahmet
Erdem Sariyuce and Srikanta Tirthapura. "FLEET: Butterfly Estimation from a
Bipartite Graph Stream". The 28th ACM International Conference on Information
and Knowledge Managemen
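For readers unfamiliar with the motif, a butterfly is a 2x2 biclique: two vertices on one side that share two neighbors on the other side. The sketch below counts butterflies exactly in a small in-memory bipartite graph; it is not the FLEET streaming estimator, only a reference count against which such estimates could be compared. The function name `count_butterflies` and the edge-list input format are assumptions.

```python
from collections import defaultdict
from itertools import combinations
from math import comb


def count_butterflies(edges):
    """Exact butterfly count for a bipartite edge list of (left, right) pairs."""
    # Map each right vertex to the set of left vertices adjacent to it.
    lefts_of = defaultdict(set)
    for left, right in edges:
        lefts_of[right].add(left)

    # For every pair of left vertices, count their common right neighbors.
    common = defaultdict(int)
    for lefts in lefts_of.values():
        for a, b in combinations(sorted(lefts), 2):
            common[(a, b)] += 1

    # A pair with c common neighbors closes C(c, 2) butterflies.
    return sum(comb(c, 2) for c in common.values())


# Example (an assumption for illustration): the complete bipartite graph K_{2,3}.
k23 = [(l, r) for l in ("a", "b") for r in ("x", "y", "z")]
print(count_butterflies(k23))
```

The exact count needs memory proportional to the number of left-vertex pairs with a common neighbor, which is the cost that single-pass estimators of the kind described in the abstract aim to avoid.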
Evaluating the effects of high-throughput structural neuroimaging predictors on whole-brain functional connectome outcomes via network-based vector-on-matrix regression
The joint analysis of multimodal neuroimaging data is critical in the field
of brain research because it reveals complex interactive relationships between
neurobiological structures and functions. In this study, we focus on
investigating the effects of structural imaging (SI) features, including white
matter micro-structure integrity (WMMI) and cortical thickness, on the whole
brain functional connectome (FC) network. To achieve this goal, we propose a
network-based vector-on-matrix regression model to characterize the FC-SI
association patterns. We have developed a novel multi-level dense bipartite and
clique subgraph extraction method to identify which subsets of spatially
specific SI features intensively influence organized FC sub-networks. The
proposed method can simultaneously identify highly correlated
structural-connectomic association patterns and suppress false positive
findings while handling millions of potential interactions. We apply our method
to a multimodal neuroimaging dataset of 4,242 participants from the UK Biobank
to evaluate the effects of whole-brain WMMI and cortical thickness on the
resting-state FC. The results reveal that the WMMI on corticospinal tracts and
inferior cerebellar peduncle significantly affect functional connections of
sensorimotor, salience, and executive sub-networks with an average correlation
of 0.81 (p<0.001).
Comment: 20 pages, 5 figures, 2 tables
Parallel Algorithms for Hierarchical Nucleus Decomposition
Nucleus decompositions have been shown to be a useful tool for finding dense
subgraphs. The coreness value of a clique represents its density based on the
number of other cliques it is adjacent to. One useful output of nucleus
decomposition is to generate a hierarchy among dense subgraphs at different
resolutions. However, existing parallel algorithms for nucleus decomposition do
not generate this hierarchy, and only compute the coreness values. This paper
presents a scalable parallel algorithm for hierarchy construction, with
practical optimizations, such as interleaving the coreness computation with
hierarchy construction and using a concurrent union-find data structure in an
innovative way to generate the hierarchy. We also introduce a parallel
approximation algorithm for nucleus decomposition, which achieves much lower
span in theory and better performance in practice. We prove strong theoretical
bounds on the work and span (parallel time) of our algorithms.
On a 30-core machine with two-way hyper-threading on real-world graphs, our
parallel hierarchy construction algorithm achieves up to a 58.84x speedup over
the state-of-the-art sequential hierarchy construction algorithm by Sariyuce et
al. and up to a 30.96x self-relative parallel speedup. On the same machine, our
approximation algorithm achieves a 3.3x speedup over our exact algorithm, while
generating coreness estimates with a multiplicative error of 1.33x on average.
On the Generalized Mean Densest Subgraph Problem: Complexity and Algorithms
Dense subgraph discovery is an important problem in graph mining and network
analysis with several applications. Two canonical problems here are to find a
maxcore (subgraph of maximum min degree) and to find a densest subgraph
(subgraph of maximum average degree). Both of these problems can be solved in
polynomial time. Veldt, Benson, and Kleinberg [VBK21] introduced the
generalized p-mean densest subgraph problem which captures the maxcore
problem when p = -∞ and the densest subgraph problem when p = 1. They
observed that the objective leads to a supermodular function when p ≥ 1 and
hence can be solved in polynomial time; for this case, they also developed a
simple greedy peeling algorithm with a bounded approximation ratio. In this
paper, we make several contributions. First, we prove that for any
p ∈ (-1/8, 0) ∪ (0, 1/4) the problem is NP-Hard and for any
p ∈ (-3, 0) ∪ (0, 1) the weighted version of the problem is NP-Hard, partly
resolving a question left open in [VBK21]. Second, we describe two simple
1/2-approximation algorithms for all p < 1, and show that our analysis of
these algorithms is tight. For p > 1 we develop a fast near-linear time
implementation of the greedy peeling algorithm from [VBK21]. This allows us to
plug it into the iterative peeling algorithm that was shown to converge to an
optimum solution [CQT22]. We demonstrate the efficacy of our algorithms by
running extensive experiments on large graphs. Together, our results provide a
comprehensive understanding of the complexity of the p-mean densest subgraph
problem and lead to fast and provably good algorithms for the full range of p.
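The generalized mean objective from [VBK21] can be made concrete with a short sketch. It assumes the usual power-mean definition, in which the density of a vertex set S is the p-th power mean of the degrees in the subgraph induced by S, so that the limit p = -∞ gives the minimum degree (maxcore objective) and p = 1 gives the average degree (densest subgraph objective). The function name `p_mean_density` and its handling of edge cases are assumptions made for this illustration.

```python
import math


def p_mean_density(adj, nodes, p):
    """p-th power mean of the degrees induced by `nodes` in the graph `adj`."""
    nodes = set(nodes)
    degrees = [sum(1 for u in adj[v] if u in nodes) for v in nodes]
    if not degrees:
        return 0.0
    if p == -math.inf:
        return float(min(degrees))  # limiting case: minimum degree
    if p == math.inf:
        return float(max(degrees))  # limiting case: maximum degree
    if p == 0:
        raise ValueError("p = 0 is not covered by this sketch")
    if p < 0 and min(degrees) == 0:
        return 0.0  # limit of the power mean when some degree is zero
    return (sum(d ** p for d in degrees) / len(degrees)) ** (1.0 / p)


# Example (an assumption for illustration): the path 0 - 1 - 2, degrees 1, 2, 1.
path = {0: {1}, 1: {0, 2}, 2: {1}}
for p in (-math.inf, -1, 1, 2):
    print(p, p_mean_density(path, path.keys(), p))
```

Smaller values of p weight low-degree vertices more heavily, which is why the objective interpolates between the maxcore and the densest subgraph problems.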
The K-clique Densest Subgraph Problem
Numerous graph mining applications rely on detecting subgraphs which are large near-cliques. Since formulations that are geared towards finding large near-cliques are NP-hard and frequently inapproximable due to connections with the Maximum Clique problem, the poly-time solvable densest subgraph problem which maximizes the average degree over all possible subgraphs “lies at the core of large scale data mining” [10]. However, frequently the densest subgraph problem fails in detecting large near-cliques in networks. In this work, we introduce the k-clique densest subgraph problem, k ≥ 2. This generalizes the well studied densest subgraph problem which is obtained as a special case for k = 2. For k = 3 we obtain a novel formulation which we refer to as the triangle densest subgraph problem: given a graph G(V,E), find a subset of vertices S* such that τ(S*) = max_{S⊆V} t(S
Faster Subgraph Counting in Sparse Graphs
A fundamental graph problem asks to compute the number of induced copies of a k-node pattern graph H in an n-node graph G. The fastest algorithm to date is still the 35-year-old algorithm by Nesetril and Poljak [Nesetril and Poljak, 1985], with running time f(k) * O(n^{omega floor[k/3] + 2}) where omega <= 2.373 is the matrix multiplication exponent. In this work we show that, if one takes into account the degeneracy d of G, then the picture becomes substantially richer and leads to faster algorithms when G is sufficiently sparse. More precisely, after introducing a novel notion of graph width, the DAG-treewidth, we prove the following. If H has DAG-treewidth tau(H) and G has degeneracy d, then the induced copies of H in G can be counted in time f(d,k) * O~(n^{tau(H)}); and, under the Exponential Time Hypothesis, no algorithm can solve the problem in time f(d,k) * n^{o(tau(H)/ln tau(H))} for all H. This result characterises the complexity of counting subgraphs in a d-degenerate graph. By developing bounds on tau(H), we then obtain natural generalisations of classic results and faster algorithms for sparse graphs. For example, when d = O(poly log(n)) we can count the induced copies of any H in time f(k) * O~(n^{floor[k/4] + 2}), beating the Nesetril-Poljak algorithm by essentially a cubic factor in n.
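To see why degeneracy helps with counting, the simplest pattern (a triangle) already shows the idea: order the vertices by repeatedly removing one of minimum remaining degree, orient each edge from the earlier to the later vertex, and every vertex ends up with at most d out-neighbors. The sketch below applies this classic orientation trick to triangles only; it is not the DAG-treewidth algorithm from the abstract, and the function names are hypothetical.

```python
from itertools import combinations


def degeneracy_order(adj):
    """Return (removal order, degeneracy d) by repeated minimum-degree removal."""
    deg = {v: len(adj[v]) for v in adj}
    remaining = set(adj)
    order, d = [], 0
    while remaining:
        v = min(remaining, key=lambda x: deg[x])
        d = max(d, deg[v])  # degeneracy is the largest degree seen at removal time
        order.append(v)
        remaining.remove(v)
        for u in adj[v]:
            if u in remaining:
                deg[u] -= 1
    return order, d


def count_triangles_oriented(adj):
    """Count triangles by checking pairs of out-neighbors in the oriented graph."""
    order, d = degeneracy_order(adj)
    rank = {v: i for i, v in enumerate(order)}
    # Each vertex keeps only neighbors removed after it: at most d of them.
    out = {v: [u for u in adj[v] if rank[u] > rank[v]] for v in adj}
    triangles = sum(
        1
        for v in adj
        for a, b in combinations(out[v], 2)
        if b in adj[a]
    )
    return triangles, d


# Example (an assumption for illustration): a triangle 0-1-2 with a pendant vertex 3.
graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(count_triangles_oriented(graph))
```

Each triangle is found exactly once, at its earliest vertex in the order, and the pair enumeration costs at most on the order of d^2 checks per vertex, so the total work scales with n and d rather than with a power of n alone.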