Generating constrained random graphs using multiple edge switches
The generation of random graphs using edge swaps provides a reliable method
to draw uniformly random samples of sets of graphs respecting some simple
constraints, e.g. degree distributions. However, in general, it is not
necessarily possible to access all graphs obeying some given constraints
through a classical switching procedure calling on pairs of edges. We therefore
propose to get round this issue by generalizing this classical approach through
the use of higher-order edge switches. This method, which we denote by "k-edge
switching", makes it possible to progressively improve the covered portion of
a set of constrained graphs, thereby providing an increasing, asymptotically
certain confidence on the statistical representativeness of the obtained
sample.
Comment: 15 pages
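For orientation, the classical pairwise switch that the abstract generalizes can be sketched as follows. This is a minimal illustration of the k = 2 case only, not the k-edge switching procedure of the paper; the function name `double_edge_swap` and its parameters are hypothetical, and the sketch makes no claim about uniformity of the resulting sample.

```python
import random


def double_edge_swap(edges, n_swaps, seed=None, max_tries=10_000):
    """Pairwise (k = 2) edge switching on a simple undirected graph.

    Hypothetical helper for illustration: it rewires pairs of edges while
    keeping every vertex degree fixed and the graph simple.
    """
    rng = random.Random(seed)
    edge_list = [tuple(sorted(e)) for e in edges]
    edge_set = set(edge_list)
    if len(edge_list) < 2:
        return edge_list
    done = tries = 0
    while done < n_swaps and tries < max_tries:
        tries += 1
        i, j = rng.sample(range(len(edge_list)), 2)
        (a, b), (c, d) = edge_list[i], edge_list[j]
        if rng.random() < 0.5:
            c, d = d, c  # choose one of the two possible rewirings
        # Proposed switch: (a, b), (c, d) -> (a, d), (c, b)
        if len({a, b, c, d}) < 4:
            continue  # would create a self-loop or leave the graph unchanged
        new1 = tuple(sorted((a, d)))
        new2 = tuple(sorted((c, b)))
        if new1 in edge_set or new2 in edge_set:
            continue  # would create a multi-edge
        edge_set -= {edge_list[i], edge_list[j]}
        edge_set |= {new1, new2}
        edge_list[i], edge_list[j] = new1, new2
        done += 1
    return edge_list
```

Each accepted switch removes one edge endpoint and adds one edge endpoint at every vertex involved, so the degree sequence is unchanged. Graphs that cannot be reached by such pairwise moves under additional constraints are the motivation for switching more than two edges at once.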
Motif Clustering and Overlapping Clustering for Social Network Analysis
Motivated by applications in social network community analysis, we introduce
a new clustering paradigm termed motif clustering. Unlike classical clustering,
motif clustering aims to minimize the number of clustering errors associated
with both edges and certain higher order graph structures (motifs) that
represent "atomic units" of social organizations. Our contributions are
two-fold: We first introduce motif correlation clustering, in which the goal is
to agnostically partition the vertices of a weighted complete graph so that
certain predetermined "important" social subgraphs mostly lie within the same
cluster, while "less relevant" social subgraphs are allowed to lie across
clusters. We then proceed to introduce the notion of motif covers, in which the
goal is to cover the vertices of motifs via the smallest number of (near)
cliques in the graph. Motif cover algorithms provide a natural solution for
overlapping clustering and they also play an important role in latent feature
inference of networks. For both motif correlation clustering and its extension
introduced via the covering problem, we provide hardness results, algorithmic
solutions and community detection results for two well-studied social networks.
Sampling motif-constrained ensembles of networks
The statistical significance of network properties is conditioned on null
models which satisfy specified properties but that are otherwise random.
Exponential random graph models are a principled theoretical framework to
generate such constrained ensembles, but they often fail in practice, either
due to model inconsistency or because it is impossible to sample networks from
them. These problems affect the important case of networks with prescribed
clustering coefficient or number of small connected subgraphs (motifs). In this
paper we use the Wang-Landau method to obtain a multicanonical sampling that
overcomes both these problems. We sample, in polynomial time, networks with
arbitrary degree sequences from ensembles with imposed motif counts. Applying
this method to social networks, we investigate the relation between
transitivity and homophily, and we quantify the correlation between different
types of motifs, finding that single motifs can explain up to 60% of the
variation of motif profiles.
Comment: Updated version, as published in the journal. 7 pages, 5 figures, one
Supplemental Material
Some results on more flexible versions of Graph Motif
The problems studied in this paper originate from Graph Motif, a problem
introduced in 2006 in the context of biological networks. Informally speaking,
it consists in deciding if a multiset of colors occurs in a connected subgraph
of a vertex-colored graph. Due to the high rate of noise in the biological
data, more flexible definitions of the problem have been outlined. We present
in this paper two inapproximability results for two different optimization
variants of Graph Motif: one where the size of the solution is maximized, the
other where the number of color substitutions needed to obtain the motif from the
solution is minimized. We also study a decision version of Graph Motif where
the connectivity constraint is replaced by the well known notion of graph
modularity. While the problem remains NP-complete, it admits fixed-parameter
tractable (FPT) algorithms for biologically relevant parameterizations.
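The basic decision problem described informally above can be made concrete with a brute-force check. This is a sketch for tiny instances under stated assumptions: the graph is given as an adjacency dictionary, `color` maps each vertex to its color, and the names `graph_motif` and `is_connected` are hypothetical. It enumerates all vertex subsets of the right size, so it runs in exponential time and is not one of the algorithms from the paper.

```python
from collections import Counter
from itertools import combinations


def is_connected(adj, S):
    """Return True if the subgraph induced by the non-empty set S is connected."""
    S = set(S)
    start = next(iter(S))
    seen = {start}
    stack = [start]
    while stack:
        u = stack.pop()
        for v in adj.get(u, ()):
            if v in S and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen == S


def graph_motif(adj, color, motif):
    """Brute-force Graph Motif: find a connected vertex set whose colors
    match the multiset `motif` exactly, or return None if there is none."""
    target = Counter(motif)
    k = sum(target.values())
    if k == 0:
        return set()
    for S in combinations(sorted(color), k):
        if Counter(color[v] for v in S) == target and is_connected(adj, S):
            return set(S)
    return None
```

On a path 0-1-2-3 colored red, green, blue, red, the motif {red, green} occurs on vertices {0, 1}, while {red, red} does not occur because the two red vertices are not adjacent.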
A Novel Approach to Finding Near-Cliques: The Triangle-Densest Subgraph Problem
Many graph mining applications rely on detecting subgraphs which are
near-cliques. There exists a dichotomy between the results in the existing work
related to this problem: on the one hand, the densest subgraph problem (DSP),
which maximizes the average degree over all subgraphs, is solvable in polynomial
time but for many networks fails to find subgraphs which are near-cliques. On
the other hand, formulations that are geared towards finding near-cliques are
NP-hard and frequently inapproximable due to connections with the Maximum
Clique problem.
In this work, we propose a formulation which combines the best of both
worlds: it is solvable in polynomial time and finds near-cliques when the DSP
fails. Surprisingly, our formulation is a simple variation of the DSP.
Specifically, we define the triangle densest subgraph problem (TDSP): given
a graph $G = (V, E)$, find a subset of vertices $S^*$ such that
$t(S^*)/|S^*| = \max_{S \subseteq V} t(S)/|S|$, where $t(S)$ is the number of
triangles induced by the set $S$. We provide various exact and approximation
algorithms which solve the TDSP efficiently. Furthermore, we show how our
algorithms adapt to the more general problem of maximizing the $k$-clique
average density. Finally, we provide empirical evidence that the TDSP should be
used whenever the DSP fails to output a near-clique.
Comment: 42 pages
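The objective of the TDSP, the number of induced triangles divided by the number of vertices, can be illustrated on a toy graph. The sketch below is an exhaustive search over all vertex subsets, assumed here only to show what is being maximized; it is exponential in the number of vertices and is not one of the polynomial-time algorithms the abstract refers to. The helper names (`build_adj`, `count_triangles`, `triangle_density`, `brute_force_tdsp`) are hypothetical.

```python
from itertools import combinations


def build_adj(edges):
    """Build an adjacency dictionary from an undirected edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def count_triangles(adj, S):
    """Number of triangles in the subgraph induced by the vertex set S."""
    return sum(
        1
        for u, v, w in combinations(sorted(S), 3)
        if v in adj[u] and w in adj[u] and w in adj[v]
    )


def triangle_density(adj, S):
    """Triangle density of a non-empty vertex set S: triangles per vertex."""
    return count_triangles(adj, S) / len(S)


def brute_force_tdsp(adj):
    """Exhaustively find a vertex set of maximum triangle density."""
    nodes = sorted(adj)
    best, best_val = None, -1.0
    for r in range(1, len(nodes) + 1):
        for S in combinations(nodes, r):
            val = triangle_density(adj, S)
            if val > best_val:
                best, best_val = set(S), val
    return best, best_val
```

For a 4-clique on vertices 0 to 3 with a pendant path 3-4-5 attached, the search returns the clique {0, 1, 2, 3}, which induces 4 triangles on 4 vertices.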