Bad Communities with High Modularity
In this paper we discuss some problematic aspects of Newman's modularity
function QN. Given a graph G, the modularity of G can be written as QN = Qf - Q0,
where Qf is the intracluster edge fraction of G and Q0 is the expected
intracluster edge fraction of the null model, i.e., a randomly connected graph
with the same expected degree distribution as G. It follows that the maximization
of QN must accommodate two factors pulling in opposite directions: Qf favors a
small number of clusters and Q0 favors many balanced (i.e., with approximately
equal degrees) clusters. In certain cases the Q0 term can cause overestimation
of the true cluster number; this is the opposite of the well-known
underestimation effect caused by the "resolution limit" of modularity. We illustrate
the overestimation effect by constructing families of graphs with a "natural"
community structure which, however, does not maximize modularity. In fact, we
prove that we can always find a graph G with a "natural clustering" V of G and
another, balanced clustering U of G such that (i) the pair (G; U) has higher
modularity than (G; V) and (ii) V and U are arbitrarily different.
Comment: Significantly improved version of the paper, with the help of L. Pitsouli.
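The decomposition QN = Qf - Q0 described in this abstract can be illustrated with a minimal Python sketch. The abstract does not spell out the formulas, so the sketch assumes the standard form for an undirected, unweighted graph with m edges: Qf is the fraction of edges with both endpoints in the same cluster, and Q0 is the sum over clusters of (total cluster degree / 2m) squared. The function name `modularity_terms` and the example graph are hypothetical.

```python
from collections import defaultdict


def modularity_terms(edges, cluster_of):
    """Return (Qf, Q0, QN) for an undirected, unweighted graph.

    edges      : list of (u, v) pairs, one entry per edge
    cluster_of : dict mapping each node to its cluster label

    Assumed (standard) definitions, not taken from the abstract:
      Qf = fraction of edges whose endpoints share a cluster
      Q0 = sum over clusters of (cluster degree / 2m)^2
    """
    m = len(edges)
    degree = defaultdict(int)
    intra = 0
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        if cluster_of[u] == cluster_of[v]:
            intra += 1

    q_f = intra / m

    # Total degree per cluster drives the null-model term.
    cluster_degree = defaultdict(int)
    for node, d in degree.items():
        cluster_degree[cluster_of[node]] += d
    q_0 = sum((d / (2 * m)) ** 2 for d in cluster_degree.values())

    return q_f, q_0, q_f - q_0


# Hypothetical example: two triangles joined by a single edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_clusters = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_cluster = {node: "a" for node in range(6)}

print(modularity_terms(edges, two_clusters))
print(modularity_terms(edges, one_cluster))
```

In this example, merging everything into one cluster maximizes Qf (every edge is intracluster) but also maximizes Q0, so QN drops to zero; splitting into two equal-degree clusters lowers Q0 more than it lowers Qf. This is the tension between the two terms that the abstract describes.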
Structural graph matching using the EM algorithm and singular value decomposition
This paper describes an efficient algorithm for inexact graph matching. The method is purely structural, that is, it uses only the edge or connectivity structure of the graph and does not draw on node or edge attributes. We make two contributions: 1) commencing from a probability distribution for matching errors, we show how the problem of graph matching can be posed as maximum-likelihood estimation using the apparatus of the EM algorithm; and 2) we cast the recovery of correspondence matches between the graph nodes in a matrix framework. This allows one to efficiently recover correspondence matches using the singular value decomposition. We experiment with the method on both real-world and synthetic data. Here, we demonstrate that the method offers comparable performance to more computationally demanding methods.
Community detection in temporal multilayer networks, with an application to correlation networks
Networks are a convenient way to represent complex systems of interacting
entities. Many networks contain "communities" of nodes that are more densely
connected to each other than to nodes in the rest of the network. In this
paper, we investigate the detection of communities in temporal networks
represented as multilayer networks. As a focal example, we study time-dependent
financial-asset correlation networks. We first argue that the use of the
"modularity" quality function---which is defined by comparing edge weights in
an observed network to expected edge weights in a "null network"---is
application-dependent. We differentiate between "null networks" and "null
models" in our discussion of modularity maximization, and we highlight that the
same null network can correspond to different null models. We then investigate
a multilayer modularity-maximization problem to identify communities in
temporal networks. Our multilayer analysis only depends on the form of the
maximization problem and not on the specific quality function that one chooses.
We introduce a diagnostic to measure \emph{persistence} of community structure
in a multilayer network partition. We prove several results that describe how
the multilayer maximization problem measures a trade-off between static
community structure within layers and larger values of persistence across
layers. We also discuss some computational issues that the popular "Louvain"
heuristic faces with temporal multilayer networks and suggest ways to mitigate
them.
Comment: 42 pages, many figures, final accepted version before typesetting
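The abstract names a persistence diagnostic but does not give its formula. As a rough illustration only, the sketch below assumes persistence is the number of (node, layer) pairs whose community label is unchanged from one layer to the next; the function name `persistence` and the list-of-lists partition layout are hypothetical choices for this example.

```python
def persistence(partition):
    """Count label agreements between consecutive layers.

    partition[s][i] is the community label of node i in layer s.
    Assumed definition (not stated in the abstract): the number of
    nodes that keep the same label from layer s to layer s + 1,
    summed over all consecutive layer pairs.
    """
    return sum(
        1
        for prev_layer, next_layer in zip(partition, partition[1:])
        for a, b in zip(prev_layer, next_layer)
        if a == b
    )


# Hypothetical example: 3 nodes observed over 3 layers.
layers = [
    [0, 0, 1],
    [0, 1, 1],  # node 1 switches community here
    [0, 1, 1],
]
max_possible = len(layers[0]) * (len(layers) - 1)
print(persistence(layers), "of", max_possible)
```

Under this assumed definition, a partition that never changes across layers attains the maximum value (number of nodes times number of layer transitions), which gives a natural scale for comparing partitions.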
Online Multistage Subset Maximization Problems
Numerous combinatorial optimization problems (knapsack, maximum-weight matching, etc.) can be expressed as subset maximization problems: One is given a ground set N={1,...,n}, a collection F ⊆ 2^N of subsets thereof such that the empty set is in F, and an objective (profit) function p: F -> R_+. The task is to choose a set S in F that maximizes p(S). We consider the multistage version (Eisenstat et al., Gupta et al., both ICALP 2014) of such problems: The profit function p_t (and possibly the set of feasible solutions F_t) may change over time. Since in many applications changing the solution is costly, the task becomes to find a sequence of solutions that optimizes the trade-off between good per-time solutions and stable solutions, taking into account an additional similarity bonus. As the similarity measure for two consecutive solutions, we consider either the size of the intersection of the two solutions or the difference of n and the Hamming distance between the two characteristic vectors.
We study multistage subset maximization problems in the online setting, that is, p_t (along with possibly F_t) only arrive one by one and, upon such an arrival, the online algorithm has to output the corresponding solution without knowledge of the future.
We develop general techniques for online multistage subset maximization and thereby characterize those models (given by the type of data evolution and the type of similarity measure) that admit a constant-competitive online algorithm. When no constant competitive ratio is possible, we employ lookahead to circumvent this issue. When a constant competitive ratio is possible, we provide almost matching lower and upper bounds on the best achievable one.
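The two similarity measures named in this abstract, and the way they enter a multistage objective, can be sketched in Python. The sketch assumes the objective is simply the sum of per-stage profits plus the similarity bonus over consecutive solutions; the abstract does not state the exact weighting, so the unweighted sum and all function names here are illustrative assumptions, and this evaluates a given sequence rather than implementing the paper's online algorithms.

```python
def intersection_similarity(prev, curr):
    """Size of the intersection of two consecutive solutions."""
    return len(set(prev) & set(curr))


def hamming_similarity(prev, curr, n):
    """n minus the Hamming distance between characteristic vectors.

    The Hamming distance of two 0/1 characteristic vectors equals the
    size of the symmetric difference of the underlying sets.
    """
    return n - len(set(prev) ^ set(curr))


def multistage_value(solutions, profits, n, measure="intersection"):
    """Total profit plus similarity bonus of a solution sequence.

    solutions : list of sets S_1, ..., S_T over the ground set {1..n}
    profits   : list of callables p_1, ..., p_T, each mapping a set to
                a non-negative number
    Assumption: profit and bonus are added with equal weight.
    """
    total = sum(p(s) for p, s in zip(profits, solutions))
    for prev, curr in zip(solutions, solutions[1:]):
        if measure == "intersection":
            total += intersection_similarity(prev, curr)
        else:
            total += hamming_similarity(prev, curr, n)
    return total


# Hypothetical example: ground set {1, 2, 3, 4}, profit = set size.
solutions = [{1, 2}, {2, 3}]
profits = [len, len]
print(multistage_value(solutions, profits, n=4, measure="intersection"))
print(multistage_value(solutions, profits, n=4, measure="hamming"))
```

The two measures differ in how they treat elements absent from both solutions: the intersection measure ignores them, while the Hamming-based measure rewards them as agreement.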