36,481 research outputs found

    Bad Communities with High Modularity

    Full text link
    In this paper we discuss some problematic aspects of Newman's modularity function QN. Given a graph G, the modularity of G can be written as QN = Qf -Q0, where Qf is the intracluster edge fraction of G and Q0 is the expected intracluster edge fraction of the null model, i.e., a randomly connected graph with same expected degree distribution as G. It follows that the maximization of QN must accomodate two factors pulling in opposite directions: Qf favors a small number of clusters and Q0 favors many balanced (i.e., with approximately equal degrees) clusters. In certain cases the Q0 term can cause overestimation of the true cluster number; this is the opposite of the well-known under estimation effect caused by the "resolution limit" of modularity. We illustrate the overestimation effect by constructing families of graphs with a "natural" community structure which, however, does not maximize modularity. In fact, we prove that we can always find a graph G with a "natural clustering" V of G and another, balanced clustering U of G such that (i) the pair (G; U) has higher modularity than (G; V) and (ii) V and U are arbitrarily different.Comment: Significantly improved version of the paper, with the help of L. Pitsouli

    Structural graph matching using the EM algorithm and singular value decomposition

    Get PDF
    This paper describes an efficient algorithm for inexact graph matching. The method is purely structural, that is, it uses only the edge or connectivity structure of the graph and does not draw on node or edge attributes. We make two contributions: 1) commencing from a probability distribution for matching errors, we show how the problem of graph matching can be posed as maximum-likelihood estimation using the apparatus of the EM algorithm; and 2) we cast the recovery of correspondence matches between the graph nodes in a matrix framework. This allows one to efficiently recover correspondence matches using the singular value decomposition. We experiment with the method on both real-world and synthetic data. Here, we demonstrate that the method offers comparable performance to more computationally demanding method

    Community detection in temporal multilayer networks, with an application to correlation networks

    Full text link
    Networks are a convenient way to represent complex systems of interacting entities. Many networks contain "communities" of nodes that are more densely connected to each other than to nodes in the rest of the network. In this paper, we investigate the detection of communities in temporal networks represented as multilayer networks. As a focal example, we study time-dependent financial-asset correlation networks. We first argue that the use of the "modularity" quality function---which is defined by comparing edge weights in an observed network to expected edge weights in a "null network"---is application-dependent. We differentiate between "null networks" and "null models" in our discussion of modularity maximization, and we highlight that the same null network can correspond to different null models. We then investigate a multilayer modularity-maximization problem to identify communities in temporal networks. Our multilayer analysis only depends on the form of the maximization problem and not on the specific quality function that one chooses. We introduce a diagnostic to measure \emph{persistence} of community structure in a multilayer network partition. We prove several results that describe how the multilayer maximization problem measures a trade-off between static community structure within layers and larger values of persistence across layers. We also discuss some computational issues that the popular "Louvain" heuristic faces with temporal multilayer networks and suggest ways to mitigate them.Comment: 42 pages, many figures, final accepted version before typesettin

    Online Multistage Subset Maximization Problems

    Get PDF
    Numerous combinatorial optimization problems (knapsack, maximum-weight matching, etc.) can be expressed as subset maximization problems: One is given a ground set N={1,...,n}, a collection F subseteq 2^N of subsets thereof such that the empty set is in F, and an objective (profit) function p: F -> R_+. The task is to choose a set S in F that maximizes p(S). We consider the multistage version (Eisenstat et al., Gupta et al., both ICALP 2014) of such problems: The profit function p_t (and possibly the set of feasible solutions F_t) may change over time. Since in many applications changing the solution is costly, the task becomes to find a sequence of solutions that optimizes the trade-off between good per-time solutions and stable solutions taking into account an additional similarity bonus. As similarity measure for two consecutive solutions, we consider either the size of the intersection of the two solutions or the difference of n and the Hamming distance between the two characteristic vectors. We study multistage subset maximization problems in the online setting, that is, p_t (along with possibly F_t) only arrive one by one and, upon such an arrival, the online algorithm has to output the corresponding solution without knowledge of the future. We develop general techniques for online multistage subset maximization and thereby characterize those models (given by the type of data evolution and the type of similarity measure) that admit a constant-competitive online algorithm. When no constant competitive ratio is possible, we employ lookahead to circumvent this issue. When a constant competitive ratio is possible, we provide almost matching lower and upper bounds on the best achievable one
    corecore