
    Combinatorial algorithm for counting small induced graphs and orbits

    Graphlet analysis is an approach to network analysis that is particularly popular in bioinformatics. We show how to set up a system of linear equations that relate the orbit counts and can be used in an algorithm that is significantly faster than the existing approaches based on direct enumeration of graphlets. The algorithm requires the existence of a vertex with certain properties; we show that such a vertex exists for graphlets of arbitrary size, except for complete graphs and $C_4$, which are treated separately. Empirical analysis of running time agrees with the theoretical results.
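
    To illustrate the kind of linear relation between orbit counts that this abstract refers to, here is a minimal sketch for the smallest case only (three-node graphlets); it is not the paper's algorithm. It relies on the identity that, for a vertex of degree d, the number of neighbour pairs C(d, 2) equals the number of induced paths centred at the vertex plus the number of triangles through it. The function name `three_node_orbits` and the adjacency-dict input are assumptions made for illustration.

```python
from itertools import combinations

def three_node_orbits(adj):
    """adj: dict mapping each vertex to the set of its neighbours
    (undirected simple graph, no self-loops).
    Returns dict vertex -> (path_center_count, triangle_count)."""
    result = {}
    for v, nbrs in adj.items():
        # Triangles through v: pairs of neighbours that are themselves adjacent.
        tri = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        d = len(nbrs)
        # Linear relation: C(d, 2) = path_center + tri, solved for path_center.
        path_center = d * (d - 1) // 2 - tri
        result[v] = (path_center, tri)
    return result

# Example graph: triangle 0-1-2 with a pendant vertex 3 attached to 2.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
orbits = three_node_orbits(adj)
```

    In this toy case only the triangle count is obtained by looking at neighbour pairs; the path-centre count then follows from the equation rather than from enumerating paths.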

    Algebraic Methods in the Congested Clique

    In this work, we use algebraic methods for studying distance computation and subgraph detection tasks in the congested clique model. Specifically, we adapt parallel matrix multiplication implementations to the congested clique, obtaining an $O(n^{1-2/\omega})$-round matrix multiplication algorithm, where $\omega < 2.3728639$ is the exponent of matrix multiplication. In conjunction with known techniques from centralised algorithmics, this gives significant improvements over previous best upper bounds in the congested clique model. The highlight results include:
    -- triangle and 4-cycle counting in $O(n^{0.158})$ rounds, improving upon the $O(n^{1/3})$ triangle detection algorithm of Dolev et al. [DISC 2012],
    -- a $(1 + o(1))$-approximation of all-pairs shortest paths in $O(n^{0.158})$ rounds, improving upon the $\tilde{O}(n^{1/2})$-round $(2 + o(1))$-approximation algorithm of Nanongkai [STOC 2014], and
    -- computing the girth in $O(n^{0.158})$ rounds, which is the first non-trivial solution in this model.
    In addition, we present a novel constant-round combinatorial algorithm for detecting 4-cycles. Comment: This work is a merger of arxiv:1412.2109 and arxiv:1412.266
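
    As a point of reference for the centralised technique that this abstract builds on, the sketch below counts triangles from the trace of the cubed adjacency matrix: each triangle contributes six closed walks of length three. It is a plain sequential illustration with a naive cubic-time product, not the congested clique algorithm; the helper names `matmul` and `count_triangles` are hypothetical.

```python
def matmul(a, b):
    """Naive product of two n x n matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def count_triangles(adj_matrix):
    """Number of triangles in an undirected simple graph, given its
    0/1 adjacency matrix with a zero diagonal."""
    a2 = matmul(adj_matrix, adj_matrix)
    a3 = matmul(a2, adj_matrix)
    # trace(A^3) counts closed 3-walks; every triangle is counted 6 times
    # (3 starting vertices times 2 directions).
    trace = sum(a3[i][i] for i in range(len(adj_matrix)))
    return trace // 6

# Complete graph on 4 vertices: every 3-subset forms a triangle.
k4 = [[0 if i == j else 1 for j in range(4)] for i in range(4)]
# Path on 3 vertices: no triangle.
p3 = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
```

    Replacing `matmul` with a fast matrix multiplication routine is what ties the running time to the exponent $\omega$.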

    Linear Time Subgraph Counting, Graph Degeneracy, and the Chasm at Size Six

    We consider the problem of counting all k-vertex subgraphs in an input graph, for any constant k. This problem (denoted SUB-CNT_k) has been studied extensively in both theory and practice. In a classic result, Chiba and Nishizeki (SICOMP 85) gave linear time algorithms for clique and 4-cycle counting for bounded degeneracy graphs. This is a rich class of sparse graphs that contains, for example, all minor-free families and preferential attachment graphs. The techniques from this result have inspired a number of recent practical algorithms for SUB-CNT_k. Towards a better understanding of the limits of these techniques, we ask: for what values of k can SUB-CNT_k be solved in linear time? We discover a chasm at k = 6. Specifically, we prove that for k < 6, SUB-CNT_k can be solved in linear time. Assuming a standard conjecture in fine-grained complexity, we prove that for all k ≥ 6, SUB-CNT_k cannot be solved even in near-linear time.
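
    The role of degeneracy can be sketched as follows: repeatedly removing a minimum-degree vertex gives an ordering in which every vertex has few later neighbours, and orienting edges along that ordering lets each triangle be found exactly once from its earliest vertex. This is an illustrative sketch in the spirit of the degeneracy-based techniques mentioned above, not the paper's algorithm; the function names are hypothetical, and the simple minimum search used here is quadratic rather than linear time.

```python
def degeneracy_order(adj):
    """adj: dict vertex -> set of neighbours (undirected simple graph).
    Returns (removal order, degeneracy)."""
    remaining = {v: set(nbrs) for v, nbrs in adj.items()}
    order, degeneracy = [], 0
    while remaining:
        # Pick a vertex of minimum degree in the remaining graph.
        v = min(remaining, key=lambda u: len(remaining[u]))
        degeneracy = max(degeneracy, len(remaining[v]))
        for u in remaining[v]:
            remaining[u].discard(v)
        del remaining[v]
        order.append(v)
    return order, degeneracy

def count_triangles_oriented(adj):
    """Count triangles by orienting each edge from earlier to later
    vertex in the degeneracy ordering."""
    order, _ = degeneracy_order(adj)
    rank = {v: i for i, v in enumerate(order)}
    out = {v: {u for u in adj[v] if rank[u] > rank[v]} for v in adj}
    count = 0
    for v in adj:
        for u in out[v]:
            # Common out-neighbours of v and u close a triangle whose
            # earliest vertex is v and middle vertex is u.
            count += len(out[v] & out[u])
    return count

# Example graph: triangle 0-1-2 with a pendant vertex 3 attached to 2.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
order, degeneracy = degeneracy_order(adj)
```

    Because every out-neighbourhood has size at most the degeneracy, the inner work per edge stays small on sparse graphs of this kind.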

    Beyond Triangles: A Distributed Framework for Estimating 3-profiles of Large Graphs

    We study the problem of approximating the 3-profile of a large graph. 3-profiles are generalizations of triangle counts that specify the number of times a small graph appears as an induced subgraph of a large graph. Our algorithm uses the novel concept of 3-profile sparsifiers: sparse graphs that can be used to approximate the full 3-profile counts for a given large graph. Further, we study the problem of estimating local and ego 3-profiles, two graph quantities that characterize the local neighborhood of each vertex of a graph. Our algorithm is distributed and operates as a vertex program over the GraphLab PowerGraph framework. We introduce the concept of edge pivoting, which allows us to collect 2-hop information without maintaining an explicit 2-hop neighborhood list at each vertex. This enables the computation of all the local 3-profiles in parallel with minimal communication. We test our implementation in several experiments scaling up to 640 cores on Amazon EC2. We find that our algorithm can estimate the 3-profile of a graph in approximately the same time as triangle counting. For the harder problem of ego 3-profiles, we introduce an algorithm that can estimate profiles of hundreds of thousands of vertices in parallel, in the timescale of minutes. Comment: To appear in part at KDD'1
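
    To make the definition concrete, the following sketch computes the exact global 3-profile by brute force over all vertex triples. On three vertices the number of edges determines the induced subgraph up to isomorphism, so the profile is a vector of four counts. This is a cubic-time reference for tiny graphs only and does not reflect the sparsifier or the distributed algorithm described above; the name `three_profile` is hypothetical.

```python
from itertools import combinations

def three_profile(adj):
    """adj: dict vertex -> set of neighbours (undirected simple graph).
    Returns [n0, n1, n2, n3], where n_k is the number of vertex triples
    whose induced subgraph has exactly k edges
    (empty, single edge, path, triangle)."""
    profile = [0, 0, 0, 0]
    for a, b, c in combinations(sorted(adj), 3):
        # Booleans add up as integers: count the edges among a, b, c.
        edges = (b in adj[a]) + (c in adj[a]) + (c in adj[b])
        profile[edges] += 1
    return profile

# Example graph: triangle 0-1-2 with a pendant vertex 3 attached to 2.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
profile = three_profile(adj)
```

    The last entry of the result is the ordinary triangle count, which is why the 3-profile is described as a generalization of triangle counting.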

    Randomized word-parallel algorithms for detection of small induced subgraphs

    Induced subgraph detection is a widely studied set of problems in theoretical computer science, with applications in, e.g., social networks, molecular biology and other domains that use graph representations. Our focus lies on practical comparison of some well-known deterministic algorithms to recent Monte Carlo algorithms for detecting subgraphs on three and four vertices. For algorithms that involve operations with adjacency matrices, we study the gain of applying word parallelism, i.e. exploiting the parallel nature of common processor operations such as bitwise conjunction and disjunction. We present results of empirical running times for our implementations of the algorithms. Our results reveal insights as to when the Monte Carlo algorithms trump their deterministic counterparts and also include statistically significant improvements of several algorithms when applying word parallelism.
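
    The idea of word parallelism can be sketched with adjacency rows stored as bit masks: a single bitwise AND tests many potential common neighbours at once. The example below is a simple deterministic triangle detector written under that assumption, using Python integers as arbitrary-width bit sets; it is an illustration only, not one of the algorithms evaluated in the paper, and the name `has_triangle` is hypothetical.

```python
def has_triangle(n, edges):
    """n: number of vertices labelled 0..n-1.
    edges: iterable of (u, v) pairs of an undirected simple graph
    (no self-loops assumed).
    Returns True if the graph contains a triangle."""
    edges = list(edges)
    rows = [0] * n
    for u, v in edges:
        # Bit v of rows[u] is set iff u and v are adjacent.
        rows[u] |= 1 << v
        rows[v] |= 1 << u
    # An edge (u, v) lies on a triangle iff u and v share a neighbour,
    # i.e. the bitwise AND of their adjacency rows is non-zero.
    return any(rows[u] & rows[v] for u, v in edges)

triangle = [(0, 1), (1, 2), (0, 2)]
four_cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]
```

    In a lower-level language the same test would operate on fixed-size machine words, processing 64 candidate vertices per AND instruction.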

    Distributed Estimation of Graph 4-Profiles

    We present a novel distributed algorithm for counting all four-node induced subgraphs in a big graph. These counts, called the 4-profile, describe a graph's connectivity properties and have found several uses ranging from bioinformatics to spam detection. We also study the more complicated problem of estimating the local 4-profiles centered at each vertex of the graph. The local 4-profile embeds every vertex in an 11-dimensional space that characterizes the local geometry of its neighborhood: vertices that connect different clusters will have different local 4-profiles compared to those that are only part of one dense cluster. Our algorithm is a local, distributed message-passing scheme on the graph and computes all the local 4-profiles in parallel. We rely on two novel theoretical contributions: we show that local 4-profiles can be calculated using compressed two-hop information, and we establish novel concentration results showing that graphs can be substantially sparsified and still retain good approximation quality for the global 4-profile. We empirically evaluate our algorithm using a distributed GraphLab implementation that we scaled up to 640 cores. We show that our algorithm can compute global and local 4-profiles of graphs with millions of edges in a few minutes, significantly improving upon the previous state of the art. Comment: To appear in part at WWW'1
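
    For intuition about what a global 4-profile contains, the sketch below counts induced four-vertex subgraphs by brute force. It uses the fact that the 11 non-isomorphic graphs on four vertices all have distinct sorted degree sequences, so the degree sequence of each induced subgraph identifies its class. This is a reference for tiny graphs only, unrelated to the distributed message-passing scheme described above; `four_profile` and the class labels in `NAMES` are names chosen here for illustration.

```python
from collections import Counter
from itertools import combinations

# Sorted degree sequence -> label, one entry per graph on four vertices.
NAMES = {
    (0, 0, 0, 0): "empty",
    (0, 0, 1, 1): "one edge",
    (1, 1, 1, 1): "two disjoint edges",
    (0, 1, 1, 2): "path on 3 vertices + isolated vertex",
    (0, 2, 2, 2): "triangle + isolated vertex",
    (1, 1, 1, 3): "star",
    (1, 1, 2, 2): "path on 4 vertices",
    (2, 2, 2, 2): "4-cycle",
    (1, 2, 2, 3): "paw",
    (2, 2, 3, 3): "diamond",
    (3, 3, 3, 3): "clique",
}

def four_profile(adj):
    """adj: dict vertex -> set of neighbours (undirected simple graph,
    no self-loops). Returns a Counter mapping class label -> number of
    induced four-vertex subgraphs of that class."""
    profile = Counter()
    for quad in combinations(sorted(adj), 4):
        # Degree of each vertex inside the induced subgraph on quad.
        degs = tuple(sorted(sum(1 for u in quad if u in adj[v]) for v in quad))
        profile[NAMES[degs]] += 1
    return profile

# 5-cycle: deleting any one vertex leaves an induced path on 4 vertices.
c5 = {i: {(i - 1) % 5, (i + 1) % 5} for i in range(5)}
profile = four_profile(c5)
```

    The loop visits every 4-subset of vertices, which is exactly the cost that sparsification and compressed two-hop information are meant to avoid on large graphs.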