51 research outputs found

    Efficient Triangle Counting in Large Graphs via Degree-based Vertex Partitioning

    Full text link
    The number of triangles is a computationally expensive graph statistic which is frequently used in complex network analysis (e.g., transitivity ratio), in various random graph models (e.g., exponential random graph model) and in important real world applications such as spam detection, uncovering of the hidden thematic structure of the Web and link recommendation. Counting triangles in graphs with millions and billions of edges requires algorithms which run fast, use small amount of space, provide accurate estimates of the number of triangles and preferably are parallelizable. In this paper we present an efficient triangle counting algorithm which can be adapted to the semistreaming model. The key idea of our algorithm is to combine the sampling algorithm of Tsourakakis et al. and the partitioning of the set of vertices into a high degree and a low degree subset respectively as in the Alon, Yuster and Zwick work treating each set appropriately. We obtain a running time O(m+m3/2Δlogntϵ2)O \left(m + \frac{m^{3/2} \Delta \log{n}}{t \epsilon^2} \right) and an ϵ\epsilon approximation (multiplicative error), where nn is the number of vertices, mm the number of edges and Δ\Delta the maximum number of triangles an edge is contained. Furthermore, we show how this algorithm can be adapted to the semistreaming model with space usage O(m1/2logn+m3/2Δlogntϵ2)O\left(m^{1/2}\log{n} + \frac{m^{3/2} \Delta \log{n}}{t \epsilon^2} \right) and a constant number of passes (three) over the graph stream. We apply our methods in various networks with several millions of edges and we obtain excellent results. Finally, we propose a random projection based method for triangle counting and provide a sufficient condition to obtain an estimate with low variance.Comment: 1) 12 pages 2) To appear in the 7th Workshop on Algorithms and Models for the Web Graph (WAW 2010

    How Hard is Counting Triangles in the Streaming Model

    Full text link
    The problem of (approximately) counting the number of triangles in a graph is one of the basic problems in graph theory. In this paper we study the problem in the streaming model. We study the amount of memory required by a randomized algorithm to solve this problem. In case the algorithm is allowed one pass over the stream, we present a best possible lower bound of Ω(m)\Omega(m) for graphs GG with mm edges on nn vertices. If a constant number of passes is allowed, we show a lower bound of Ω(m/T)\Omega(m/T), TT the number of triangles. We match, in some sense, this lower bound with a 2-pass O(m/T1/3)O(m/T^{1/3})-memory algorithm that solves the problem of distinguishing graphs with no triangles from graphs with at least TT triangles. We present a new graph parameter ρ(G)\rho(G) -- the triangle density, and conjecture that the space complexity of the triangles problem is Ω(m/ρ(G))\Omega(m/\rho(G)). We match this by a second algorithm that solves the distinguishing problem using O(m/ρ(G))O(m/\rho(G))-memory

    Parallel Algorithms for Small Subgraph Counting

    Get PDF
    Subgraph counting is a fundamental problem in analyzing massive graphs, often studied in the context of social and complex networks. There is a rich literature on designing efficient, accurate, and scalable algorithms for this problem. In this work, we tackle this challenge and design several new algorithms for subgraph counting in the Massively Parallel Computation (MPC) model: Given a graph GG over nn vertices, mm edges and TT triangles, our first main result is an algorithm that, with high probability, outputs a (1+ε)(1+\varepsilon)-approximation to TT, with optimal round and space complexity provided any Smax(m,n2/m)S \geq \max{(\sqrt m, n^2/m)} space per machine, assuming T=Ω(m/n)T=\Omega(\sqrt{m/n}). Our second main result is an O~δ(loglogn)\tilde{O}_{\delta}(\log \log n)-rounds algorithm for exactly counting the number of triangles, parametrized by the arboricity α\alpha of the input graph. The space per machine is O(nδ)O(n^{\delta}) for any constant δ\delta, and the total space is O(mα)O(m\alpha), which matches the time complexity of (combinatorial) triangle counting in the sequential model. We also prove that this result can be extended to exactly counting kk-cliques for any constant kk, with the same round complexity and total space O(mαk2)O(m\alpha^{k-2}). Alternatively, allowing O(α2)O(\alpha^2) space per machine, the total space requirement reduces to O(nα2)O(n\alpha^2). Finally, we prove that a recent result of Bera, Pashanasangi and Seshadhri (ITCS 2020) for exactly counting all subgraphs of size at most 55, can be implemented in the MPC model in O~δ(logn)\tilde{O}_{\delta}(\sqrt{\log n}) rounds, O(nδ)O(n^{\delta}) space per machine and O(mα3)O(m\alpha^3) total space. Therefore, this result also exhibits the phenomenon that a time bound in the sequential model translates to a space bound in the MPC model

    Towards Performance Portable Graph Algorithms

    Get PDF
    In today's data-driven world, our computational resources have become heterogeneous, making the processing of large-scale graphs in an architecture agnostic manner crucial. Traditionally, hand-optimized high-performance computing (HPC) solutions have been studied and used to implement highly efficient and scalable graph algorithms. In recent years, several graph processing and management systems have also been proposed. Hand optimized HPC approaches require high levels of expertise and graph processing frameworks suffer from expressibility and performance. Portability is a major concern for both approaches. The main thesis of this work is that block-based graph algorithms offer a compromise between efficient parallelism and architecture agnostic algorithm design for a wide class of graph problems. This dissertation seeks to prove this thesis by focusing the work on the three pillars; data/computation partitioning, block-based algorithm design, and performance portability. In this dissertation, we first show how we can partition the computation and the data to design efficient block-based algorithms for solving graph merging and triangle counting problems. Then, generalizing from our experiences, we propose an algorithmic framework, for shared-memory, heterogeneous machines for implementing block-based graph algorithms; PGAbB. PGAbB aims to maximally leverage different architectures by implementing a task-based execution on top of a block-based programming model. In this talk we will discuss PGAbB's programming model, algorithmic optimizations for scheduling, and load-balancing strategies for graph problems on real-world and synthetic inputs.Ph.D
    corecore