9,699 research outputs found

    Distributed Triangle Counting in the Graphulo Matrix Math Library

    Full text link
    Triangle counting is a key algorithm for large graph analysis. The Graphulo library provides a framework for implementing graph algorithms on the Apache Accumulo distributed database. In this work we adapt two algorithms for counting triangles, one that uses the adjacency matrix and another that also uses the incidence matrix, to the Graphulo library for server-side processing inside Accumulo. Cloud-based experiments show a similar performance profile for these different approaches on the family of power law Graph500 graphs, for which data skew increasingly bottlenecks. These results motivate the design of skew-aware hybrid algorithms that we propose for future work.Comment: Honorable mention in the 2017 IEEE HPEC's Graph Challeng

    Algorithmic Complexity of Power Law Networks

    Full text link
    It was experimentally observed that the majority of real-world networks follow power law degree distribution. The aim of this paper is to study the algorithmic complexity of such "typical" networks. The contribution of this work is twofold. First, we define a deterministic condition for checking whether a graph has a power law degree distribution and experimentally validate it on real-world networks. This definition allows us to derive interesting properties of power law networks. We observe that for exponents of the degree distribution in the range [1,2][1,2] such networks exhibit double power law phenomenon that was observed for several real-world networks. Our observation indicates that this phenomenon could be explained by just pure graph theoretical properties. The second aim of our work is to give a novel theoretical explanation why many algorithms run faster on real-world data than what is predicted by algorithmic worst-case analysis. We show how to exploit the power law degree distribution to design faster algorithms for a number of classical P-time problems including transitive closure, maximum matching, determinant, PageRank and matrix inverse. Moreover, we deal with the problems of counting triangles and finding maximum clique. Previously, it has been only shown that these problems can be solved very efficiently on power law graphs when these graphs are random, e.g., drawn at random from some distribution. However, it is unclear how to relate such a theoretical analysis to real-world graphs, which are fixed. Instead of that, we show that the randomness assumption can be replaced with a simple condition on the degrees of adjacent vertices, which can be used to obtain similar results. As a result, in some range of power law exponents, we are able to solve the maximum clique problem in polynomial time, although in general power law networks the problem is NP-complete

    Performance bounds for expander-based compressed sensing in Poisson noise

    Full text link
    This paper provides performance bounds for compressed sensing in the presence of Poisson noise using expander graphs. The Poisson noise model is appropriate for a variety of applications, including low-light imaging and digital streaming, where the signal-independent and/or bounded noise models used in the compressed sensing literature are no longer applicable. In this paper, we develop a novel sensing paradigm based on expander graphs and propose a MAP algorithm for recovering sparse or compressible signals from Poisson observations. The geometry of the expander graphs and the positivity of the corresponding sensing matrices play a crucial role in establishing the bounds on the signal reconstruction error of the proposed algorithm. We support our results with experimental demonstrations of reconstructing average packet arrival rates and instantaneous packet counts at a router in a communication network, where the arrivals of packets in each flow follow a Poisson process.Comment: revised version; accepted to IEEE Transactions on Signal Processin

    Efficient Triangle Counting in Large Graphs via Degree-based Vertex Partitioning

    Full text link
    The number of triangles is a computationally expensive graph statistic which is frequently used in complex network analysis (e.g., transitivity ratio), in various random graph models (e.g., exponential random graph model) and in important real world applications such as spam detection, uncovering of the hidden thematic structure of the Web and link recommendation. Counting triangles in graphs with millions and billions of edges requires algorithms which run fast, use small amount of space, provide accurate estimates of the number of triangles and preferably are parallelizable. In this paper we present an efficient triangle counting algorithm which can be adapted to the semistreaming model. The key idea of our algorithm is to combine the sampling algorithm of Tsourakakis et al. and the partitioning of the set of vertices into a high degree and a low degree subset respectively as in the Alon, Yuster and Zwick work treating each set appropriately. We obtain a running time O(m+m3/2Δlogntϵ2)O \left(m + \frac{m^{3/2} \Delta \log{n}}{t \epsilon^2} \right) and an ϵ\epsilon approximation (multiplicative error), where nn is the number of vertices, mm the number of edges and Δ\Delta the maximum number of triangles an edge is contained. Furthermore, we show how this algorithm can be adapted to the semistreaming model with space usage O(m1/2logn+m3/2Δlogntϵ2)O\left(m^{1/2}\log{n} + \frac{m^{3/2} \Delta \log{n}}{t \epsilon^2} \right) and a constant number of passes (three) over the graph stream. We apply our methods in various networks with several millions of edges and we obtain excellent results. Finally, we propose a random projection based method for triangle counting and provide a sufficient condition to obtain an estimate with low variance.Comment: 1) 12 pages 2) To appear in the 7th Workshop on Algorithms and Models for the Web Graph (WAW 2010
    corecore