
    Efficient Triangle Counting in Large Graphs via Degree-based Vertex Partitioning

    The number of triangles is a computationally expensive graph statistic which is frequently used in complex network analysis (e.g., the transitivity ratio), in various random graph models (e.g., the exponential random graph model), and in important real-world applications such as spam detection, uncovering the hidden thematic structure of the Web, and link recommendation. Counting triangles in graphs with millions and billions of edges requires algorithms which run fast, use a small amount of space, provide accurate estimates of the number of triangles, and preferably are parallelizable. In this paper we present an efficient triangle counting algorithm which can be adapted to the semi-streaming model. The key idea of our algorithm is to combine the sampling algorithm of Tsourakakis et al. with the partitioning of the vertex set into a high-degree and a low-degree subset, as in the work of Alon, Yuster and Zwick, treating each subset appropriately. We obtain a running time of $O\left(m + \frac{m^{3/2} \Delta \log n}{t \epsilon^2}\right)$ and an $\epsilon$-approximation (multiplicative error), where $n$ is the number of vertices, $m$ the number of edges, $t$ the number of triangles, and $\Delta$ the maximum number of triangles an edge is contained in. Furthermore, we show how this algorithm can be adapted to the semi-streaming model with space usage $O\left(m^{1/2}\log n + \frac{m^{3/2} \Delta \log n}{t \epsilon^2}\right)$ and a constant number of passes (three) over the graph stream. We apply our methods to various networks with several millions of edges and obtain excellent results. Finally, we propose a random-projection-based method for triangle counting and provide a sufficient condition to obtain an estimate with low variance. Comment: 12 pages. To appear in the 7th Workshop on Algorithms and Models for the Web Graph (WAW 2010).
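    The sampling ingredient referenced above is easy to sketch: the Tsourakakis et al. estimator keeps each edge independently with probability $p$, counts triangles exactly in the sparsified graph, and rescales by $1/p^3$. Below is a minimal sketch of that ingredient alone; the degree-based partitioning and the choice of $p$ are omitted, all names are illustrative, and vertices are assumed to be comparable (e.g., integers).

```python
import random
from collections import defaultdict

def count_triangles(adj):
    """Exact triangle count in a dict-of-sets adjacency structure.
    Each triangle u < v < w is charged once to its edge (u, v)."""
    total = 0
    for u in adj:
        for v in adj[u]:
            if v <= u:
                continue
            # common neighbors of u and v that come after v
            total += sum(1 for w in adj[u] & adj[v] if w > v)
    return total

def sampled_triangle_estimate(edges, p, seed=0):
    """Edge-sampling estimator: keep each edge with probability p,
    count triangles exactly in the sample, rescale by 1/p^3."""
    rng = random.Random(seed)
    adj = defaultdict(set)
    for u, v in edges:
        if rng.random() < p:
            adj[u].add(v)
            adj[v].add(u)
    return count_triangles(adj) / p**3
```

    The estimator is unbiased, since each triangle survives sampling with probability $p^3$; the paper's high/low-degree partitioning is what controls its variance in terms of $\Delta$ and $t$.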

    Simple parallel and distributed algorithms for spectral graph sparsification

    We describe a simple algorithm for spectral graph sparsification, based on iterative computations of weighted spanners and uniform sampling. Leveraging the spanner algorithms of Baswana and Sen, we obtain the first distributed spectral sparsification algorithm. We also obtain a parallel algorithm with improved work and time guarantees. Combining this algorithm with the parallel framework of Peng and Spielman for solving symmetric diagonally dominant linear systems, we get a parallel solver which is much closer to being practical and significantly more efficient in terms of the total work. Comment: replaces "A simple parallel and distributed algorithm for spectral sparsification". Minor changes.
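    A hedged skeleton of the iterative scheme as the abstract describes it: in each round, the edges of a spanner are kept, and every remaining edge survives uniformly with probability 1/4 with its weight scaled by 4. The greedy spanner below stands in for the Baswana-Sen construction, and the round count, stretch, and use of a single spanner per round (rather than a bundle) are simplifying assumptions, not the paper's parameters.

```python
import math
import random
import networkx as nx

def greedy_spanner(G, stretch):
    """Greedy stretch-spanner (Althofer et al.), standing in for the
    Baswana-Sen construction: scan edges by increasing weight and add
    an edge only if the current spanner distance is too long."""
    S = nx.Graph()
    S.add_nodes_from(G.nodes())
    for u, v, w in sorted(G.edges(data='weight', default=1.0),
                          key=lambda e: e[2]):
        try:
            d = nx.dijkstra_path_length(S, u, v)
        except nx.NetworkXNoPath:
            d = float('inf')
        if d > stretch * w:
            S.add_edge(u, v, weight=w)
    return S

def sparsify(G, rounds=3, stretch=None, seed=0):
    """Iterative skeleton: spanner edges are protected; every other
    edge survives with probability 1/4 and weight multiplied by 4."""
    rng = random.Random(seed)
    n = G.number_of_nodes()
    if stretch is None:
        stretch = max(3, 2 * int(math.log2(max(n, 2))) + 1)  # illustrative
    H = G.copy()
    for _ in range(rounds):
        S = greedy_spanner(H, stretch)
        nxt = S.copy()
        for u, v, w in H.edges(data='weight', default=1.0):
            if not S.has_edge(u, v) and rng.random() < 0.25:
                nxt.add_edge(u, v, weight=4 * w)
        H = nxt
    return H
```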

    Probabilistic Spectral Sparsification In Sublinear Time

    In this paper, we introduce a variant of spectral sparsification called probabilistic $(\varepsilon,\delta)$-spectral sparsification. Roughly speaking, it preserves the cut value of any cut $(S,S^{c})$ up to a $1\pm\varepsilon$ multiplicative error and a $\delta\left|S\right|$ additive error. We show how to produce a probabilistic $(\varepsilon,\delta)$-spectral sparsifier with $O(n\log n/\varepsilon^{2})$ edges in $\tilde{O}(n/\varepsilon^{2}\delta)$ time for unweighted undirected graphs. This yields the fastest known sublinear-time algorithms for several cut problems on unweighted undirected graphs, namely:
    - An $\tilde{O}(n/OPT+n^{3/2+t})$ time $O(\sqrt{\log n/t})$-approximation algorithm for the sparsest cut problem and the balanced separator problem.
    - An $n^{1+o(1)}/\varepsilon^{4}$ time approximate minimum s-t cut algorithm with an $\varepsilon n$ additive error.
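    While the sublinear-time construction itself is not reproduced here, the guarantee being claimed is simple to state in code: every cut $(S,S^c)$ of the sparsifier must match the original cut value up to a $1\pm\varepsilon$ multiplicative and $\delta|S|$ additive error, with failures allowed only on a small probability mass. A small spot-checker over random cuts, with all names illustrative:

```python
import random

def cut_value(edges, S):
    """Total weight of edges crossing the cut (S, V \\ S)."""
    S = set(S)
    return sum(w for u, v, w in edges if (u in S) != (v in S))

def respects_guarantee(g_edges, h_edges, S, eps, delta):
    """Check the (eps, delta) guarantee on one cut:
    (1-eps)*cut - delta*|S| <= sparsified cut <= (1+eps)*cut + delta*|S|."""
    c, c_h = cut_value(g_edges, S), cut_value(h_edges, S)
    slack = delta * len(S)
    return (1 - eps) * c - slack <= c_h <= (1 + eps) * c + slack

def spot_check(g_edges, h_edges, vertices, eps, delta, trials=1000, seed=0):
    """Sample random vertex subsets and report the fraction of nontrivial
    cuts meeting the guarantee; a *probabilistic* sparsifier is allowed
    to fail on a small fraction of them."""
    rng = random.Random(seed)
    checked = ok = 0
    for _ in range(trials):
        S = [v for v in vertices if rng.random() < 0.5]
        if 0 < len(S) < len(vertices):
            checked += 1
            ok += respects_guarantee(g_edges, h_edges, S, eps, delta)
    return ok / max(checked, 1)
```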

    Sketching Cuts in Graphs and Hypergraphs

    Sketching and streaming algorithms are at the forefront of current research directions for cut problems in graphs. In the streaming model, we show that a $(1-\epsilon)$-approximation for Max-Cut must use $n^{1-O(\epsilon)}$ space; moreover, beating a $4/5$-approximation requires polynomial space. For the sketching model, we show that $r$-uniform hypergraphs admit a $(1+\epsilon)$-cut-sparsifier (i.e., a weighted subhypergraph that approximately preserves all the cuts) with $O(\epsilon^{-2} n (r+\log n))$ edges. We also make first steps towards sketching general CSPs (Constraint Satisfaction Problems).
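    For concreteness, the quantity such a hypergraph cut-sparsifier must preserve: a hyperedge is cut by $(S,S^c)$ exactly when it has vertices on both sides. A minimal sketch of that cut function plus a brute spot-check of the $(1\pm\epsilon)$ guarantee; function names are illustrative, and a true certificate would need all $2^n - 2$ cuts:

```python
def hypergraph_cut(hyperedges, weights, S):
    """Total weight of hyperedges with at least one vertex in S
    and at least one vertex outside S."""
    S = set(S)
    total = 0.0
    for e, w in zip(hyperedges, weights):
        inside = sum(1 for v in e if v in S)
        if 0 < inside < len(e):  # e straddles the cut
            total += w
    return total

def is_cut_sparsifier(H, H_sparse, cuts, eps):
    """Check (1 +/- eps) preservation on a list of candidate cuts.
    H and H_sparse are (hyperedges, weights) pairs."""
    for S in cuts:
        c = hypergraph_cut(*H, S)
        c_s = hypergraph_cut(*H_sparse, S)
        if not ((1 - eps) * c <= c_s <= (1 + eps) * c):
            return False
    return True
```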

    Constructing Linear-Sized Spectral Sparsification in Almost-Linear Time

    We present the first almost-linear time algorithm for constructing linear-sized spectral sparsification for graphs. This improves on all previous constructions of linear-sized spectral sparsification, which require $\Omega(n^2)$ time. A key ingredient in our algorithm is a novel combination of two techniques used in the literature for constructing spectral sparsification: random sampling by effective resistance, and adaptive constructions based on barrier functions. Comment: 22 pages. A preliminary version of this paper is to appear in the proceedings of the 56th Annual IEEE Symposium on Foundations of Computer Science (FOCS 2015).
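    Of the two named ingredients, sampling by effective resistance (in the style of Spielman and Srivastava) admits a short dense-linear-algebra sketch: compute $R_e = b_e^{T} L^{+} b_e$ and sample edges with probability proportional to $w_e R_e$. Note this is the quadratic-time baseline the paper improves on, not the almost-linear-time construction itself; the sample count $q$ is left to the caller:

```python
import numpy as np

def effective_resistance_sample(n, edges, weights, q, seed=0):
    """Sample q edges with probability proportional to w_e * R_e,
    where R_e is the effective resistance, and reweight each drawn
    edge by w_e / (q * p_e) so the sparsifier is unbiased.
    Uses a dense O(n^3) pseudoinverse: the baseline, not the paper's
    almost-linear-time method."""
    weights = np.asarray(weights, dtype=float)
    L = np.zeros((n, n))
    for (u, v), w in zip(edges, weights):
        L[u, u] += w; L[v, v] += w
        L[u, v] -= w; L[v, u] -= w
    Lp = np.linalg.pinv(L)  # Laplacian pseudoinverse L^+
    R = np.array([Lp[u, u] + Lp[v, v] - 2 * Lp[u, v] for u, v in edges])
    p = weights * R
    p = p / p.sum()
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(edges), size=q, p=p)
    sparsifier = {}
    for i in idx:
        e = edges[i]
        sparsifier[e] = sparsifier.get(e, 0.0) + weights[i] / (q * p[i])
    return sparsifier
```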

    Online Row Sampling

    Finding a small spectral approximation for a tall $n \times d$ matrix $A$ is a fundamental numerical primitive. For a number of reasons, one often seeks an approximation whose rows are sampled from those of $A$. Row sampling improves interpretability, saves space when $A$ is sparse, and preserves row structure, which is especially important, for example, when $A$ represents a graph. However, correctly sampling rows from $A$ can be costly when the matrix is large and cannot be stored and processed in memory. Hence, a number of recent publications focus on row sampling in the streaming setting, using little more space than what is required to store the outputted approximation [KL13, KLM+14]. Inspired by a growing body of work on online algorithms for machine learning and data analysis, we extend this work to a more restrictive online setting: we read rows of $A$ one by one and immediately decide whether each row should be kept in the spectral approximation or discarded, without ever retracting these decisions. We present an extremely simple algorithm that approximates $A$ up to multiplicative error $\epsilon$ and additive error $\delta$ using $O(d \log d \log(\epsilon\|A\|_2/\delta)/\epsilon^2)$ online samples, with memory overhead proportional to the cost of storing the spectral approximation. We also present an algorithm that uses $O(d^2)$ memory but only requires $O(d\log(\epsilon\|A\|_2/\delta)/\epsilon^2)$ samples, which we show is optimal. Our methods are clean and intuitive, allow for lower memory usage than prior work, and expose new theoretical properties of leverage score based matrix approximation.
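    A hedged sketch of the online rule the abstract describes: stream the rows, estimate each row's leverage against the approximation kept so far (regularized by $\delta$), keep the row with probability proportional to that score, rescale kept rows so the estimate stays unbiased, and never revisit a decision. The constant $c$ and the Sherman-Morrison inverse maintenance are implementation assumptions, not taken from the paper:

```python
import numpy as np

def online_row_sample(rows, d, eps, delta, c=8.0, seed=0):
    """Online spectral approximation via regularized leverage scores.
    Maintains M^{-1} for M = B^T B + delta*I over the rows kept so far;
    an incoming row a is kept with probability
    min(1, c * log(d) * l / eps^2), where l = a^T M^{-1} a, and is
    rescaled by 1/sqrt(p). Decisions are never retracted."""
    rng = np.random.default_rng(seed)
    Minv = np.eye(d) / delta          # (delta * I)^{-1} to start
    kept = []
    for a in rows:                    # stream the rows one by one
        l = float(a @ Minv @ a)       # regularized online leverage score
        p = min(1.0, c * np.log(d) * l / eps**2)
        if rng.random() < p:
            s = a / np.sqrt(p)        # rescale the kept row
            kept.append(s)
            # Sherman-Morrison update of (M + s s^T)^{-1}
            Ms = Minv @ s
            Minv -= np.outer(Ms, Ms) / (1.0 + s @ Ms)
        # discarded rows are gone for good
    return np.array(kept)
```

    The rescaling by $1/\sqrt{p}$ makes the sampled matrix an unbiased estimator of $A^T A$; the memory used beyond the output is the $d \times d$ inverse, matching the spirit of the second algorithm in the abstract.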