
    Graph Sparsification in the Semi-streaming Model

    Analyzing massive data sets has been one of the key motivations for studying streaming algorithms. In recent years, there has been significant progress in analyzing distributions in a streaming setting, but progress on graph problems has been limited. A main reason for this has been the existence of linear-space lower bounds for even simple problems such as determining the connectedness of a graph. However, in many new scenarios that arise from social and other interaction networks, the number of vertices is significantly smaller than the number of edges. This has led to the formulation of the semi-streaming model, where we assume that the space is (near) linear in the number of vertices (but not necessarily the edges), and the edges appear in an arbitrary (and possibly adversarial) order. In this paper we focus on graph sparsification, which is one of the major building blocks in a variety of graph algorithms. There is a long history of (non-streaming) sampling algorithms that provide sparse graph approximations, and it is natural to ask whether sparsification can be achieved using small space and, in addition, a single pass over the data. The question is interesting from the standpoint of both theory and practice, and we answer it in the affirmative by providing a one-pass $\tilde{O}(n/\epsilon^{2})$-space algorithm that produces a sparsification that approximates each cut to a $(1+\epsilon)$ factor. We also show that $\Omega(n \log \frac{1}{\epsilon})$ space is necessary for a one-pass streaming algorithm to approximate the min-cut, improving upon the $\Omega(n)$ lower bound that arises from lower bounds for testing connectivity.
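
    For intuition, the classical (non-streaming) sampling idea that such cut sparsifiers build on can be sketched in a few lines: keep each edge independently with a probability tied to the minimum cut and reweight kept edges so every cut keeps its weight in expectation. This is a minimal illustration in the spirit of Karger's uniform sampling, not the paper's one-pass algorithm; the constant in the sampling probability is illustrative.

```python
import math
import random

def uniform_cut_sparsify(edges, n, min_cut, eps, seed=0):
    """Toy, non-streaming sketch of cut-preserving sampling: keep each edge
    independently with probability p ~ log(n) / (eps^2 * min_cut) and give
    kept edges weight 1/p, so every cut keeps its weight in expectation.
    The constant 12 is illustrative, not taken from the paper."""
    rng = random.Random(seed)
    p = min(1.0, 12.0 * math.log(n) / (eps ** 2 * min_cut))
    return [(u, v, 1.0 / p) for (u, v) in edges if rng.random() < p]

def cut_weight(weighted_edges, side):
    """Weight of the cut separating the vertex set `side` from the rest."""
    s = set(side)
    return sum(w for (u, v, w) in weighted_edges if (u in s) != (v in s))
```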

    Analyzing Massive Graphs in the Semi-streaming Model

    Massive graphs arise in many scenarios, for example, traffic data analysis in large networks, large-scale scientific experiments, and clustering of large data sets. The semi-streaming model was proposed for processing massive graphs. In the semi-streaming model, we have a random-access memory which is near-linear in the number of vertices. The input graph (or equivalently, the edges of the graph) is presented as a sequential list of edges (insertion-only model) or of edge insertions and deletions (dynamic model). The list is read-only, but we may make multiple passes over it. There have been a few results in the insertion-only model, such as computing distance spanners and approximating the maximum matching. In this thesis, we present algorithms and techniques for (i) solving more complex problems in the semi-streaming model (for example, problems in the dynamic model) and (ii) obtaining better solutions for problems which have already been studied (for example, the maximum matching problem). In the course of both, we develop new techniques with broad applications and explore the rich trade-offs between the complexity of the models (insertion-only streams vs. dynamic streams), the number of passes, space, accuracy, and running time.
    1. We initiate the study of dynamic graph streams. We start with basic problems such as connectivity and computing the minimum spanning tree. These problems are trivial in the insertion-only model (see the sketch after this abstract); however, they require non-trivial algorithms in the dynamic model, and multiple passes for computing the exact minimum spanning tree.
    2. We present a graph sparsification algorithm in the semi-streaming model. A graph sparsification is a sparse graph that approximately preserves all the cut values of a graph. Such a graph acts as an oracle for solving cut-related problems, for example, the minimum cut problem and the multicut problem. Our algorithm produces a graph sparsification with high probability in one pass.
    3. We use primal-dual algorithms to develop semi-streaming algorithms. Primal-dual algorithms have been widely adopted as a framework for solving linear programs and semidefinite programs faster; in contrast, we apply the method to reduce space and the number of passes, in addition to the running time. We also present some examples that arise in applications and show how to apply the techniques: the multicut problem, the correlation clustering problem, and the maximum matching problem. As a consequence, we also develop near-linear-time algorithms for the $b$-matching problem, which were not known before.
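
    To make the contrast between the two stream models concrete, here is why connectivity is trivial in the insertion-only model: a single pass with a union-find structure over the vertices uses near-linear space in the number of vertices, while edge deletions invalidate the merges and call for the sketching machinery developed in the thesis. This is an illustrative sketch, not code from the thesis.

```python
class UnionFind:
    """O(n)-space disjoint-set structure over vertices 0..n-1."""
    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, x, y):
        rx, ry = self.find(x), self.find(y)
        if rx != ry:
            self.parent[rx] = ry

def count_components(n, edge_stream):
    """One pass over an insertion-only edge stream with O(n) words of memory:
    each streamed edge merges two components. Deletions would invalidate the
    merges, which is what makes the dynamic model hard."""
    uf = UnionFind(n)
    for u, v in edge_stream:
        uf.union(u, v)
    return len({uf.find(v) for v in range(n)})
```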

    Efficient Triangle Counting in Large Graphs via Degree-based Vertex Partitioning

    The number of triangles is a computationally expensive graph statistic which is frequently used in complex network analysis (e.g., the transitivity ratio), in various random graph models (e.g., the exponential random graph model), and in important real-world applications such as spam detection, uncovering the hidden thematic structure of the Web, and link recommendation. Counting triangles in graphs with millions and billions of edges requires algorithms which run fast, use a small amount of space, provide accurate estimates of the number of triangles, and preferably are parallelizable. In this paper we present an efficient triangle counting algorithm which can be adapted to the semistreaming model. The key idea of our algorithm is to combine the sampling algorithm of Tsourakakis et al. with the partitioning of the set of vertices into a high-degree and a low-degree subset, as in the work of Alon, Yuster and Zwick, treating each set appropriately. We obtain a running time of $O\left(m + \frac{m^{3/2} \Delta \log{n}}{t \epsilon^2} \right)$ and an $\epsilon$-approximation (multiplicative error), where $n$ is the number of vertices, $m$ the number of edges, $t$ the number of triangles, and $\Delta$ the maximum number of triangles an edge is contained in. Furthermore, we show how this algorithm can be adapted to the semistreaming model with space usage $O\left(m^{1/2}\log{n} + \frac{m^{3/2} \Delta \log{n}}{t \epsilon^2} \right)$ and a constant number of passes (three) over the graph stream. We apply our methods to various networks with several millions of edges and obtain excellent results. Finally, we propose a random-projection-based method for triangle counting and provide a sufficient condition to obtain an estimate with low variance. Comment: 12 pages; to appear in the 7th Workshop on Algorithms and Models for the Web Graph (WAW 2010).
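
    The edge-sampling component referred to above can be illustrated briefly: keep each edge independently with probability $p$, count triangles exactly in the sparsified graph, and rescale by $1/p^{3}$, since a triangle survives only if all three of its edges do. A minimal sketch in the spirit of that sampling step; the degree-based partitioning and the streaming adaptation are not shown.

```python
import random
from collections import defaultdict

def exact_triangles(edges):
    """Exact triangle count: for every edge {u, v}, count common neighbours;
    this counts each triangle once per edge, so divide by 3."""
    undirected = {tuple(sorted(e)) for e in edges if e[0] != e[1]}
    adj = defaultdict(set)
    for u, v in undirected:
        adj[u].add(v)
        adj[v].add(u)
    return sum(len(adj[u] & adj[v]) for u, v in undirected) // 3

def sampled_triangle_estimate(edges, p, seed=0):
    """Edge-sampling estimator: keep each edge with probability p, count
    triangles in the sparsified graph, and rescale by 1/p^3."""
    rng = random.Random(seed)
    sampled = [e for e in edges if rng.random() < p]
    return exact_triangles(sampled) / p ** 3
```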

    A Linear-time Algorithm for Sparsification of Unweighted Graphs

    Given an undirected graph $G$ and an error parameter $\epsilon > 0$, the graph sparsification problem requires sampling edges in $G$ and giving the sampled edges appropriate weights to obtain a sparse graph $G_{\epsilon}$ with the following property: the weight of every cut in $G_{\epsilon}$ is within a factor of $(1\pm \epsilon)$ of the weight of the corresponding cut in $G$. If $G$ is unweighted, an $O(m\log n)$-time algorithm for constructing $G_{\epsilon}$ with $O(n\log n/\epsilon^2)$ edges in expectation, and an $O(m)$-time algorithm for constructing $G_{\epsilon}$ with $O(n\log^2 n/\epsilon^2)$ edges in expectation, have recently been developed (Hariharan-Panigrahi, 2010). In this paper, we improve these results by giving an $O(m)$-time algorithm for constructing $G_{\epsilon}$ with $O(n\log n/\epsilon^2)$ edges in expectation, for unweighted graphs. Our algorithm is optimal in terms of its time complexity; further, no efficient algorithm is known for constructing a sparser $G_{\epsilon}$. Our algorithm is Monte Carlo, i.e., it produces the correct output with high probability, as are all efficient graph sparsification algorithms.
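
    The defining property of $G_{\epsilon}$ above can be verified directly on small instances by enumerating all cuts. A brute-force verification sketch (exponential in $n$, so toy graphs only); the helper is illustrative and not from the paper.

```python
from itertools import combinations

def is_cut_sparsifier(G_edges, H_weighted_edges, n, eps):
    """Brute-force check that every cut of the weighted graph H is within a
    (1 +/- eps) factor of the corresponding cut of the unweighted graph G.
    Vertices are assumed to be 0..n-1; exponential time, toy graphs only."""
    for k in range(1, n // 2 + 1):
        for side in combinations(range(n), k):
            s = set(side)
            g = sum(1 for u, v in G_edges if (u in s) != (v in s))
            h = sum(w for u, v, w in H_weighted_edges if (u in s) != (v in s))
            if not ((1 - eps) * g <= h <= (1 + eps) * g):
                return False
    return True
```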

    Sketching Cuts in Graphs and Hypergraphs

    Sketching and streaming algorithms are at the forefront of current research directions for cut problems in graphs. In the streaming model, we show that a $(1-\epsilon)$-approximation for Max-Cut must use $n^{1-O(\epsilon)}$ space; moreover, beating a $4/5$-approximation requires polynomial space. For the sketching model, we show that $r$-uniform hypergraphs admit a $(1+\epsilon)$-cut-sparsifier (i.e., a weighted subhypergraph that approximately preserves all the cuts) with $O(\epsilon^{-2} n (r+\log n))$ edges. We also take first steps towards sketching general CSPs (constraint satisfaction problems).
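
    For readers less familiar with hypergraph cuts, the quantity the sparsifier above must preserve is the total weight of hyperedges with vertices on both sides of a cut $(S, S^{c})$. A small illustrative helper, not from the paper.

```python
def hypergraph_cut_weight(hyperedges, side):
    """Weight of the cut (S, S^c): a hyperedge is cut iff it contains at
    least one vertex inside S and at least one outside S.
    `hyperedges` is a list of (vertex_tuple, weight) pairs."""
    s = set(side)
    return sum(w for verts, w in hyperedges
               if any(v in s for v in verts) and any(v not in s for v in verts))

# Example: a 3-uniform hypergraph with unit weights.
H = [((0, 1, 2), 1.0), ((2, 3, 4), 1.0), ((0, 3, 4), 1.0)]
print(hypergraph_cut_weight(H, {0, 1}))  # hyperedges (0,1,2) and (0,3,4) are cut -> 2.0
```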

    Simple parallel and distributed algorithms for spectral graph sparsification

    We describe a simple algorithm for spectral graph sparsification, based on iterative computations of weighted spanners and uniform sampling. Leveraging the algorithms of Baswana and Sen for computing spanners, we obtain the first distributed spectral sparsification algorithm. We also obtain a parallel algorithm with improved work and time guarantees. Combining this algorithm with the parallel framework of Peng and Spielman for solving symmetric diagonally dominant linear systems, we get a parallel solver which is much closer to being practical and significantly more efficient in terms of the total work. Comment: replaces "A simple parallel and distributed algorithm for spectral sparsification"; minor change.
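
    The iteration described above (spanner plus uniform sampling) can be sketched at a structural level as follows. This is an illustrative skeleton, assuming a `spanner_fn` helper (e.g., a Baswana-Sen implementation) that is not provided here; the keep probability, the reweighting, and the number of rounds are illustrative rather than the paper's exact parameters.

```python
import random

def iterative_spanner_sparsify(edges, spanner_fn, rounds, keep_prob=0.25, seed=0):
    """Skeleton of a spanner-plus-uniform-sampling scheme: in each round, keep
    all edges of a spanner of the current graph, and keep every remaining edge
    independently with probability keep_prob, scaling its weight by 1/keep_prob
    to preserve expectation. `spanner_fn(edges)` is an assumed helper returning
    a subset of its input forming a spanner. Edges are (u, v, w) triples."""
    rng = random.Random(seed)
    current = list(edges)
    for _ in range(rounds):
        spanner = {(u, v) for (u, v, _) in spanner_fn(current)}
        nxt = []
        for (u, v, w) in current:
            if (u, v) in spanner:
                nxt.append((u, v, w))
            elif rng.random() < keep_prob:
                nxt.append((u, v, w / keep_prob))
        current = nxt
    return current
```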

    Probabilistic Spectral Sparsification In Sublinear Time

    In this paper, we introduce a variant of spectral sparsification, called probabilistic $(\varepsilon,\delta)$-spectral sparsification. Roughly speaking, it preserves the cut value of any cut $(S,S^{c})$ up to a $1\pm\varepsilon$ multiplicative error and a $\delta\left|S\right|$ additive error. We show how to produce a probabilistic $(\varepsilon,\delta)$-spectral sparsifier with $O(n\log n/\varepsilon^{2})$ edges in $\tilde{O}(n/\varepsilon^{2}\delta)$ time for unweighted undirected graphs. This gives the fastest known sublinear-time algorithms for several cut problems on unweighted undirected graphs, such as:
    - an $\tilde{O}(n/\mathrm{OPT}+n^{3/2+t})$-time $O(\sqrt{\log n/t})$-approximation algorithm for the sparsest cut problem and the balanced separator problem;
    - an $n^{1+o(1)}/\varepsilon^{4}$-time approximate minimum s-t cut algorithm with an $\varepsilon n$ additive error.
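
    The guarantee in this definition differs from a standard cut sparsifier only by the additive $\delta|S|$ term; for a single cut it amounts to the following check. An illustrative helper, not from the paper.

```python
def within_eps_delta_guarantee(cut_G, cut_H, size_S, eps, delta):
    """Probabilistic (eps, delta)-sparsification guarantee for one cut (S, S^c):
    the sparsifier's cut value cut_H may deviate from cut_G by a (1 +/- eps)
    multiplicative factor plus a delta * |S| additive term."""
    lo = (1 - eps) * cut_G - delta * size_S
    hi = (1 + eps) * cut_G + delta * size_S
    return lo <= cut_H <= hi
```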