
    Average Sensitivity of Graph Algorithms

    In modern applications of graph algorithms, where the graphs of interest are large and dynamic, it is unrealistic to assume that an input representation contains the full information of a graph being studied. Hence, it is desirable to use algorithms that, even when only a (large) subgraph is available, output solutions that are close to the solutions output when the whole graph is available. We formalize this idea by introducing the notion of average sensitivity of graph algorithms, which is the average earth mover's distance between the output distributions of an algorithm on a graph and its subgraph obtained by removing an edge, where the average is over the edges removed and the distance between two outputs is the Hamming distance. In this work, we initiate a systematic study of average sensitivity. After deriving basic properties of average sensitivity such as composition, we provide efficient approximation algorithms with low average sensitivities for concrete graph problems, including the minimum spanning forest problem, the global minimum cut problem, the minimum $s$-$t$ cut problem, and the maximum matching problem. In addition, we prove that the average sensitivity of our global minimum cut algorithm is almost optimal, by showing a nearly matching lower bound. We also show that every algorithm for the 2-coloring problem has average sensitivity linear in the number of vertices. One of the main ideas involved in designing our algorithms with low average sensitivity is the following fact: if the presence of a vertex or an edge in the solution output by an algorithm can be decided locally, then the algorithm has a low average sensitivity, allowing us to reuse the analyses of known sublinear-time algorithms and local computation algorithms (LCAs). Using this connection, we show that every LCA for 2-coloring has linear query complexity, thereby answering an open question. (Comment: 39 pages, 1 figure)
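
    For concreteness, the verbal definition above can be written as the following display (our own notation; the paper's exact formalization may differ in normalization):

        \[
          \beta(A, G) \;=\; \frac{1}{|E|} \sum_{e \in E}
            d_{\mathrm{EM}}\bigl(A(G),\, A(G - e)\bigr),
        \]

    where $A(G)$ denotes the output distribution of the (possibly randomized) algorithm $A$ on $G$, and $d_{\mathrm{EM}}$ is the earth mover's distance computed with the Hamming distance as the underlying metric on outputs.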

    A Simple Deterministic Distributed MST Algorithm, with Near-Optimal Time and Message Complexities

    The distributed minimum spanning tree (MST) problem is one of the most central and fundamental problems in distributed graph algorithms. Garay et al. \cite{GKP98,KP98} devised an algorithm with running time $O(D + \sqrt{n} \cdot \log^* n)$, where $D$ is the hop-diameter of the input $n$-vertex, $m$-edge graph, and with message complexity $O(m + n^{3/2})$. Peleg and Rubinovich \cite{PR99} showed that the running time of the algorithm of \cite{KP98} is essentially tight, and asked if one can achieve near-optimal running time **together with near-optimal message complexity**. In a recent breakthrough, Pandurangan et al. \cite{PRS16} answered this question in the affirmative, and devised a **randomized** algorithm with time $\tilde{O}(D + \sqrt{n})$ and message complexity $\tilde{O}(m)$. They asked if such simultaneous time- and message-optimality can be achieved by a **deterministic** algorithm. In this paper, building upon the work of \cite{PRS16}, we answer this question in the affirmative, and devise a **deterministic** algorithm that computes MST in time $O((D + \sqrt{n}) \cdot \log n)$, using $O(m \cdot \log n + n \log n \cdot \log^* n)$ messages. The polylogarithmic factors in the time and message complexities of our algorithm are significantly smaller than the respective factors in the result of \cite{PRS16}. Also, our algorithm and its analysis are very **simple** and self-contained, as opposed to the rather complicated previous sublinear-time algorithms \cite{GKP98,KP98,E04b,PRS16}.
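
    As background for readers less familiar with the area, the algorithms above all grow MST fragments in the spirit of Boruvka, repeatedly merging each fragment along its minimum-weight outgoing edge. Below is a minimal sequential sketch of that merging step (illustration only, not the distributed $O((D + \sqrt{n}) \cdot \log n)$-time procedure; ties are broken lexicographically so the per-fragment selection is consistent):

        # Sequential Boruvka sketch: each fragment selects its minimum-weight
        # outgoing edge, then fragments are merged along the selected edges.
        # Lexicographic tie-breaking makes the selection consistent.

        def boruvka_msf(n, edges):
            parent = list(range(n))

            def find(x):
                while parent[x] != x:
                    parent[x] = parent[parent[x]]   # path halving
                    x = parent[x]
                return x

            forest = []
            while True:
                best = {}                           # fragment root -> cheapest outgoing edge
                for w, u, v in edges:
                    ru, rv = find(u), find(v)
                    if ru == rv:
                        continue
                    for r in (ru, rv):
                        if r not in best or (w, u, v) < best[r]:
                            best[r] = (w, u, v)
                if not best:                        # no outgoing edges left: done
                    return forest
                for w, u, v in best.values():
                    ru, rv = find(u), find(v)
                    if ru != rv:                    # skip edges made redundant this round
                        parent[ru] = rv
                        forest.append((w, u, v))

        # Example: a 4-cycle with a chord; the MST has weight 1 + 2 + 3 = 6.
        print(sorted(boruvka_msf(4, [(1, 0, 1), (2, 1, 2), (3, 2, 3), (4, 3, 0), (5, 0, 2)])))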

    Almost-Smooth Histograms and Sliding-Window Graph Algorithms

    We study algorithms for the sliding-window model, an important variant of the data-stream model, in which the goal is to compute some function of a fixed-length suffix of the stream. We extend the smooth-histogram framework of Braverman and Ostrovsky (FOCS 2007) to almost-smooth functions, which includes all subadditive functions. Specifically, we show that if a subadditive function can be $(1+\epsilon)$-approximated in the insertion-only streaming model, then it can be $(2+\epsilon)$-approximated also in the sliding-window model with space complexity larger by a factor of $O(\epsilon^{-1}\log w)$, where $w$ is the window size. We demonstrate how our framework yields new approximation algorithms with relatively little effort for a variety of problems that do not admit the smooth-histogram technique. For example, in the frequency-vector model, a symmetric norm is subadditive and thus we obtain a sliding-window $(2+\epsilon)$-approximation algorithm for it. Another example is for streaming matrices, where we derive a new sliding-window $(\sqrt{2}+\epsilon)$-approximation algorithm for the Schatten $4$-norm. We then consider graph streams and show that many graph problems are subadditive, including maximum submodular matching, minimum vertex-cover, and maximum $k$-cover, thereby deriving sliding-window $O(1)$-approximation algorithms for them almost for free (using known insertion-only algorithms). Finally, we design for every $d \in (1,2]$ an artificial function, based on the maximum-matching size, whose almost-smoothness parameter is exactly $d$.
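
    To make the starting point concrete, here is a compact sketch of the classic smooth-histogram maintenance that this paper generalizes, instantiated for the trivially smooth function "sum of nonnegative items" (illustration of the bucket bookkeeping only; the class name and parameters are ours, and the almost-smooth analysis in the paper is what justifies applying the same skeleton to subadditive functions):

        # Smooth-histogram skeleton (Braverman-Ostrovsky style) for a sliding
        # window, instantiated with f = sum of nonnegative items. One instance
        # ("bucket") of the insertion-only algorithm runs per retained start time.

        class SlidingWindowSum:
            def __init__(self, window, eps):
                self.window, self.eps = window, eps
                self.buckets = []      # (start_time, running_value), start times increasing
                self.time = 0

            def insert(self, x):
                self.time += 1
                # feed the item to every running instance and start a fresh one
                self.buckets = [(s, v + x) for s, v in self.buckets] + [(self.time, x)]
                # prune: if a later bucket is still within (1 - eps) of an earlier
                # one, the buckets in between are redundant and are dropped
                kept = [self.buckets[0]]
                for s, v in self.buckets[1:]:
                    while len(kept) >= 2 and v >= (1 - self.eps) * kept[-2][1]:
                        kept.pop()
                    kept.append((s, v))
                self.buckets = kept
                # keep at most one bucket that starts before the current window
                start = self.time - self.window + 1
                while len(self.buckets) >= 2 and self.buckets[1][0] <= start:
                    self.buckets.pop(0)

            def estimate(self):
                # report the oldest bucket that starts inside the window; pruning
                # keeps it close to the (unknown) exact window value
                start = self.time - self.window + 1
                inside = [v for s, v in self.buckets if s >= start]
                return inside[0] if inside else 0.0

    Because buckets that are two apart are separated by a $(1-\epsilon)$ factor after each pruning, the number of retained buckets for, say, integer items bounded by $\mathrm{poly}(w)$ is $O(\epsilon^{-1}\log w)$, matching the space overhead quoted above.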

    Parallel Algorithms for Geometric Graph Problems

    We give algorithms for geometric graph problems in the modern parallel models inspired by MapReduce. For example, for the Minimum Spanning Tree (MST) problem over a set of points in the two-dimensional space, our algorithm computes a $(1+\epsilon)$-approximate MST. Our algorithms work in a constant number of rounds of communication, while using total space and communication proportional to the size of the data (linear space and near linear time algorithms). In contrast, for general graphs, achieving the same result for MST (or even connectivity) remains a challenging open problem, despite drawing significant attention in recent years. We develop a general algorithmic framework that, besides MST, also applies to Earth-Mover Distance (EMD) and the transportation cost problem. Our algorithmic framework has implications beyond the MapReduce model. For example, it yields a new algorithm for computing EMD cost in the plane in near-linear time, $n^{1+o_\epsilon(1)}$. We note that while recently Sharathkumar and Agarwal developed a near-linear time algorithm for $(1+\epsilon)$-approximating EMD, our algorithm is fundamentally different, and, for example, also solves the transportation (cost) problem, raised as an open question in their work. Furthermore, our algorithm immediately gives a $(1+\epsilon)$-approximation algorithm with $n^{\delta}$ space in the streaming-with-sorting model with $1/\delta^{O(1)}$ passes. As such, it is tempting to conjecture that the parallel models may also constitute a concrete playground in the quest for efficient algorithms for EMD (and other similar problems) in the vanilla streaming model, a well-known open problem.
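
    For a sense of the objective being approximated: the EMD between two equal-size planar point sets with unit weights is just the cost of a minimum-cost perfect matching. The small script below computes it exactly with scipy, purely to illustrate the quantity, not the parallel algorithm of the paper:

        # Exact EMD between two equal-size planar point sets, as a min-cost
        # perfect matching; illustrates the objective being approximated.
        import numpy as np
        from scipy.optimize import linear_sum_assignment

        def emd_cost(A, B):
            A, B = np.asarray(A, float), np.asarray(B, float)
            # pairwise Euclidean distances between the two point sets
            cost = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)
            rows, cols = linear_sum_assignment(cost)
            return cost[rows, cols].sum()

        print(emd_cost([(0, 0), (1, 0)], [(0, 1), (1, 1)]))  # 2.0: each point moves distance 1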

    Large induced subgraphs via triangulations and CMSO

    We obtain an algorithmic meta-theorem for the following optimization problem. Let $\phi$ be a Counting Monadic Second Order Logic (CMSO) formula and let $t$ be an integer. For a given graph $G$, the task is to maximize $|X|$ subject to the following: there is a set of vertices $F$ of $G$, containing $X$, such that the subgraph $G[F]$ induced by $F$ has treewidth at most $t$, and the structure $(G[F], X)$ models $\phi$. Some special cases of this optimization problem are the following generic examples, each of which contains various problems as a special subcase: 1) "Maximum induced subgraph with at most $l$ copies of cycles of length 0 modulo $m$", where for fixed nonnegative integers $m$ and $l$, the task is to find a maximum induced subgraph of a given graph with at most $l$ vertex-disjoint cycles of length 0 modulo $m$. 2) "Minimum $\Gamma$-deletion", where for a fixed finite set of graphs $\Gamma$ containing a planar graph, the task is to find a maximum induced subgraph of a given graph containing no graph from $\Gamma$ as a minor. 3) "Independent $\Pi$-packing", where for a fixed finite set of connected graphs $\Pi$, the task is to find an induced subgraph $G[F]$ of a given graph $G$ with the maximum number of connected components, such that each connected component of $G[F]$ is isomorphic to some graph from $\Pi$. We give an algorithm solving the optimization problem on an $n$-vertex graph $G$ in time $O(\#\mathrm{pmc} \cdot n^{t+4} \cdot f(t,\phi))$, where $\#\mathrm{pmc}$ is the number of all potential maximal cliques in $G$ and $f$ is a function depending on $t$ and $\phi$ only. We also show how a similar running time can be obtained for the weighted version of the problem. Pipelined with known bounds on the number of potential maximal cliques, we deduce that our optimization problem can be solved in time $O(1.7347^n)$ for arbitrary graphs, and in polynomial time for graph classes with a polynomial number of minimal separators.
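
    In symbols, the meta-problem above reads (our paraphrase of the statement):

        \[
          \max \; |X| \quad \text{over } X \subseteq F \subseteq V(G)
          \quad \text{such that} \quad
          \mathrm{tw}(G[F]) \le t \ \text{ and } \ (G[F], X) \models \phi .
        \]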

    Instance and Output Optimal Parallel Algorithms for Acyclic Joins

    Massively parallel join algorithms have received much attention in recent years, while most prior work has focused on worst-case optimal algorithms. However, the worst-case optimality of these join algorithms relies on hard instances having very large output sizes, which rarely appear in practice. A stronger notion of optimality is output-optimality, which requires an algorithm to be optimal within the class of all instances sharing the same input and output size. An even stronger optimality is instance-optimality, i.e., the algorithm is optimal on every single instance, but this may not always be achievable. In the traditional RAM model of computation, the classical Yannakakis algorithm is instance-optimal on any acyclic join. But in the massively parallel computation (MPC) model, the situation becomes much more complicated. We first show that for the class of r-hierarchical joins, instance-optimality can still be achieved in the MPC model. Then, we give a new MPC algorithm for an arbitrary acyclic join with load $O\!\left(\frac{\mathrm{IN}}{p} + \frac{\sqrt{\mathrm{IN} \cdot \mathrm{OUT}}}{p}\right)$, where $\mathrm{IN}$ and $\mathrm{OUT}$ are the input and output sizes of the join, and $p$ is the number of servers in the MPC model. This improves the MPC version of the Yannakakis algorithm by an $O\!\left(\sqrt{\mathrm{OUT}/\mathrm{IN}}\right)$ factor. Furthermore, we show that this is output-optimal when $\mathrm{OUT} = O(p \cdot \mathrm{IN})$, for every acyclic but non-r-hierarchical join. Finally, we give the first output-sensitive lower bound for the triangle join in the MPC model, showing that it is inherently more difficult than acyclic joins.
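
    As a reminder of the RAM-model baseline invoked above, here is a toy sketch of Yannakakis-style evaluation on the acyclic chain join $R(a,b) \bowtie S(b,c) \bowtie T(c,d)$: a semi-join pass removes dangling tuples, after which the join is assembled with work proportional to input plus output. The relations and schema are illustrative only, and the full algorithm also performs a second, top-down semi-join pass:

        # Yannakakis-style evaluation of the acyclic chain join
        # R(a, b) join S(b, c) join T(c, d): semi-join reduction, then join.
        # Toy relations; names and schemas are illustrative, not from the paper.

        R = {(1, 10), (2, 20), (3, 30)}            # (a, b)
        S = {(10, 100), (20, 200), (40, 400)}      # (b, c)
        T = {(100, 7), (200, 8), (500, 9)}         # (c, d)

        # Bottom-up semi-joins: keep only tuples that can reach a full join result.
        S = {(b, c) for (b, c) in S if c in {c2 for (c2, d) in T}}   # S semi-join T
        R = {(a, b) for (a, b) in R if b in {b2 for (b2, c) in S}}   # R semi-join S

        # Join of the reduced relations: no dangling intermediate tuples remain,
        # so the work is proportional to input size plus output size.
        out = [(a, b, c, d)
               for (a, b) in R
               for (b2, c) in S if b2 == b
               for (c2, d) in T if c2 == c]
        print(sorted(out))   # [(1, 10, 100, 7), (2, 20, 200, 8)]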

    Bicriteria Network Design Problems

    We study a general class of bicriteria network design problems. A generic problem in this class is as follows: given an undirected graph and two minimization objectives (under different cost functions), with a budget specified on the first, find a subgraph from a given subgraph class that minimizes the second objective subject to the budget on the first. We consider three different criteria: the total edge cost, the diameter, and the maximum degree of the network. Here, we present the first polynomial-time approximation algorithms for a large class of bicriteria network design problems for the above-mentioned criteria. The following general types of results are presented. First, we develop a framework for bicriteria problems and their approximations. Second, when the two criteria are the same (note that the cost functions continue to be different), we present a "black box" parametric search technique. This black box takes as input an (approximation) algorithm for the unicriterion situation and generates an approximation algorithm for the bicriteria case with only a constant factor loss in the performance guarantee. Third, when the two criteria are the diameter and the total edge cost, we use a cluster-based approach to devise approximation algorithms whose output solutions violate both criteria by a logarithmic factor. Finally, for the class of treewidth-bounded graphs, we provide pseudopolynomial-time algorithms for a number of bicriteria problems using dynamic programming. We show how these pseudopolynomial-time algorithms can be converted to fully polynomial-time approximation schemes using a scaling technique. (Comment: 24 pages, 1 figure)
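
    Schematically, with cost functions $c_1, c_2$, a budget $B$, and an admissible subgraph class $\mathcal{S}(G)$ (our notation), a problem in this class reads:

        \[
          \min_{H \in \mathcal{S}(G)} \; c_2(H) \quad \text{subject to} \quad c_1(H) \le B,
        \]

    and an $(\alpha, \beta)$-approximation returns $H \in \mathcal{S}(G)$ with $c_1(H) \le \alpha B$ and $c_2(H) \le \beta \, c_2(H^\ast)$, where $H^\ast$ is an optimal budget-respecting solution.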

    Forest resampling for distributed sequential Monte Carlo

    This paper brings explicit considerations of distributed computing architectures and data structures into the rigorous design of Sequential Monte Carlo (SMC) methods. A theoretical result established recently by the authors shows that adapting interaction between particles to suitably control the Effective Sample Size (ESS) is sufficient to guarantee stability of SMC algorithms. Our objective is to leverage this result and devise algorithms which are thus guaranteed to work well in a distributed setting. We make three main contributions to achieve this. Firstly, we study mathematical properties of the ESS as a function of matrices and graphs that parameterize the interaction amongst particles. Secondly, we show how these graphs can be induced by tree data structures which model the logical network topology of an abstract distributed computing environment. Thirdly, we present efficient distributed algorithms that achieve the desired ESS control, perform resampling and operate on forests associated with these trees.
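
    For reference, the Effective Sample Size that the adaptive interaction controls is the standard weight-based quantity below (shown in its plain form; the matrix- and graph-parameterized variants studied in the paper generalize this):

        # Effective Sample Size of a weighted particle set: ESS = (sum w)^2 / sum w^2.
        # Standard definition; the graph-structured particle interaction analyzed in
        # the paper is not modeled by this plain version.
        import numpy as np

        def ess(weights):
            w = np.asarray(weights, dtype=float)
            return w.sum() ** 2 / np.square(w).sum()

        print(ess([0.25, 0.25, 0.25, 0.25]))  # 4.0: uniform weights give the full sample size
        print(ess([0.97, 0.01, 0.01, 0.01]))  # ~1.06: one dominant particle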