    A Two-Sided Error Distributed Property Tester For Conductance

    We study property testing in the distributed model and extend its setting from testing with one-sided error to testing with two-sided error. In particular, we develop a two-sided error property tester for general graphs with round complexity O(log(n) / (epsilon Phi^2)) in the CONGEST model, which accepts graphs with conductance Phi and rejects graphs that are epsilon-far from having conductance at least Phi^2 / 1000 with constant probability. Our main insight is that one can start poly(n) random walks from a few random vertices without violating the congestion and unite the results to obtain a consistent answer from all vertices. For connected graphs, this is even possible when the number of vertices is unknown. We also obtain a matching Omega(log n) lower bound for the LOCAL and CONGEST models by an indistinguishability argument. Although the power of vertex labels that arises from two-sided error might seem to be much stronger than in the sequential query model, we can show that this is not the case.

    Property testing of graphs and the role of neighborhood distributions

    Property testing considers decision problems in the regime of sublinear complexity. Most classical decision problems require at least linear time complexity in order to read the whole input. Hence, decision problems are relaxed by introducing a gap between “yes” and “no” instances: A property tester for a property Π (e.g., planarity) is a randomized algorithm with constant error probability that accepts objects that have Π (planar graphs) and that rejects objects that have linear edit distance to any object from Π (graphs with a linear number of crossing edges in every planar embedding). For property testers, locality is a natural and crucial concept because they cannot obtain a global view of their input. In this thesis, we investigate property testing in graphs and how testers leverage the information contained in the neighborhoods of randomly sampled vertices: We provide some structural insights regarding properties with constant testing complexity in graphs with bounded (maximum vertex) degree and a connection between testers with constant complexity for general graphs and testers with logarithmic space complexity for random-order streams. We also present testers for some minor-freeness properties and a tester for conductance in the distributed CONGEST model.

    Testing Small Set Expansion in General Graphs

    We consider the problem of testing small set expansion for general graphs. A graph $G$ is a $(k,\phi)$-expander if every subset of volume at most $k$ has conductance at least $\phi$. Small set expansion has recently received significant attention due to its close connection to the unique games conjecture, the local graph partitioning algorithms and locally testable codes. We give testers with two-sided error and one-sided error in the adjacency list model that allows degree and neighbor queries to the oracle of the input graph. The testers take as input an $n$-vertex graph $G$, a volume bound $k$, an expansion bound $\phi$ and a distance parameter $\varepsilon>0$. For the two-sided error tester, with probability at least $2/3$, it accepts the graph if it is a $(k,\phi)$-expander and rejects the graph if it is $\varepsilon$-far from any $(k^*,\phi^*)$-expander, where $k^*=\Theta(k\varepsilon)$ and $\phi^*=\Theta(\frac{\phi^4}{\min\{\log(4m/k),\log n\}\cdot(\ln k)})$. The query complexity and running time of the tester are $\widetilde{O}(\sqrt{m}\phi^{-4}\varepsilon^{-2})$, where $m$ is the number of edges of the graph. For the one-sided error tester, it accepts every $(k,\phi)$-expander, and with probability at least $2/3$, rejects every graph that is $\varepsilon$-far from $(k^*,\phi^*)$-expander, where $k^*=O(k^{1-\xi})$ and $\phi^*=O(\xi\phi^2)$ for any $0<\xi<1$. The query complexity and running time of this tester are $\widetilde{O}(\sqrt{\frac{n}{\varepsilon^3}}+\frac{k}{\varepsilon \phi^4})$. We also give a two-sided error tester with smaller gap between $\phi^*$ and $\phi$ in the rotation map model that allows (neighbor, index) queries and degree queries. Comment: 23 pages; STACS 201
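
    To make the $(k,\phi)$-expander definition in the abstract above concrete, here is a minimal brute-force sketch in Python. It is not the paper's tester (which runs in sublinear time); it enumerates all subsets and is only usable on tiny graphs. It assumes an undirected simple graph stored as an adjacency-list dict, takes the volume of a set to be the sum of its vertex degrees, and takes the conductance of a set $S$ to be the number of edges leaving $S$ divided by the volume of $S$. The abstract does not spell out these definitions, and the function names are hypothetical.

```python
from itertools import combinations


def volume(adj, subset):
    """Sum of degrees of the vertices in `subset` (assumed definition of volume)."""
    return sum(len(adj[v]) for v in subset)


def conductance(adj, subset):
    """Edges leaving `subset` divided by its volume (assumed definition)."""
    s = set(subset)
    vol = volume(adj, s)
    if vol == 0:
        raise ValueError("conductance is undefined for a set of volume 0")
    # Each edge with exactly one endpoint in s is counted once, from the inside.
    cut = sum(1 for u in s for w in adj[u] if w not in s)
    return cut / vol


def is_small_set_expander(adj, k, phi):
    """Brute force: every nonempty proper subset of volume <= k has conductance >= phi.

    Exponential in the number of vertices; for illustration only.
    """
    vertices = list(adj)
    for size in range(1, len(vertices)):  # proper subsets only (an assumption)
        for subset in combinations(vertices, size):
            vol = volume(adj, subset)
            if 0 < vol <= k and conductance(adj, subset) < phi:
                return False
    return True


# A 4-cycle: every vertex has degree 2.
cycle = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
print(conductance(cycle, {0, 1}))            # two adjacent vertices
print(is_small_set_expander(cycle, 4, 0.5))  # volume bound 4, expansion bound 0.5
```

    On the 4-cycle, a pair of adjacent vertices has volume 4 and two outgoing edges, so raising the expansion bound above 0.5 with volume bound 4 makes the check fail, while volume bound 2 (singletons only) still passes.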

    Testing Cluster Structure of Graphs

    We study the problem of recognizing the cluster structure of a graph in the framework of property testing in the bounded degree model. Given a parameter $\varepsilon$, a $d$-bounded degree graph is defined to be $(k, \phi)$-clusterable, if it can be partitioned into no more than $k$ parts, such that the (inner) conductance of the induced subgraph on each part is at least $\phi$ and the (outer) conductance of each part is at most $c_{d,k}\varepsilon^4\phi^2$, where $c_{d,k}$ depends only on $d,k$. Our main result is a sublinear algorithm with the running time $\widetilde{O}(\sqrt{n}\cdot\mathrm{poly}(\phi,k,1/\varepsilon))$ that takes as input a graph with maximum degree bounded by $d$, parameters $k$, $\phi$, $\varepsilon$, and with probability at least $\frac23$, accepts the graph if it is $(k,\phi)$-clusterable and rejects the graph if it is $\varepsilon$-far from $(k, \phi^*)$-clusterable for $\phi^* = c'_{d,k}\frac{\phi^2 \varepsilon^4}{\log n}$, where $c'_{d,k}$ depends only on $d,k$. By the lower bound of $\Omega(\sqrt{n})$ on the number of queries needed for testing graph expansion, which corresponds to $k=1$ in our problem, our algorithm is asymptotically optimal up to polylogarithmic factors. Comment: Full version of STOC 201

    Analysis of Network Clustering Algorithms and Cluster Quality Metrics at Scale

    Notions of community quality underlie network clustering. While studies surrounding network clustering are increasingly common, a precise understanding of the relationship between different cluster quality metrics is unknown. In this paper, we examine the relationship between stand-alone cluster quality metrics and information recovery metrics through a rigorous analysis of four widely-used network clustering algorithms -- Louvain, Infomap, label propagation, and smart local moving. We consider the stand-alone quality metrics of modularity, conductance, and coverage, and we consider the information recovery metrics of adjusted Rand score, normalized mutual information, and a variant of normalized mutual information used in previous work. Our study includes both synthetic graphs and empirical data sets of sizes varying from 1,000 to 1,000,000 nodes. We find significant differences among the results of the different cluster quality metrics. For example, clustering algorithms can return a value of 0.4 out of 1 on modularity but score 0 out of 1 on information recovery. We find conductance, though imperfect, to be the stand-alone quality metric that best indicates performance on information recovery metrics. Our study shows that the variant of normalized mutual information used in previous work cannot be assumed to differ only slightly from traditional normalized mutual information. Smart local moving is the best performing algorithm in our study, but discrepancies between cluster evaluation metrics prevent us from declaring it absolutely superior. Louvain performed better than Infomap in nearly all the tests in our study, contradicting the results of previous work in which Infomap was superior to Louvain. We find that although label propagation performs poorly when clusters are less clearly defined, it scales efficiently and accurately to large graphs with well-defined clusters.

    Algorithmic and Statistical Perspectives on Large-Scale Data Analysis

    In recent years, ideas from statistics and scientific computing have begun to interact in increasingly sophisticated and fruitful ways with ideas from computer science and the theory of algorithms to aid in the development of improved worst-case algorithms that are useful for large-scale scientific and Internet data analysis problems. In this chapter, I will describe two recent examples---one having to do with selecting good columns or features from a (DNA Single Nucleotide Polymorphism) data matrix, and the other having to do with selecting good clusters or communities from a data graph (representing a social or information network)---that drew on ideas from both areas and that may serve as a model for exploiting complementary algorithmic and statistical perspectives in order to solve applied large-scale data analysis problems. Comment: 33 pages. To appear in Uwe Naumann and Olaf Schenk, editors, "Combinatorial Scientific Computing," Chapman and Hall/CRC Press, 201