A Two-Sided Error Distributed Property Tester For Conductance

Authors: Fichtenberger Hendrik; Vasudev Yadu
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 43rd International Symposium on Mathematical Foundations of Computer Science (MFCS 2018)
Publication date: 01/01/2018

We study property testing in the distributed model and extend its setting from testing with one-sided error to testing with two-sided error. In particular, we develop a two-sided error property tester for general graphs with round complexity $O(\log(n) / (\epsilon \Phi^2))$ in the CONGEST model, which accepts graphs with conductance $\Phi$ and rejects graphs that are $\epsilon$-far from having conductance at least $\Phi^2 / 1000$ with constant probability. Our main insight is that one can start $\mathrm{poly}(n)$ random walks from a few random vertices without violating the congestion and unite the results to obtain a consistent answer from all vertices. For connected graphs, this is even possible when the number of vertices is unknown. We also obtain a matching $\Omega(\log n)$ lower bound for the LOCAL and CONGEST models by an indistinguishability argument. Although the power of vertex labels that arises from two-sided error might seem to be much stronger than in the sequential query model, we can show that this is not the case.
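The role of random walks in the abstract above can be illustrated with a small centralized sketch. It is not the distributed CONGEST tester from the paper: it only shows the underlying intuition that a lazy random walk of roughly logarithmic length is close to its stationary distribution on a high-conductance graph and far from it on a graph with a sparse cut. The function names, the walk length of `4 * ceil(log2(n))`, and the two example graphs are assumptions chosen for illustration.

```python
from math import ceil, log2


def lazy_walk_distribution(adj, start, steps):
    """Exact distribution of a lazy random walk (stay with prob. 1/2) after `steps` steps."""
    dist = {v: 0.0 for v in adj}
    dist[start] = 1.0
    for _ in range(steps):
        nxt = {v: 0.5 * p for v, p in dist.items()}  # mass that stays put
        for v, p in dist.items():
            share = 0.5 * p / len(adj[v])  # mass sent to each neighbor
            for u in adj[v]:
                nxt[u] += share
        dist = nxt
    return dist


def distance_to_stationary(adj, dist):
    """Total variation distance to the degree-proportional stationary distribution."""
    total_degree = sum(len(nbrs) for nbrs in adj.values())
    return 0.5 * sum(abs(dist[v] - len(adj[v]) / total_degree) for v in adj)


def clique(vertices):
    """Complete graph on the given vertices (illustrative high-conductance graph)."""
    return {v: [u for u in vertices if u != v] for v in vertices}


def two_cliques_with_bridge(half):
    """Two cliques joined by a single edge (illustrative low-conductance graph)."""
    adj = clique(range(half))
    adj.update(clique(range(half, 2 * half)))
    adj[half - 1].append(half)
    adj[half].append(half - 1)
    return adj


if __name__ == "__main__":
    n = 20
    steps = 4 * ceil(log2(n))  # assumed walk length, logarithmic in n
    for name, graph in [("clique", clique(range(n))),
                        ("bridged cliques", two_cliques_with_bridge(n // 2))]:
        dist = lazy_walk_distribution(graph, 0, steps)
        print(name, distance_to_stationary(graph, dist))
```

On the clique the distance is tiny, while on the bridged cliques most of the probability mass is still trapped on the starting side, which is the kind of gap a conductance tester can exploit.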

Dagstuhl Research Online Publication Server

Property testing of graphs and the role of neighborhood distributions

Author: Fichtenberger Hendrik
Publication date: 01/01/2020

Property testing considers decision problems in the regime of sublinear complexity. Most classical decision problems require at least linear time complexity in order to read the whole input. Hence, decision problems are relaxed by introducing a gap between “yes” and “no” instances: A property tester for a property Π (e.g., planarity) is a randomized algorithm with constant error probability that accepts objects that have Π (planar graphs) and that rejects objects that have linear edit distance to any object from Π (graphs with a linear number of crossing edges in every planar embedding). For property testers, locality is a natural and crucial concept because they cannot obtain a global view of their input. In this thesis, we investigate property testing in graphs and how testers leverage the information contained in the neighborhoods of randomly sampled vertices: We provide some structural insights regarding properties with constant testing complexity in graphs with bounded (maximum vertex) degree and a connection between testers with constant complexity for general graphs and testers with logarithmic space complexity for random-order streams. We also present testers for some minor-freeness properties and a tester for conductance in the distributed CONGEST model.

Eldorado - Ressourcen aus und für Lehre, Studium und Forschung

Testing Small Set Expansion in General Graphs

Authors: Li Angsheng; Peng Pan
Publication date: 01/01/2015

We consider the problem of testing small set expansion for general graphs. A graph $G$ is a $(k,\phi)$-expander if every subset of volume at most $k$ has conductance at least $\phi$. Small set expansion has recently received significant attention due to its close connection to the unique games conjecture, the local graph partitioning algorithms and locally testable codes. We give testers with two-sided error and one-sided error in the adjacency list model that allows degree and neighbor queries to the oracle of the input graph. The testers take as input an $n$-vertex graph $G$, a volume bound $k$, an expansion bound $\phi$ and a distance parameter $\varepsilon>0$. For the two-sided error tester, with probability at least $2/3$, it accepts the graph if it is a $(k,\phi)$-expander and rejects the graph if it is $\varepsilon$-far from any $(k^*,\phi^*)$-expander, where $k^*=\Theta(k\varepsilon)$ and $\phi^*=\Theta(\frac{\phi^4}{\min\{\log(4m/k),\log n\}\cdot(\ln k)})$. The query complexity and running time of the tester are $\widetilde{O}(\sqrt{m}\phi^{-4}\varepsilon^{-2})$, where $m$ is the number of edges of the graph. For the one-sided error tester, it accepts every $(k,\phi)$-expander, and with probability at least $2/3$, rejects every graph that is $\varepsilon$-far from $(k^*,\phi^*)$-expander, where $k^*=O(k^{1-\xi})$ and $\phi^*=O(\xi\phi^2)$ for any $0<\xi<1$. The query complexity and running time of this tester are $\widetilde{O}(\sqrt{\frac{n}{\varepsilon^3}}+\frac{k}{\varepsilon \phi^4})$. We also give a two-sided error tester with smaller gap between $\phi^*$ and $\phi$ in the rotation map model that allows (neighbor, index) queries and degree queries. Comment: 23 pages; STACS 201
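To make the $(k,\phi)$-expander definition concrete, the following is a minimal brute-force sketch that checks it exactly on a tiny graph. It is exponential in the number of vertices and is unrelated to the sublinear testers in the paper. The function names are hypothetical, and the conductance convention used here (cut size divided by the smaller of the two volumes) is an assumption, since the abstract does not spell out the paper's exact definition.

```python
from itertools import combinations


def volume(adj, subset):
    """Sum of degrees of the vertices in `subset`."""
    return sum(len(adj[v]) for v in subset)


def conductance(adj, subset):
    """cut(S, V - S) / min(vol(S), vol(V - S)); assumed convention, 0.0 if a side has volume 0."""
    inside = set(subset)
    cut = sum(1 for v in inside for u in adj[v] if u not in inside)
    denom = min(volume(adj, inside), volume(adj, set(adj) - inside))
    return cut / denom if denom else 0.0


def is_small_set_expander(adj, k, phi):
    """True if every nonempty proper subset of volume at most k has conductance >= phi."""
    vertices = list(adj)
    for size in range(1, len(vertices)):
        for subset in combinations(vertices, size):
            if volume(adj, subset) <= k and conductance(adj, subset) < phi:
                return False
    return True


if __name__ == "__main__":
    # Cycle on 8 vertices: small sets expand well, half the cycle does not.
    cycle = {i: [(i - 1) % 8, (i + 1) % 8] for i in range(8)}
    print(is_small_set_expander(cycle, 4, 0.5))  # only sets of volume <= 4 are checked
    print(is_small_set_expander(cycle, 8, 0.5))  # a path of 4 vertices has conductance 0.25
```

The cycle shows why the volume bound matters: restricting attention to small sets can certify much better expansion than the graph has globally.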

arXiv.org e-Print Archive

CiteSeerX

Dagstuhl Research Online Publication Server

Testing Cluster Structure of Graphs

Authors: Fortunato S.; Goldreich O.; Ng A. Y.; Porter M. A.; Zhu Z. A.
Publication date: 13/04/2015

We study the problem of recognizing the cluster structure of a graph in the framework of property testing in the bounded degree model. Given a parameter $\varepsilon$, a $d$-bounded degree graph is defined to be $(k, \phi)$-clusterable, if it can be partitioned into no more than $k$ parts, such that the (inner) conductance of the induced subgraph on each part is at least $\phi$ and the (outer) conductance of each part is at most $c_{d,k}\varepsilon^4\phi^2$, where $c_{d,k}$ depends only on $d,k$. Our main result is a sublinear algorithm with the running time $\widetilde{O}(\sqrt{n}\cdot\mathrm{poly}(\phi,k,1/\varepsilon))$ that takes as input a graph with maximum degree bounded by $d$, parameters $k$, $\phi$, $\varepsilon$, and with probability at least $\frac23$, accepts the graph if it is $(k,\phi)$-clusterable and rejects the graph if it is $\varepsilon$-far from $(k, \phi^*)$-clusterable for $\phi^* = c'_{d,k}\frac{\phi^2 \varepsilon^4}{\log n}$, where $c'_{d,k}$ depends only on $d,k$. By the lower bound of $\Omega(\sqrt{n})$ on the number of queries needed for testing graph expansion, which corresponds to $k=1$ in our problem, our algorithm is asymptotically optimal up to polylogarithmic factors. Comment: Full version of STOC 201
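The clusterability definition above combines two quantities per part: an inner conductance (of the induced subgraph) and an outer conductance (of the part within the whole graph). The sketch below checks a given partition against both on a tiny graph by brute force. It is only an illustration of the definition, not the sublinear algorithm from the paper. The function names are hypothetical, the volume-based conductance convention is an assumption (the bounded degree model may normalize differently), and `phi_out` is a stand-in for the bound $c_{d,k}\varepsilon^4\phi^2$, whose constant the abstract does not specify.

```python
from itertools import combinations


def cut_and_volume(adj, part):
    """Number of edges leaving `part` and the sum of degrees inside it."""
    inside = set(part)
    cut = sum(1 for v in inside for u in adj[v] if u not in inside)
    vol = sum(len(adj[v]) for v in inside)
    return cut, vol


def outer_conductance(adj, part):
    """Edges leaving the part divided by its volume (assumed convention)."""
    cut, vol = cut_and_volume(adj, part)
    return cut / vol


def inner_conductance(adj, part):
    """Minimum conductance over all cuts of the induced subgraph (brute force)."""
    inside = set(part)
    induced = {v: [u for u in adj[v] if u in inside] for v in inside}
    total = sum(len(nbrs) for nbrs in induced.values())
    best = 1.0  # convention for a single-vertex part
    members = sorted(inside)
    for size in range(1, len(members)):
        for subset in combinations(members, size):
            cut, vol = cut_and_volume(induced, subset)
            denom = min(vol, total - vol)
            if denom == 0:
                return 0.0  # an isolated vertex makes the induced subgraph disconnected
            best = min(best, cut / denom)
    return best


def is_valid_clustering(adj, parts, phi_in, phi_out):
    """Check inner conductance >= phi_in and outer conductance <= phi_out for every part."""
    return all(inner_conductance(adj, p) >= phi_in and
               outer_conductance(adj, p) <= phi_out for p in parts)


if __name__ == "__main__":
    # Two triangles joined by one edge.
    graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
    print(is_valid_clustering(graph, [{0, 1, 2}, {3, 4, 5}], 0.5, 0.2))
    print(is_valid_clustering(graph, [{0, 1, 2, 3, 4, 5}], 0.5, 0.2))
```

Splitting at the bridge gives two well-connected parts with few outgoing edges, whereas treating the whole graph as one cluster fails the inner conductance requirement.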

arXiv.org e-Print Archive

Crossref

Warwick Research Archives Portal Repository

White Rose Research Online

Analysis of Network Clustering Algorithms and Cluster Quality Metrics at Scale

Authors: Börner Katy; Emmons Scott; Gallant Mike; Kobourov Stephen
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 08/07/2016

Notions of community quality underlie network clustering. While studies surrounding network clustering are increasingly common, a precise understanding of the relationship between different cluster quality metrics is unknown. In this paper, we examine the relationship between stand-alone cluster quality metrics and information recovery metrics through a rigorous analysis of four widely-used network clustering algorithms -- Louvain, Infomap, label propagation, and smart local moving. We consider the stand-alone quality metrics of modularity, conductance, and coverage, and we consider the information recovery metrics of adjusted Rand score, normalized mutual information, and a variant of normalized mutual information used in previous work. Our study includes both synthetic graphs and empirical data sets of sizes varying from 1,000 to 1,000,000 nodes. We find significant differences among the results of the different cluster quality metrics. For example, clustering algorithms can return a value of 0.4 out of 1 on modularity but score 0 out of 1 on information recovery. We find conductance, though imperfect, to be the stand-alone quality metric that best indicates performance on information recovery metrics. Our study shows that the variant of normalized mutual information used in previous work cannot be assumed to differ only slightly from traditional normalized mutual information. Smart local moving is the best performing algorithm in our study, but discrepancies between cluster evaluation metrics prevent us from declaring it absolutely superior. Louvain performed better than Infomap in nearly all the tests in our study, contradicting the results of previous work in which Infomap was superior to Louvain. We find that although label propagation performs poorly when clusters are less clearly defined, it scales efficiently and accurately to large graphs with well-defined clusters.
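Two of the stand-alone metrics named above, modularity and coverage, can be computed directly from an edge list and a cluster assignment. The sketch below uses the standard definitions for undirected, unweighted graphs: coverage is the fraction of edges that fall inside clusters, and modularity sums, over clusters, the intra-cluster edge fraction minus the squared fraction of total degree. Whether the paper uses exactly these variants is an assumption, and the function name and example graph are illustrative only.

```python
def clustering_scores(edges, labels):
    """Return (modularity, coverage) for an undirected, unweighted graph.

    edges  : list of (u, v) pairs, each undirected edge listed once
    labels : dict mapping each vertex to its cluster label
    """
    m = len(edges)
    intra = {}   # number of edges inside each cluster
    degree = {}  # total degree of each cluster
    for u, v in edges:
        for w in (u, v):
            degree[labels[w]] = degree.get(labels[w], 0) + 1
        if labels[u] == labels[v]:
            intra[labels[u]] = intra.get(labels[u], 0) + 1
    coverage = sum(intra.values()) / m
    modularity = sum(intra.get(c, 0) / m - (d / (2 * m)) ** 2
                     for c, d in degree.items())
    return modularity, coverage


if __name__ == "__main__":
    # Two triangles joined by one edge, clustered along the bridge.
    edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
    labels = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
    print(clustering_scores(edges, labels))
```

Placing every vertex in a single cluster gives coverage 1 but modularity 0, a small instance of the abstract's point that different quality metrics can disagree about the same clustering.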

arXiv.org e-Print Archive

Crossref

Directory of Open Access Journals

PubMed Central

The University of Arizona

Algorithmic and Statistical Perspectives on Large-Scale Data Analysis

Author: Mahoney Michael W.
Publication date: 08/10/2010

In recent years, ideas from statistics and scientific computing have begun to interact in increasingly sophisticated and fruitful ways with ideas from computer science and the theory of algorithms to aid in the development of improved worst-case algorithms that are useful for large-scale scientific and Internet data analysis problems. In this chapter, I will describe two recent examples---one having to do with selecting good columns or features from a (DNA Single Nucleotide Polymorphism) data matrix, and the other having to do with selecting good clusters or communities from a data graph (representing a social or information network)---that drew on ideas from both areas and that may serve as a model for exploiting complementary algorithmic and statistical perspectives in order to solve applied large-scale data analysis problems. Comment: 33 pages. To appear in Uwe Naumann and Olaf Schenk, editors, "Combinatorial Scientific Computing," Chapman and Hall/CRC Press, 201

arXiv.org e-Print Archive

CiteSeerX