A Two-Sided Error Distributed Property Tester For Conductance
We study property testing in the distributed model and extend its setting from testing with one-sided error to testing with two-sided error. In particular, we develop a two-sided error property tester for general graphs with round complexity O(log(n) / (epsilon Phi^2)) in the CONGEST model, which accepts graphs with conductance Phi and rejects graphs that are epsilon-far from having conductance at least Phi^2 / 1000 with constant probability. Our main insight is that one can start poly(n) random walks from a few random vertices without violating the congestion and unite the results to obtain a consistent answer from all vertices. For connected graphs, this is even possible when the number of vertices is unknown. We also obtain a matching Omega(log n) lower bound for the LOCAL and CONGEST models by an indistinguishability argument. Although the power of vertex labels that arises from two-sided error might seem to be much stronger than in the sequential query model, we can show that this is not the case.
Property testing of graphs and the role of neighborhood distributions
Property testing considers decision problems in the regime of sublinear complexity. Most classical decision problems require at least linear time complexity in order to read the whole input. Hence, decision problems are relaxed by introducing a gap between “yes” and “no” instances: A property tester for a property Π (e.g., planarity) is a randomized algorithm with constant error probability that accepts objects that have Π (planar graphs) and that rejects objects that have linear edit distance to any object from Π (graphs with a linear number of crossing edges in every planar embedding). For property testers, locality is a natural and crucial concept because they cannot obtain a global view of their input. In this thesis, we investigate property testing in graphs and how testers leverage the information contained in the neighborhoods of randomly sampled vertices: We provide some structural insights regarding properties with constant testing complexity in graphs with bounded (maximum vertex) degree and a connection between testers with constant complexity for general graphs and testers with logarithmic space complexity for random-order streams. We also present testers for some minor-freeness properties and a tester for conductance in the distributed CONGEST model.
Testing Small Set Expansion in General Graphs
We consider the problem of testing small set expansion for general graphs. A
graph $G$ is a $(k,\phi)$-expander if every subset of volume at most $k$ has
conductance at least $\phi$. Small set expansion has recently received
significant attention due to its close connection to the unique games
conjecture, the local graph partitioning algorithms and locally testable codes.
We give testers with two-sided error and one-sided error in the adjacency
list model that allows degree and neighbor queries to the oracle of the input
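graph. The testers take as input an $n$-vertex graph $G$, a volume bound $k$,
an expansion bound $\phi$ and a distance parameter $\varepsilon>0$. For the
two-sided error tester, with probability at least $2/3$, it accepts the graph
if it is a $(k,\phi)$-expander and rejects the graph if it is $\varepsilon$-far
from any $(k^*,\phi^*)$-expander, where $k^*=\Theta(k\varepsilon)$ and
$\phi^*=\Theta(\frac{\phi^4}{\min\{\log(4m/k),\log n\}\cdot(\ln k)})$. The
query complexity and running time of the tester are
$\widetilde{O}(\sqrt{m}\phi^{-4}\varepsilon^{-2})$, where $m$ is the number of
edges of the graph. For the one-sided error tester, it accepts every
$(k,\phi)$-expander, and with probability at least $2/3$, rejects every graph
that is $\varepsilon$-far from any $(k^*,\phi^*)$-expander, where
$k^*=O(k^{1-\xi})$ and $\phi^*=O(\xi\phi^2)$ for any $0<\xi<1$. The query
complexity and running time of this tester are
$\widetilde{O}(\sqrt{\frac{n}{\varepsilon^3}}+\frac{k}{\varepsilon\phi^4})$.
We also give a two-sided error tester with smaller gap between $\phi^*$ and
$\phi$ in the rotation map model that allows (neighbor, index) queries and
degree queries. Comment: 23 pages; STACS 201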
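As an illustration of the small set expansion definition in this abstract (every vertex subset of bounded volume has conductance at least a given bound), here is a minimal brute-force sketch in Python. It is not the paper's tester, which runs in sublinear time; it enumerates all subsets and is only usable on tiny graphs. The function names and the adjacency-dictionary input are hypothetical. The conductance convention used (cut edges divided by the volume of the subset) is an assumption, since other texts divide by the smaller of the two side volumes.

```python
from itertools import combinations


def volume(adj, subset):
    """Sum of the degrees of the vertices in `subset`."""
    return sum(len(adj[v]) for v in subset)


def conductance(adj, subset):
    """Edges leaving `subset` divided by its volume (assumed convention)."""
    s = set(subset)
    cut = sum(1 for u in s for v in adj[u] if v not in s)
    vol = volume(adj, s)
    # A subset of isolated vertices has no volume; treat it as non-violating.
    return cut / vol if vol > 0 else float("inf")


def is_small_set_expander(adj, max_volume, min_conductance):
    """Exhaustively check every proper non-empty subset of bounded volume."""
    vertices = list(adj)
    for size in range(1, len(vertices)):
        for subset in combinations(vertices, size):
            if (volume(adj, subset) <= max_volume
                    and conductance(adj, subset) < min_conductance):
                return False
    return True


# Undirected 4-cycle given as a symmetric adjacency dictionary.
cycle4 = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
print(conductance(cycle4, {0, 1}))
print(is_small_set_expander(cycle4, max_volume=2, min_conductance=1.0))
print(is_small_set_expander(cycle4, max_volume=4, min_conductance=0.6))
```

On the 4-cycle, single vertices have all their edges leaving the set, while a pair of adjacent vertices keeps one edge inside, so raising the volume bound from 2 to 4 admits a subset with lower conductance.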
Testing Cluster Structure of Graphs
We study the problem of recognizing the cluster structure of a graph in the
framework of property testing in the bounded degree model. Given a parameter
$\varepsilon$, a $d$-bounded degree graph is defined to be $(k,\phi)$-clusterable, if it can be partitioned into no more than $k$ parts, such
that the (inner) conductance of the induced subgraph on each part is at least
$\phi$ and the (outer) conductance of each part is at most
$c_{d,k}\varepsilon^4\phi^2$, where $c_{d,k}$ depends only on $d,k$. Our main
result is a sublinear algorithm with the running time
$\widetilde{O}(\sqrt{n}\cdot\mathrm{poly}(\phi,k,1/\varepsilon))$ that takes as
input a graph with maximum degree bounded by $d$, parameters $k$, $\phi$,
$\varepsilon$, and with probability at least $\frac23$, accepts the graph if it
is $(k,\phi)$-clusterable and rejects the graph if it is $\varepsilon$-far from
$(k,\phi^*)$-clusterable for $\phi^* = c'_{d,k}\frac{\phi^2\varepsilon^4}{\log n}$, where $c'_{d,k}$ depends only on $d,k$. By the lower
bound of $\Omega(\sqrt{n})$ on the number of queries needed for testing graph
expansion, which corresponds to $k=1$ in our problem, our algorithm is
asymptotically optimal up to polylogarithmic factors. Comment: Full version of STOC 201
Analysis of Network Clustering Algorithms and Cluster Quality Metrics at Scale
Notions of community quality underlie network clustering. While studies
surrounding network clustering are increasingly common, a precise understanding
of the relationship between different cluster quality metrics is lacking. In
this paper, we examine the relationship between stand-alone cluster quality
metrics and information recovery metrics through a rigorous analysis of four
widely-used network clustering algorithms -- Louvain, Infomap, label
propagation, and smart local moving. We consider the stand-alone quality
metrics of modularity, conductance, and coverage, and we consider the
information recovery metrics of adjusted Rand score, normalized mutual
information, and a variant of normalized mutual information used in previous
work. Our study includes both synthetic graphs and empirical data sets of sizes
varying from 1,000 to 1,000,000 nodes.
We find significant differences among the results of the different cluster
quality metrics. For example, clustering algorithms can return a value of 0.4
out of 1 on modularity but score 0 out of 1 on information recovery. We find
conductance, though imperfect, to be the stand-alone quality metric that best
indicates performance on information recovery metrics. Our study shows that the
variant of normalized mutual information used in previous work cannot be
assumed to differ only slightly from traditional normalized mutual information.
Smart local moving is the best performing algorithm in our study, but
discrepancies between cluster evaluation metrics prevent us from declaring it
absolutely superior. Louvain performed better than Infomap in nearly all the
tests in our study, contradicting the results of previous work in which Infomap
was superior to Louvain. We find that although label propagation performs
poorly when clusters are less clearly defined, it scales efficiently and
accurately to large graphs with well-defined clusters.
Algorithmic and Statistical Perspectives on Large-Scale Data Analysis
In recent years, ideas from statistics and scientific computing have begun to
interact in increasingly sophisticated and fruitful ways with ideas from
computer science and the theory of algorithms to aid in the development of
improved worst-case algorithms that are useful for large-scale scientific and
Internet data analysis problems. In this chapter, I will describe two recent
examples---one having to do with selecting good columns or features from a (DNA
Single Nucleotide Polymorphism) data matrix, and the other having to do with
selecting good clusters or communities from a data graph (representing a social
or information network)---that drew on ideas from both areas and that may serve
as a model for exploiting complementary algorithmic and statistical
perspectives in order to solve applied large-scale data analysis problems. Comment: 33 pages. To appear in Uwe Naumann and Olaf Schenk, editors,
"Combinatorial Scientific Computing," Chapman and Hall/CRC Press, 201