231,865 research outputs found
On sampling nodes in a network
Random walk is an important tool in many graph mining applications including estimating graph parameters, sampling portions of the graph, and extracting dense communities. In this paper we consider the problem of sampling nodes from a large graph according to a prescribed distribution by using random walk as the basic primitive. Our goal is to obtain algorithms that make a small number of queries to the graph but output a node that is sampled according to the prescribed distribution. Focusing on the uniform distribution case, we study the query complexity of three algorithms and show a near-tight bound expressed in terms of the parameters of the graph such as the average degree and the mixing time. Both theoretically and empirically, we show that some algorithms are preferable in practice to the others. We also extend our study to the problem of sampling nodes according to some polynomial function of their degrees; this has implications for designing efficient algorithms for applications such as triangle counting.
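The core difficulty the abstract alludes to is that a plain random walk visits nodes proportionally to their degree, not uniformly. A standard fix, sketched below, is a Metropolis-Hastings correction: accept a proposed move from v to a uniform neighbor u with probability min(1, deg(v)/deg(u)), which makes the uniform distribution stationary. This is a generic illustration of the primitive, not any of the three algorithms the paper analyzes; the adjacency-dict interface is an assumption.

```python
import random

def mh_uniform_step(neighbors, v, rng):
    """One Metropolis-Hastings step whose stationary distribution is
    uniform over nodes (neighbors: dict mapping node -> adjacency list)."""
    u = rng.choice(neighbors[v])  # propose a uniformly random neighbor
    # Accept with prob min(1, deg(v)/deg(u)); otherwise stay at v.
    if rng.random() < min(1.0, len(neighbors[v]) / len(neighbors[u])):
        return u
    return v

def mh_uniform_sample(neighbors, start, steps, rng=None):
    """Run the corrected walk for `steps` steps and return the endpoint,
    which is close to uniform once `steps` exceeds the mixing time."""
    rng = rng or random.Random()
    v = start
    for _ in range(steps):
        v = mh_uniform_step(neighbors, v, rng)
    return v
```

The rejected moves (self-loops) are exactly what cancels the degree bias; on a graph with a high-degree hub, the walk frequently refuses to leave low-degree nodes, evening out the visit frequencies.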
Erasure-Resilient Sublinear-Time Graph Algorithms
We investigate sublinear-time algorithms that take partially erased graphs represented by adjacency lists as input. Our algorithms make degree and neighbor queries to the input graph and work with a specified fraction of adversarial erasures in adjacency entries. We focus on two computational tasks: testing if a graph is connected or ε-far from connected and estimating the average degree. For testing connectedness, we discover a threshold phenomenon: when the fraction of erasures is less than ε, this property can be tested efficiently (in time independent of the size of the graph); when the fraction of erasures is at least ε, then a number of queries linear in the size of the graph representation is required. Our erasure-resilient algorithm (for the special case with no erasures) is an improvement over the previously known algorithm for connectedness in the standard property testing model and has optimal dependence on the proximity parameter ε. For estimating the average degree, our results provide an "interpolation" between the query complexity for this computational task in the model with no erasures in two different settings: with only degree queries, investigated by Feige (SIAM J. Comput. '06), and with degree queries and neighbor queries, investigated by Goldreich and Ron (Random Struct. Algorithms '08) and Eden et al. (ICALP '17). We conclude with a discussion of our model and open questions raised by our work.
On the Complexity of Sampling Vertices Uniformly from a Graph
We study a number of graph exploration problems in the following natural scenario: an algorithm starts exploring an undirected graph from some seed vertex; the algorithm, for an arbitrary vertex v that it is aware of, can ask an oracle to return the set of the neighbors of v. (In the case of social networks, a call to this oracle corresponds to downloading the profile page of user v.) The goal of the algorithm is to either learn something (e.g., average degree) about the graph, or to return some random function of the graph (e.g., a uniform-at-random vertex), while accessing/downloading as few vertices of the graph as possible.
Motivated by practical applications, we study the complexities of a variety of problems in terms of the graph's mixing time t_{mix} and average degree d_{avg} - two measures that are believed to be quite small in real-world social networks, and that have often been used in the applied literature to bound the performance of online exploration algorithms.
Our main result is that the algorithm has to access Ω(t_{mix} d_{avg} ε^{-2} ln δ^{-1}) vertices to obtain, with probability at least 1-δ, an ε-additive approximation of the average of a bounded function on the vertices of a graph - this lower bound matches the performance of an algorithm that was proposed in the literature.
We also give tight bounds for the problem of returning a close-to-uniform-at-random vertex from the graph. Finally, we give lower bounds for the problems of estimating the average degree of the graph, and the number of vertices of the graph.
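The averaging task in this lower bound is typically matched by a simple upper-bound strategy: run a plain random walk and reweight each visited node by 1/deg(v), since the walk's stationary distribution weights v proportionally to deg(v). Below is a minimal sketch of that generic ratio estimator (not the specific algorithm referenced in the paper; the adjacency-dict interface and parameter names are assumptions).

```python
import random

def rw_average(neighbors, f, start, burn_in, samples, rng):
    """Estimate the uniform average of f over the nodes using a simple
    random walk. Each visit to v is reweighted by 1/deg(v) to undo the
    degree bias of the stationary distribution (ratio estimator)."""
    v = start
    for _ in range(burn_in):          # let the walk approach stationarity
        v = rng.choice(neighbors[v])
    num = den = 0.0
    for _ in range(samples):
        v = rng.choice(neighbors[v])
        num += f(v) / len(neighbors[v])
        den += 1.0 / len(neighbors[v])
    return num / den
```

The burn-in length plays the role of the mixing time t_{mix}, and the sample count drives the ε^{-2} dependence, which is the shape of the trade-off the lower bound shows is unavoidable.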
Quantum algorithms for connectivity and related problems
An important family of span programs, st-connectivity span programs, have been used to design quantum algorithms in various contexts, including a number of graph problems and formula evaluation problems. The complexity of the resulting algorithms depends on the largest positive witness size of any 1-input, and the largest negative witness size of any 0-input. Belovs and Reichardt first showed that the positive witness size is exactly characterized by the effective resistance of the input graph, but only rough upper bounds were known previously on the negative witness size. We show that the negative witness size in an st-connectivity span program is exactly characterized by the capacitance of the input graph. This gives a tight analysis for algorithms based on st-connectivity span programs on any set of inputs. We use this analysis to give a new quantum algorithm for estimating the capacitance of a graph. We also describe a new quantum algorithm for deciding if a graph is connected, which improves the previous best quantum algorithm for this problem if we're promised that either the graph has at least k > 1 components, or the graph is connected and has small average resistance, which is upper bounded by the diameter. We also give an alternative algorithm for deciding if a graph is connected that can be better than our first algorithm when the maximum degree is small. Finally, using ideas from our second connectivity algorithm, we give an algorithm for estimating the algebraic connectivity of a graph, the second smallest eigenvalue of the Laplacian.
Exploiting Neighborhood Interference with Low Order Interactions under Unit Randomized Design
Network interference, where the outcome of an individual is affected by the
treatment assignment of those in their social network, is pervasive in
real-world settings. However, it poses a challenge to estimating causal
effects. We consider the task of estimating the total treatment effect (TTE),
or the difference between the average outcomes of the population when everyone
is treated versus when no one is, under network interference. Under a Bernoulli
randomized design, we provide an unbiased estimator for the TTE when network
interference effects are constrained to low order interactions among neighbors
of an individual. We make no assumptions on the graph other than bounded
degree, allowing for well-connected networks that may not be easily clustered.
We derive a bound on the variance of our estimator and show in simulated
experiments that it performs well compared with standard estimators for the
TTE. We also derive a minimax lower bound on the mean squared error of our
estimator which suggests that the difficulty of estimation can be characterized
by the degree of interactions in the potential outcomes model. We also prove
that our estimator is asymptotically normal under boundedness conditions on the
network degree and potential outcomes model. Central to our contribution is a
new framework for balancing model flexibility and statistical complexity as
captured by this low order interactions structure.
Comment: 42 pages including citations and appendix, 2 figures (total of 12 subfigures).
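To make the setup concrete, here is a toy simulation of the β = 1 (linear) special case of a low-order neighborhood-interference model under a Bernoulli(p) design. The covariance-style estimator below is a simplified stand-in for the linear case only, not the paper's general estimator, and the outcome model and all coefficients are invented for illustration. It relies on the identity E[Y_i (z_j − p)] = b_ij p(1 − p), which makes it unbiased for TTE = (1/n) Σ_i Σ_j b_ij.

```python
import random

def sample_design(n, p, rng):
    """Bernoulli(p) randomized design: each unit treated independently."""
    return [1 if rng.random() < p else 0 for _ in range(n)]

def outcomes(z, baseline, effects):
    """Linear (order-1) interference: Y_i = a_i + sum_j b[i][j] * z_j,
    where b[i][j] is nonzero only for i itself and its neighbors."""
    return [baseline[i] + sum(b * z[j] for j, b in effects[i].items())
            for i in range(len(z))]

def tte_estimate(z, y, support, p):
    """Unbiased TTE estimator for the linear model:
    (1/n) sum_i Y_i sum_{j in support(i)} (z_j - p) / (p (1 - p))."""
    n = len(z)
    return sum(y[i] * sum((z[j] - p) / (p * (1 - p)) for j in supp)
               for i, supp in enumerate(support)) / n
```

Averaged over many Bernoulli draws, the estimate converges to the true total treatment effect, whereas a naive treated-vs-control difference in means would be biased by the spillovers.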
Sublinear-Time Algorithms for Monomer-Dimer Systems on Bounded Degree Graphs
For a graph G, let Z(λ) be the partition function of the monomer-dimer system defined by Z(λ) = Σ_k m_k(G) λ^k, where m_k(G) is the number of matchings of size k in G. We consider graphs of bounded degree and develop a sublinear-time algorithm for estimating log Z(λ) at an arbitrary value λ > 0 within additive error εn with high probability. The query complexity of our algorithm does not depend on the size of G and is polynomial in 1/ε, and we also provide a lower bound quadratic in 1/ε for this problem. This is the first analysis of a sublinear-time approximation algorithm for a #P-complete problem. Our approach is based on the correlation decay of the Gibbs distribution associated with Z(λ). We show that our algorithm approximates the probability for a vertex to be covered by a matching, sampled according to this Gibbs distribution, in a near-optimal sublinear time. We extend our results to approximate the average size and the entropy of such a matching within an additive error with high probability, where again the query complexity is polynomial in 1/ε and the lower bound is quadratic in 1/ε.
Our algorithms are simple to implement and of practical use when dealing with massive datasets. Our results extend to other systems where the correlation decay is known to hold, as for the independent set problem up to the critical activity.
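For small graphs, the monomer-dimer partition function Z_G(λ) = Σ_k m_k(G) λ^k, with m_k(G) the number of size-k matchings, can be computed exactly by recursing over the edge list. This brute-force reference implementation is only for checking intuition on toy inputs; it is exponential-time, the opposite regime from the paper's sublinear-time estimator.

```python
def monomer_dimer_Z(edges, lam):
    """Exact Z_G(lambda) = sum_k m_k(G) * lambda^k: each edge is either
    excluded from the matching, or (if both endpoints are still free)
    included with weight lambda."""
    def rec(i, used):
        if i == len(edges):
            return 1.0
        u, v = edges[i]
        total = rec(i + 1, used)              # edge i not in the matching
        if u not in used and v not in used:   # edge i in the matching
            total += lam * rec(i + 1, used | {u, v})
        return total
    return rec(0, frozenset())
```

On the path 0-1-2-3 this gives Z(λ) = 1 + 3λ + λ² (one empty matching, three single edges, one pair of disjoint edges), so Z(1) = 5, the total number of matchings.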
How to Make Your Approximation Algorithm Private: A Black-Box Differentially-Private Transformation for Tunable Approximation Algorithms of Functions with Low Sensitivity
We develop a framework for efficiently transforming certain approximation algorithms into differentially-private variants, in a black-box manner. Our results focus on algorithms A that output an approximation to a function f of the form (1 ± a)·f, where 0 ≤ a < 1 is a parameter that can be "tuned" to small-enough values while incurring only a poly blowup in the running time/space. We show that such algorithms can be made DP without sacrificing accuracy, as long as the function f has small global sensitivity. We achieve these results by applying the smooth sensitivity framework developed by Nissim, Raskhodnikova, and Smith (STOC 2007).
Our framework naturally applies to transform non-private FPRAS (resp. FPTAS) algorithms into (ε, δ)-DP (resp. ε-DP) approximation algorithms. We apply our framework in the context of sublinear-time and sublinear-space algorithms, while preserving the nature of the algorithm in meaningful ranges of the parameters. Our results include the first (to the best of our knowledge) (ε, δ)-edge DP sublinear-time algorithm for estimating the number of triangles, the number of connected components, and the weight of an MST of a graph, as well as a more efficient algorithm (while sacrificing pure DP in contrast to previous results) for estimating the average degree of a graph. In the area of streaming algorithms, our results include (ε, δ)-DP algorithms for estimating L_p-norms, distinct elements, and weighted MST for both insertion-only and turnstile streams. Our transformation also provides a private version of the smooth histogram framework, which is commonly used for converting streaming algorithms into sliding-window variants, and achieves a multiplicative approximation to many problems, such as estimating L_p-norms, distinct elements, and the length of the longest increasing subsequence.
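The basic primitive behind such transformations is noise calibrated to sensitivity. Below is a minimal sketch of the pure-DP Laplace mechanism for releasing a value whose underlying function has known global sensitivity; the paper's framework additionally needs the smooth sensitivity machinery of Nissim, Raskhodnikova, and Smith to cope with the approximation error of A, which this sketch omits.

```python
import random

def laplace_mechanism(value, sensitivity, eps, rng):
    """Release value + Laplace(sensitivity / eps) noise, which is eps-DP
    when `sensitivity` bounds the global sensitivity of the underlying
    function. Laplace(0, b) noise is sampled as b times the difference
    of two independent Exp(1) variates."""
    b = sensitivity / eps
    noise = b * (rng.expovariate(1.0) - rng.expovariate(1.0))
    return value + noise
```

Because the noise scale is sensitivity/ε rather than anything depending on the dataset size, a function like the average degree (with small global sensitivity) can absorb the noise without hurting the multiplicative approximation guarantee.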