20,606 research outputs found
Distributed Minimum Cut Approximation
We study the problem of computing approximate minimum edge cuts by
distributed algorithms. We use a standard synchronous message passing model
where in each round, bits can be transmitted over each edge (a.k.a.
the CONGEST model). We present a distributed algorithm that, for any weighted
graph and any , with high probability finds a cut of size
at most in
rounds, where is the size of the minimum cut. This algorithm is based
on a simple approach for analyzing random edge sampling, which we call the
random layering technique. In addition, we also present another distributed
algorithm, which is based on a centralized algorithm due to Matula [SODA '93],
that with high probability computes a cut of size at most
in rounds for any .
The time complexities of both of these algorithms almost match the
lower bound of Das Sarma et al. [STOC '11], thus
leading to an answer to an open question raised by Elkin [SIGACT-News '04] and
Das Sarma et al. [STOC '11].
Furthermore, we also strengthen the lower bound of Das Sarma et al. by
extending it to unweighted graphs. We show that the same lower bound also holds
for unweighted multigraphs (or equivalently for weighted graphs in which
bits can be transmitted in each round over an edge of weight ),
even if the diameter is . For unweighted simple graphs, we show
that even for networks of diameter , finding an -approximate minimum cut
in networks of edge connectivity or computing an
-approximation of the edge connectivity requires rounds
Algorithmic and Statistical Perspectives on Large-Scale Data Analysis
In recent years, ideas from statistics and scientific computing have begun to
interact in increasingly sophisticated and fruitful ways with ideas from
computer science and the theory of algorithms to aid in the development of
improved worst-case algorithms that are useful for large-scale scientific and
Internet data analysis problems. In this chapter, I will describe two recent
examples---one having to do with selecting good columns or features from a (DNA
Single Nucleotide Polymorphism) data matrix, and the other having to do with
selecting good clusters or communities from a data graph (representing a social
or information network)---that drew on ideas from both areas and that may serve
as a model for exploiting complementary algorithmic and statistical
perspectives in order to solve applied large-scale data analysis problems.Comment: 33 pages. To appear in Uwe Naumann and Olaf Schenk, editors,
"Combinatorial Scientific Computing," Chapman and Hall/CRC Press, 201
An Economic Analysis of Privacy Protection and Statistical Accuracy as Social Choices
Statistical agencies face a dual mandate to publish accurate statistics while protecting respondent privacy. Increasing privacy protection requires decreased accuracy. Recognizing this as a resource allocation problem, we propose an economic solution: operate where the marginal cost of increasing privacy equals the marginal benefit. Our model of production, from computer science, assumes data are published using an efficient differentially private algorithm. Optimal choice weighs the demand for accurate statistics against the demand for privacy. Examples from U.S. statistical programs show how our framework can guide decision-making. Further progress requires a better understanding of willingness-to-pay for privacy and statistical accuracy
- …