20,606 research outputs found

    Distributed Minimum Cut Approximation

    We study the problem of computing approximate minimum edge cuts by distributed algorithms. We use a standard synchronous message passing model where in each round, $O(\log n)$ bits can be transmitted over each edge (a.k.a. the CONGEST model). We present a distributed algorithm that, for any weighted graph and any $\epsilon \in (0, 1)$, with high probability finds a cut of size at most $O(\epsilon^{-1}\lambda)$ in $O(D) + \tilde{O}(n^{1/2 + \epsilon})$ rounds, where $\lambda$ is the size of the minimum cut. This algorithm is based on a simple approach for analyzing random edge sampling, which we call the random layering technique. In addition, we also present another distributed algorithm, which is based on a centralized algorithm due to Matula [SODA '93], that with high probability computes a cut of size at most $(2+\epsilon)\lambda$ in $\tilde{O}((D+\sqrt{n})/\epsilon^5)$ rounds for any $\epsilon>0$. The time complexities of both of these algorithms almost match the $\tilde{\Omega}(D + \sqrt{n})$ lower bound of Das Sarma et al. [STOC '11], thus leading to an answer to an open question raised by Elkin [SIGACT-News '04] and Das Sarma et al. [STOC '11]. Furthermore, we also strengthen the lower bound of Das Sarma et al. by extending it to unweighted graphs. We show that the same lower bound also holds for unweighted multigraphs (or equivalently for weighted graphs in which $O(w\log n)$ bits can be transmitted in each round over an edge of weight $w$), even if the diameter is $D=O(\log n)$. For unweighted simple graphs, we show that even for networks of diameter $\tilde{O}(\frac{1}{\lambda}\cdot \sqrt{\frac{n}{\alpha\lambda}})$, finding an $\alpha$-approximate minimum cut in networks of edge connectivity $\lambda$ or computing an $\alpha$-approximation of the edge connectivity requires $\tilde{\Omega}(D + \sqrt{\frac{n}{\alpha\lambda}})$ rounds.
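
To make the quantities in this abstract concrete, the following is a minimal centralized sketch of what the minimum cut size $\lambda$ and an $\alpha$-approximate cut mean on a tiny weighted graph. It is not the distributed algorithm described above: it enumerates every bipartition by brute force, and the function names and the example graph are hypothetical, chosen only for illustration.

```python
from itertools import combinations

def cut_size(edges, side):
    """Total weight of edges with exactly one endpoint in `side`."""
    return sum(w for u, v, w in edges if (u in side) != (v in side))

def min_cut_brute_force(nodes, edges):
    """Exhaustive minimum cut; exponential time, only for tiny graphs."""
    nodes = list(nodes)
    first, rest = nodes[0], nodes[1:]
    best_size, best_side = None, None
    # Fix `first` on one side so each bipartition is visited once;
    # k < len(rest) guarantees the other side is never empty.
    for k in range(len(rest)):
        for extra in combinations(rest, k):
            side = {first, *extra}
            size = cut_size(edges, side)
            if best_size is None or size < best_size:
                best_size, best_side = size, side
    return best_size, best_side

def is_alpha_approximate(edges, side, lam, alpha):
    """True if the cut defined by `side` has size at most alpha * lam."""
    return cut_size(edges, side) <= alpha * lam

# Hypothetical example: two triangles (edge weight 2) joined by one edge of weight 1.
nodes = range(6)
edges = [(0, 1, 2), (1, 2, 2), (0, 2, 2),
         (3, 4, 2), (4, 5, 2), (3, 5, 2),
         (2, 3, 1)]
lam, side = min_cut_brute_force(nodes, edges)
print(lam, sorted(side))
print(is_alpha_approximate(edges, {0}, lam, alpha=2.5))
```

Here the cut that isolates node 0 has size 4, so it would not qualify as a $(2+\epsilon)$-approximation with $\epsilon = 0.5$, while the cut separating the two triangles is optimal.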

    Algorithmic and Statistical Perspectives on Large-Scale Data Analysis

    In recent years, ideas from statistics and scientific computing have begun to interact in increasingly sophisticated and fruitful ways with ideas from computer science and the theory of algorithms to aid in the development of improved worst-case algorithms that are useful for large-scale scientific and Internet data analysis problems. In this chapter, I will describe two recent examples, one having to do with selecting good columns or features from a (DNA Single Nucleotide Polymorphism) data matrix, and the other having to do with selecting good clusters or communities from a data graph (representing a social or information network), that drew on ideas from both areas and that may serve as a model for exploiting complementary algorithmic and statistical perspectives in order to solve applied large-scale data analysis problems. Comment: 33 pages. To appear in Uwe Naumann and Olaf Schenk, editors, "Combinatorial Scientific Computing," Chapman and Hall/CRC Press, 201

    An Economic Analysis of Privacy Protection and Statistical Accuracy as Social Choices

    Statistical agencies face a dual mandate to publish accurate statistics while protecting respondent privacy. Increasing privacy protection requires decreased accuracy. Recognizing this as a resource allocation problem, we propose an economic solution: operate where the marginal cost of increasing privacy equals the marginal benefit. Our model of production, from computer science, assumes data are published using an efficient differentially private algorithm. Optimal choice weighs the demand for accurate statistics against the demand for privacy. Examples from U.S. statistical programs show how our framework can guide decision-making. Further progress requires a better understanding of willingness-to-pay for privacy and statistical accuracy.
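
The trade-off between privacy and accuracy mentioned in this abstract can be illustrated with a small sketch. The abstract does not say which differentially private algorithm the authors assume; the example below swaps in the standard Laplace mechanism for a counting query (an assumption made purely for illustration), where the noise scale is sensitivity divided by $\epsilon$, so a smaller $\epsilon$ (stronger privacy) means a larger expected error. All function names are hypothetical.

```python
import random

def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is distributed as Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def expected_abs_error(epsilon, sensitivity=1.0):
    """Mean absolute error of the Laplace mechanism: sensitivity / epsilon."""
    return sensitivity / epsilon

def private_count(true_count, epsilon, rng, sensitivity=1.0):
    """Release a count with Laplace noise calibrated to epsilon (illustrative)."""
    return true_count + laplace_noise(sensitivity / epsilon, rng)

def empirical_abs_error(epsilon, trials=20000, seed=0):
    """Monte Carlo estimate of the mean absolute error at a given epsilon."""
    rng = random.Random(seed)
    total = sum(abs(private_count(0.0, epsilon, rng)) for _ in range(trials))
    return total / trials

# Smaller epsilon (more privacy) gives a larger error, tracing out the trade-off.
for eps in (0.1, 0.5, 1.0, 2.0):
    print(eps, expected_abs_error(eps), round(empirical_abs_error(eps), 3))
```

Under this assumed mechanism, halving $\epsilon$ doubles the expected error, which is one simple way to picture the frontier along which an agency would pick its operating point.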