
    Parallel Algorithms for Geometric Graph Problems

    We give algorithms for geometric graph problems in the modern parallel models inspired by MapReduce. For example, for the Minimum Spanning Tree (MST) problem over a set of points in the two-dimensional space, our algorithm computes a $(1+\epsilon)$-approximate MST. Our algorithms work in a constant number of rounds of communication, while using total space and communication proportional to the size of the data (linear space and near-linear time algorithms). In contrast, for general graphs, achieving the same result for MST (or even connectivity) remains a challenging open problem, despite drawing significant attention in recent years. We develop a general algorithmic framework that, besides MST, also applies to Earth-Mover Distance (EMD) and the transportation cost problem. Our algorithmic framework has implications beyond the MapReduce model. For example, it yields a new algorithm for computing EMD cost in the plane in near-linear time, $n^{1+o_\epsilon(1)}$. We note that while recently Sharathkumar and Agarwal developed a near-linear time algorithm for $(1+\epsilon)$-approximating EMD, our algorithm is fundamentally different, and, for example, also solves the transportation (cost) problem, raised as an open question in their work. Furthermore, our algorithm immediately gives a $(1+\epsilon)$-approximation algorithm with $n^{\delta}$ space in the streaming-with-sorting model with $1/\delta^{O(1)}$ passes. As such, it is tempting to conjecture that the parallel models may also constitute a concrete playground in the quest for efficient algorithms for EMD (and other similar problems) in the vanilla streaming model, a well-known open problem.

    Deterministic Sampling and Range Counting in Geometric Data Streams

    We present memory-efficient deterministic algorithms for constructing epsilon-nets and epsilon-approximations of streams of geometric data. Unlike probabilistic approaches, these deterministic samples provide guaranteed bounds on their approximation factors. We show how our deterministic samples can be used to answer approximate online iceberg geometric queries on data streams. We use these techniques to approximate several robust statistics of geometric data streams, including Tukey depth, simplicial depth, regression depth, the Theil-Sen estimator, and the least median of squares. Our algorithms use only a polylogarithmic amount of memory, provided the desired approximation factors are inverse-polylogarithmic. We also include a lower bound for non-iceberg geometric queries. Comment: 12 pages, 1 figure
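For readers unfamiliar with one of the robust statistics named above, the following is a minimal sketch of the exact Theil-Sen estimator: the slope is the median of all pairwise slopes, and the intercept is the median of the residual offsets. It is a brute-force, in-memory illustration with quadratic cost, not the streaming approximation described in the abstract; the function name `theil_sen` and the sample data are assumptions made for this example.

```python
from itertools import combinations
from statistics import median


def theil_sen(points):
    """Return (slope, intercept) of the exact Theil-Sen line through 2D points.

    Brute force over all pairs: O(n^2) time and memory. This is an offline
    illustration only, not the streaming algorithm from the abstract.
    """
    # Slopes of all pairs with distinct x-coordinates.
    slopes = [
        (y2 - y1) / (x2 - x1)
        for (x1, y1), (x2, y2) in combinations(points, 2)
        if x1 != x2
    ]
    if not slopes:
        raise ValueError("need at least two points with distinct x-coordinates")
    slope = median(slopes)
    # Intercept: median offset of the points from a line with that slope.
    intercept = median(y - slope * x for x, y in points)
    return slope, intercept


# Hypothetical data: four points on y = x plus one gross outlier.
sample = [(0, 0), (1, 1), (2, 2), (3, 3), (4, 100)]
fit = theil_sen(sample)
```

On this sample the outlier at `(4, 100)` does not move the fitted line away from `y = x`, which is the robustness property that makes the estimator worth approximating in small space.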

    Coresets-Methods and History: A Theoreticians Design Pattern for Approximation and Streaming Algorithms

    We present a technical survey of state-of-the-art approaches in data reduction and the coreset framework. These include geometric decompositions, gradient methods, random sampling, sketching, and random projections. We further outline their importance for the design of streaming algorithms and give a brief overview of lower-bounding techniques.

    Towards Tight Bounds for the Streaming Set Cover Problem

    We consider the classic Set Cover problem in the data stream model. For $n$ elements and $m$ sets ($m \geq n$) we give an $O(1/\delta)$-pass algorithm with a strongly sub-linear $\tilde{O}(mn^{\delta})$ space and logarithmic approximation factor. This yields a significant improvement over the earlier algorithm of Demaine et al. [DIMV14] that uses an exponentially larger number of passes. We complement this result by showing that the tradeoff between the number of passes and space exhibited by our algorithm is tight, at least when the approximation factor is equal to $1$. Specifically, we show that any algorithm that computes set cover exactly using $(\frac{1}{2\delta}-1)$ passes must use $\tilde{\Omega}(mn^{\delta})$ space in the regime of $m=O(n)$. Furthermore, we consider the problem in the geometric setting where the elements are points in $\mathbb{R}^2$ and sets are either discs, axis-parallel rectangles, or fat triangles in the plane, and show that our algorithm (with a slight modification) uses the optimal $\tilde{O}(n)$ space to find a logarithmic approximation in $O(1/\delta)$ passes. Finally, we show that any randomized one-pass algorithm that distinguishes between covers of size 2 and 3 must use a linear (i.e., $\Omega(mn)$) amount of space. This is the first result showing that a randomized, approximate algorithm cannot achieve a space bound that is sublinear in the input size. This indicates that using multiple passes might be necessary in order to achieve sub-linear space bounds for this problem while guaranteeing small approximation factors. Comment: A preliminary version of this paper is to appear in PODS 201
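As a point of reference for the logarithmic approximation factor mentioned above, here is a minimal sketch of the classic offline greedy heuristic for Set Cover, which repeatedly picks the set covering the most still-uncovered elements. It holds the whole input in memory and is not the multi-pass streaming algorithm from the abstract; the function name `greedy_set_cover` and the sample instance are assumptions made for this example.

```python
def greedy_set_cover(universe, sets):
    """Return indices of chosen sets using the classic greedy heuristic.

    `sets` is a list of Python set objects. This is the textbook offline
    greedy (logarithmic approximation), not the streaming algorithm from
    the abstract: it keeps the entire input in memory.
    """
    uncovered = set(universe)
    chosen = []
    while uncovered:
        # Pick the set that covers the most still-uncovered elements
        # (ties go to the lowest index).
        best = max(range(len(sets)), key=lambda i: len(sets[i] & uncovered))
        if not sets[best] & uncovered:
            raise ValueError("the given sets do not cover the universe")
        chosen.append(best)
        uncovered -= sets[best]
    return chosen


# Hypothetical instance: 6 elements, 4 sets.
family = [{1, 2, 3, 4}, {1, 2, 5}, {3, 4, 6}, {5, 6}]
cover = greedy_set_cover(range(1, 7), family)
```

Each greedy step needs to see every set against the current uncovered elements, which is why a direct streaming implementation would take one pass per chosen set; the pass/space tradeoff studied in the abstract addresses exactly that cost.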