53 research outputs found

    Parallel Algorithms for Geometric Graph Problems

    We give algorithms for geometric graph problems in the modern parallel models inspired by MapReduce. For example, for the Minimum Spanning Tree (MST) problem over a set of points in the two-dimensional space, our algorithm computes a (1+ϵ)-approximate MST. Our algorithms work in a constant number of rounds of communication, while using total space and communication proportional to the size of the data (linear space and near linear time algorithms). In contrast, for general graphs, achieving the same result for MST (or even connectivity) remains a challenging open problem, despite drawing significant attention in recent years. We develop a general algorithmic framework that, besides MST, also applies to Earth-Mover Distance (EMD) and the transportation cost problem. Our algorithmic framework has implications beyond the MapReduce model. For example it yields a new algorithm for computing EMD cost in the plane in near-linear time, n^{1+o_ϵ(1)}. We note that while recently Sharathkumar and Agarwal developed a near-linear time algorithm for (1+ϵ)-approximating EMD, our algorithm is fundamentally different, and, for example, also solves the transportation (cost) problem, raised as an open question in their work. Furthermore, our algorithm immediately gives a (1+ϵ)-approximation algorithm with n^δ space in the streaming-with-sorting model with 1/δ^{O(1)} passes. As such, it is tempting to conjecture that the parallel models may also constitute a concrete playground in the quest for efficient algorithms for EMD (and other similar problems) in the vanilla streaming model, a well-known open problem.

    Algorithms and tools of big data: a bibliographic review

    Research work (Trabajo de Investigación). Big data is among us: it is present in every part of our lives and, like time, it never stops growing. Since all data will tend to become big data, the customary ways of handling data must change for two simple reasons: rising cost and falling effectiveness. A quick look at the evolution of data and information shows the leaders and the technologies that have developed ways to work with big data, with the value given to the user as the essential factor.
    Contents: Introduction; 1. Problem statement; 2. Objectives (2.1 General objective; 2.2 Specific objectives); 3. Big data (3.1 Theoretical; 3.2 Unstructured data; 3.3 Metaheuristic); 4. Meta-analysis (4.1 Reason for full-text articles excluded; 4.2 Information sources; 4.3 Study selection; 4.4 Synthesis of results; 4.5 Results); 5. Open fields of research (5.1 Big data security: 5.1.1 Common techniques for securing big data, 5.1.2 Threats for big data; 5.2 Big data infrastructure; 5.3 Big data for business); 6. Conclusions; References.
    Level: undergraduate (Pregrado). Degree: Systems Engineer.

    Near-linear time approximation algorithms for optimal transport via Sinkhorn iteration

    Computing optimal transport distances such as the earth mover's distance is a fundamental problem in machine learning, statistics, and computer vision. Despite the recent introduction of several algorithms with good empirical performance, it is unknown whether general optimal transport distances can be approximated in near-linear time. This paper demonstrates that this ambitious goal is in fact achieved by Cuturi's Sinkhorn Distances. This result relies on a new analysis of Sinkhorn iteration, which also directly suggests a new greedy coordinate descent algorithm, Greenkhorn, with the same theoretical guarantees. Numerical simulations illustrate that Greenkhorn significantly outperforms the classical Sinkhorn algorithm in practice.
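
For readers unfamiliar with Sinkhorn iteration, the following is a minimal sketch of the textbook alternating row/column scaling for entropically regularized optimal transport. It is not the procedure analyzed in the paper: it omits any rounding step and does not implement Greenkhorn. The function names, the regularization parameter `eta`, and the fixed iteration count are assumptions made for illustration.

```python
import math

def sinkhorn_plan(cost, r, c, eta=10.0, iters=200):
    """Approximate transport plan between histograms r and c.

    cost[i][j] is the cost of moving mass from bin i to bin j.
    eta (assumed name) controls regularization: larger eta tracks the
    unregularized optimum more closely but converges more slowly.
    """
    n, m = len(r), len(c)
    # Gibbs kernel K = exp(-eta * cost)
    K = [[math.exp(-eta * cost[i][j]) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(iters):
        # Rescale rows so the row sums match r ...
        u = [r[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        # ... then rescale columns so the column sums match c.
        v = [c[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    # Plan = diag(u) * K * diag(v)
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]

def transport_cost(cost, plan):
    """Total cost of a plan: sum over i, j of cost[i][j] * plan[i][j]."""
    return sum(cost[i][j] * plan[i][j]
               for i in range(len(plan)) for j in range(len(plan[0])))

if __name__ == "__main__":
    cost = [[0.0, 1.0], [1.0, 0.0]]
    plan = sinkhorn_plan(cost, [0.5, 0.5], [0.25, 0.75])
    print(plan, transport_cost(cost, plan))
```

Because the loop ends on a column update, the column sums of the returned plan match `c` up to floating-point error, while the row sums match `r` only approximately until the iteration has converged.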

    Conditional Hardness of Earth Mover Distance

    The Earth Mover Distance (EMD) between two sets of points A, B ⊆ R^d with |A| = |B| is the minimum total Euclidean distance of any perfect matching between A and B. One of its generalizations is asymmetric EMD, which is the minimum total Euclidean distance of any matching of size |A| between sets of points A, B ⊆ R^d with |A| ≤ |B|. The problems of computing EMD and asymmetric EMD are well-studied and have many applications in computer science, some of which also ask for the EMD-optimal matching itself. Unfortunately, all known algorithms require at least quadratic time to compute EMD exactly. Approximation algorithms with nearly linear time complexity in n are known (even for finding approximately optimal matchings), but suffer from exponential dependence on the dimension. In this paper we show that significant improvements in exact and approximate algorithms for EMD would contradict conjectures in fine-grained complexity. In particular, we prove the following results:
    - Under the Orthogonal Vectors Conjecture, there is some c > 0 such that EMD in Ω(c^{log^* n}) dimensions cannot be computed in truly subquadratic time.
    - Under the Hitting Set Conjecture, for every δ > 0, no truly subquadratic time algorithm can find a (1 + 1/n^δ)-approximate EMD matching in ω(log n) dimensions.
    - Under the Hitting Set Conjecture, for every η = 1/ω(log n), no truly subquadratic time algorithm can find a (1 + η)-approximate asymmetric EMD matching in ω(log n) dimensions.
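
The definition of EMD and asymmetric EMD given in this abstract can be made concrete with a brute-force sketch that tries every matching of size |A|. It runs in factorial time, so it is only a way to check the definition on tiny inputs and has nothing to do with the algorithms or lower bounds discussed in the paper. The function name `emd_exact` is hypothetical.

```python
from itertools import permutations
from math import dist  # Euclidean distance, Python 3.8+

def emd_exact(a, b):
    """Minimum total Euclidean distance over all matchings of size |A|.

    With |A| == |B| this is EMD (perfect matchings); with |A| < |B| it is
    the asymmetric variant, where some points of B stay unmatched.
    """
    if len(a) > len(b):
        raise ValueError("expected |A| <= |B|")
    # Each ordered selection of |A| distinct points of B is one matching:
    # a[k] is matched to chosen[k].
    return min(
        sum(dist(p, q) for p, q in zip(a, chosen))
        for chosen in permutations(b, len(a))
    )

if __name__ == "__main__":
    A = [(0.0, 0.0), (1.0, 0.0)]
    B = [(0.0, 1.0), (1.0, 1.0)]
    print(emd_exact(A, B))  # matches each point to the one directly above it
```

For the two unit-square sides above, the straight matching costs 1 + 1 = 2, while the crossed matching costs 2√2, so the function returns 2.0.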