53 research outputs found
Parallel Algorithms for Geometric Graph Problems
We give algorithms for geometric graph problems in the modern parallel models
inspired by MapReduce. For example, for the Minimum Spanning Tree (MST) problem
over a set of points in the two-dimensional space, our algorithm computes a
$(1+\epsilon)$-approximate MST. Our algorithms work in a constant number of
rounds of communication, while using total space and communication proportional
to the size of the data (linear space and near linear time algorithms). In
contrast, for general graphs, achieving the same result for MST (or even
connectivity) remains a challenging open problem, despite drawing significant
attention in recent years.
We develop a general algorithmic framework that, besides MST, also applies to
Earth-Mover Distance (EMD) and the transportation cost problem. Our algorithmic
framework has implications beyond the MapReduce model. For example, it yields a
new algorithm for computing EMD cost in the plane in near-linear time,
$n^{1+o_\epsilon(1)}$. We note that while recently Sharathkumar and Agarwal
developed a near-linear time algorithm for $(1+\epsilon)$-approximating EMD,
our algorithm is fundamentally different, and, for example, also solves the
transportation (cost) problem, raised as an open question in their work.
Furthermore, our algorithm immediately gives a $(1+\epsilon)$-approximation
algorithm with $n^{\delta}$ space in the streaming-with-sorting model with
$1/\delta^{O(1)}$ passes. As such, it is tempting to conjecture that the
parallel models may also constitute a concrete playground in the quest for
efficient algorithms for EMD (and other similar problems) in the vanilla
streaming model, a well-known open problem.
Algorithms and tools of big data: a bibliographic review
Research paper
Big data is among us: it is present in every part of our lives and, like time, it never stops growing. Given that all data will tend to become big data, the customary ways of handling data must change for two simple reasons: rising cost and falling effectiveness. A quick look at the evolution of data and information shows the leaders and the technologies that have developed ways to work with big data, with the value delivered to the user as the essential factor.
INTRODUCTION
1. PROBLEM STATEMENT
2. OBJECTIVES
2.1 GENERAL OBJECTIVE
2.2 SPECIFIC OBJECTIVES
3. BIG DATA
3.1 THEORETICAL
3.2 UNSTRUCTURED DATA
3.3 METAHEURISTIC
4. META ANALYSIS
4.1 REASON FOR FULL-TEXT ARTICLES EXCLUDED
4.2 INFORMATION SOURCES
4.3 STUDY SELECTION
4.4 SYNTHESIS OF RESULTS
4.5 RESULTS
5. OPEN FIELDS OF RESEARCH
5.1 BIG DATA SECURITY
5.1.1 Common Techniques for Securing Big Data.
5.1.2 Threats for Big Data.
5.2 BIG DATA INFRASTRUCTURE
5.3 BIG DATA FOR BUSINESS
6. CONCLUSIONS
REFERENCES
Undergraduate
Systems Engineer
Near-linear time approximation algorithms for optimal transport via Sinkhorn iteration
Computing optimal transport distances such as the earth mover's distance is a
fundamental problem in machine learning, statistics, and computer vision.
Despite the recent introduction of several algorithms with good empirical
performance, it is unknown whether general optimal transport distances can be
approximated in near-linear time. This paper demonstrates that this ambitious
goal is in fact achieved by Cuturi's Sinkhorn Distances. This result relies on
a new analysis of Sinkhorn iteration, which also directly suggests a new greedy
coordinate descent algorithm, Greenkhorn, with the same theoretical guarantees.
Numerical simulations illustrate that Greenkhorn significantly outperforms the
classical Sinkhorn algorithm in practice.
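As a rough illustration of the Sinkhorn iteration this abstract refers to, the sketch below alternately rescales the rows and columns of a Gibbs kernel until the transport plan's marginals match the two input histograms. It is a minimal pure-Python sketch of the classical iteration, not the paper's analysis and not Greenkhorn. The function name `sinkhorn` and the parameters `reg` (regularization strength) and `iters` (fixed iteration count) are assumptions made for this example.

```python
import math


def sinkhorn(cost, r, c, reg=0.1, iters=500):
    """Approximate an optimal transport plan between histograms r and c.

    cost[i][j] is the cost of moving mass from bin i of r to bin j of c.
    reg and iters are illustrative defaults, not values from the paper.
    """
    n, m = len(r), len(c)
    # Gibbs kernel. Very large cost / reg ratios can underflow to 0.0,
    # which this sketch does not guard against.
    K = [[math.exp(-cost[i][j] / reg) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(iters):
        # Rescale rows so that the row sums of the plan match r.
        u = [r[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        # Rescale columns so that the column sums of the plan match c.
        v = [c[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    plan = [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]
    total = sum(plan[i][j] * cost[i][j] for i in range(n) for j in range(m))
    return plan, total


if __name__ == "__main__":
    cost = [[0.0, 1.0], [1.0, 0.0]]
    plan, total = sinkhorn(cost, [0.5, 0.5], [0.5, 0.5])
    print(plan, total)
```

A smaller `reg` brings the result closer to the unregularized optimum but makes the kernel entries more extreme, so in practice the two have to be balanced.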
Conditional Hardness of Earth Mover Distance
The Earth Mover Distance (EMD) between two sets of points $A, B \subseteq \mathbb{R}^d$ with $|A| = |B|$ is the minimum total Euclidean distance of any perfect matching between $A$ and $B$. One of its generalizations is asymmetric EMD, which is the minimum total Euclidean distance of any matching of size $|A|$ between sets of points $A, B \subseteq \mathbb{R}^d$ with $|A| \le |B|$. The problems of computing EMD and asymmetric EMD are well-studied and have many applications in computer science, some of which also ask for the EMD-optimal matching itself. Unfortunately, all known algorithms require at least quadratic time to compute EMD exactly. Approximation algorithms with nearly linear time complexity in $n$ are known (even for finding approximately optimal matchings), but suffer from exponential dependence on the dimension.
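To make the definition above concrete, the following sketch computes EMD for two equal-size point sets by trying every perfect matching. It is a brute-force illustration that runs in factorial time and is only usable for a handful of points. It is not one of the algorithms discussed in the paper, and the function name `emd_brute_force` is an assumption made for this example.

```python
import itertools
import math


def emd_brute_force(A, B):
    """Return (cost, matching) for the minimum-cost perfect matching.

    A and B are equal-length lists of points (tuples of coordinates).
    matching[i] is the index in B of the point matched to A[i].
    """
    if len(A) != len(B):
        raise ValueError("symmetric EMD requires |A| = |B|")
    best_cost = math.inf
    best_matching = None
    # Enumerate every bijection from A to B and keep the cheapest one.
    for perm in itertools.permutations(range(len(B))):
        cost = sum(math.dist(A[i], B[perm[i]]) for i in range(len(A)))
        if cost < best_cost:
            best_cost = cost
            best_matching = perm
    return best_cost, best_matching


if __name__ == "__main__":
    A = [(0.0, 0.0), (1.0, 0.0)]
    B = [(1.0, 1.0), (0.0, 1.0)]
    print(emd_brute_force(A, B))
```

In the usage example, matching each point of `A` to the point of `B` directly above it costs 1 per pair, which beats the diagonal matching at $\sqrt{2}$ per pair.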
In this paper we show that significant improvements in exact and approximate algorithms for EMD would contradict conjectures in fine-grained complexity. In particular, we prove the following results:
- Under the Orthogonal Vectors Conjecture, there is some $c > 0$ such that EMD in $\Omega(c^{\log^* n})$ dimensions cannot be computed in truly subquadratic time.
- Under the Hitting Set Conjecture, for every $\delta > 0$, no truly subquadratic time algorithm can find a $(1 + 1/n^{\delta})$-approximate EMD matching in $\omega(\log n)$ dimensions.
- Under the Hitting Set Conjecture, for every $\eta = 1/\omega(\log n)$, no truly subquadratic time algorithm can find a $(1 + \eta)$-approximate asymmetric EMD matching in $\omega(\log n)$ dimensions.
- …