
    MapReduce is Good Enough? If All You Have is a Hammer, Throw Away Everything That's Not a Nail!

    Hadoop is currently the large-scale data analysis "hammer" of choice, but there exist classes of algorithms that aren't "nails", in the sense that they are not particularly amenable to the MapReduce programming model. To address this, researchers have proposed MapReduce extensions or alternative programming models in which these algorithms can be elegantly expressed. This essay espouses a very different position: that MapReduce is "good enough", and that instead of trying to invent screwdrivers, we should simply get rid of everything that's not a nail. To be more specific, much discussion in the literature surrounds the fact that iterative algorithms are a poor fit for MapReduce: the simple solution is to find alternative non-iterative algorithms that solve the same problem. This essay captures my personal experiences as an academic researcher as well as a software engineer in a "real-world" production analytics environment. From this combined perspective I reflect on the current state and future of "big data" research.
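
    To make the "hammer" concrete, below is a minimal, in-memory sketch of the single-pass MapReduce model the essay refers to, using a word count as the canonical "nail". The function names and the in-memory shuffle are illustrative assumptions; a real Hadoop job would implement Mapper and Reducer classes and let the framework handle grouping.

    from collections import defaultdict

    # Minimal single-round MapReduce sketch: a word count, the canonical "nail".
    # Map, shuffle, and reduce are simulated in memory; a real Hadoop job would
    # implement Mapper/Reducer classes and let the framework do the shuffle.

    def map_phase(doc_id, text):
        # Emit (word, 1) for every token in the input record.
        for word in text.split():
            yield word.lower(), 1

    def reduce_phase(word, counts):
        # Sum all partial counts for one key.
        yield word, sum(counts)

    def run_job(records):
        # Shuffle: group all mapper outputs by key.
        groups = defaultdict(list)
        for doc_id, text in records:
            for key, value in map_phase(doc_id, text):
                groups[key].append(value)
        # Reduce each key group independently (the trivially parallel part).
        return dict(kv for key, values in groups.items()
                       for kv in reduce_phase(key, values))

    print(run_job([(1, "big data is big"), (2, "data data")]))
    # {'big': 2, 'data': 3, 'is': 1}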

    Scheduling MapReduce Jobs under Multi-Round Precedences

    We consider non-preemptive scheduling of MapReduce jobs with multiple tasks in the practical scenario where each job requires several map-reduce rounds. We seek to minimize the average weighted completion time and consider scheduling on identical and unrelated parallel processors. For identical processors, we present LP-based O(1)-approximation algorithms. For unrelated processors, the approximation ratio naturally depends on the maximum number of rounds of any job. Since the number of rounds per job in typical MapReduce algorithms is a small constant, our scheduling algorithms achieve a small approximation ratio in practice. For the single-round case, we substantially improve on the previously best known approximation guarantees for both identical and unrelated processors. Moreover, we conduct an experimental analysis and compare the performance of our algorithms against a fast heuristic and a lower bound on the optimal solution, thus demonstrating their promising practical performance.
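
    The abstract compares against a fast heuristic and a lower bound; as an illustrative stand-in (the paper's actual heuristic is not specified here), the sketch below uses weighted-shortest-processing-time (Smith's rule) list scheduling, a standard baseline for minimizing the sum of weighted completion times on identical machines. Each job's map-reduce rounds are collapsed into a single processing time, so the multi-round precedence structure is deliberately ignored.

    import heapq

    # List-scheduling sketch for minimizing the sum of weighted completion times
    # on m identical machines. Jobs are ordered by Smith's rule (weight over
    # processing time) and greedily assigned to the machine that frees up first.
    # This is an illustrative baseline, not the paper's LP-based algorithm.

    def wspt_schedule(jobs, m):
        """jobs: list of (name, weight, processing_time); m: number of machines."""
        order = sorted(jobs, key=lambda j: j[2] / j[1])   # largest weight/time first
        machines = [0.0] * m                               # current load per machine
        heapq.heapify(machines)
        total = 0.0
        for name, weight, ptime in order:
            start = heapq.heappop(machines)                # earliest-free machine
            finish = start + ptime
            total += weight * finish
            heapq.heappush(machines, finish)
        return total

    # Three jobs on two machines: (name, weight, total processing time of all rounds).
    print(wspt_schedule([("A", 3, 2.0), ("B", 1, 4.0), ("C", 2, 1.0)], m=2))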

    Parallelization of genetic algorithms using Hadoop Map/Reduce

    In this paper we present a parallel implementation of a genetic algorithm using the map/reduce programming paradigm. The Hadoop implementation of the map/reduce library is used for this purpose. We compare our implementation with the one presented in [1], with both implementations solving the One Max (bit counting) problem. The comparison criteria are fitness convergence, quality of the final solution, algorithm scalability, and cloud resource utilization. Our model for parallelizing the genetic algorithm shows better performance and fitness convergence than the model presented in [1], but it produces lower-quality solutions because of the species problem.
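
    A rough sketch of how one generation of such a genetic algorithm can be phrased in map/reduce terms (an assumed structure, not the code from this paper or from [1]): mappers evaluate the OneMax fitness of individuals in parallel, and a reduce step performs selection, crossover, and mutation to produce the next population. The stages are simulated in memory here; a Hadoop deployment would express them as Mapper and Reducer classes.

    import random

    # One generation of a OneMax GA phrased as a map step (fitness evaluation,
    # trivially parallel) and a reduce step (selection + crossover + mutation).
    # Simulated in memory; on Hadoop these would be the Mapper and the Reducer.

    def map_fitness(individual):
        # OneMax fitness is simply the number of 1-bits.
        return sum(individual), individual

    def reduce_next_generation(scored, elite=0.5, mutation_rate=0.01):
        scored.sort(key=lambda s: s[0], reverse=True)
        parents = [ind for _, ind in scored[: max(2, int(len(scored) * elite))]]
        children = []
        while len(children) < len(scored):
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, len(a))
            child = a[:cut] + b[cut:]                           # one-point crossover
            child = [bit ^ (random.random() < mutation_rate)    # bit-flip mutation
                     for bit in child]
            children.append(child)
        return children

    population = [[random.randint(0, 1) for _ in range(32)] for _ in range(20)]
    for generation in range(10):
        scored = [map_fitness(ind) for ind in population]       # "map" phase
        population = reduce_next_generation(scored)             # "reduce" phase
    print(max(sum(ind) for ind in population), "of 32 bits set")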

    Embed and Conquer: Scalable Embeddings for Kernel k-Means on MapReduce

    The kernel k-means is an effective method for data clustering which extends the commonly used k-means algorithm to work on a similarity matrix over complex data structures. The kernel k-means algorithm is, however, computationally very complex, as it requires the complete kernel matrix to be calculated and stored. Further, the kernelized nature of the algorithm hinders the parallelization of its computations on modern infrastructures for distributed computing. In this paper, we define a family of kernel-based low-dimensional embeddings that allows for scaling kernel k-means on MapReduce via an efficient and unified parallelization strategy. We then propose two methods for low-dimensional embedding that adhere to our definition of the embedding family. Exploiting the proposed parallelization strategy, we present two scalable MapReduce algorithms for kernel k-means. We demonstrate the effectiveness and efficiency of the proposed algorithms through an empirical evaluation on benchmark data sets. Comment: Appears in Proceedings of the SIAM International Conference on Data Mining (SDM), 201
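
    A minimal sketch of the general idea, with a Nyström-style landmark embedding and scikit-learn's k-means used as stand-ins (the paper's own embedding constructions and its MapReduce parallelization are not reproduced here): each point is embedded via kernel values against a small set of landmark points, after which ordinary k-means runs on the low-dimensional embeddings. Computing one point's embedding needs only that point and the landmarks, which is what makes a map-style parallelization possible.

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.metrics.pairwise import rbf_kernel

    # Landmark (Nystrom-style) embedding: approximate the kernel feature map
    # using kernel values against r sampled landmark points, then cluster the
    # embeddings with plain k-means. Each row of C depends only on one data
    # point and the landmarks, so the embedding step is a pure "map".

    def landmark_embedding(X, n_landmarks=50, gamma=0.5, seed=0):
        rng = np.random.default_rng(seed)
        landmarks = X[rng.choice(len(X), n_landmarks, replace=False)]
        C = rbf_kernel(X, landmarks, gamma=gamma)          # n x r kernel block
        W = rbf_kernel(landmarks, landmarks, gamma=gamma)  # r x r landmark kernel
        # Whitening by W^{-1/2} gives the standard Nystrom feature map.
        vals, vecs = np.linalg.eigh(W)
        W_inv_sqrt = vecs @ np.diag(1.0 / np.sqrt(np.maximum(vals, 1e-12))) @ vecs.T
        return C @ W_inv_sqrt                              # n x r embedding

    X = np.random.default_rng(1).normal(size=(500, 10))
    Z = landmark_embedding(X)
    labels = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(Z)
    print(labels[:10])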

    On an almost-universal hash function family with applications to authentication and secrecy codes

    Universal hashing, discovered by Carter and Wegman in 1979, has many important applications in computer science. MMH^*, which was shown to be $\Delta$-universal by Halevi and Krawczyk in 1997, is a well-known universal hash function family. We introduce a variant of MMH^*, that we call GRDH, where we use an arbitrary integer $n>1$ instead of prime $p$ and let the keys $\mathbf{x}=\langle x_1, \ldots, x_k \rangle \in \mathbb{Z}_n^k$ satisfy the conditions $\gcd(x_i,n)=t_i$ ($1\leq i\leq k$), where $t_1,\ldots,t_k$ are given positive divisors of $n$. Then, via connecting the universal hashing problem to the number of solutions of restricted linear congruences, we prove that the family GRDH is an $\varepsilon$-almost-$\Delta$-universal family of hash functions for some $\varepsilon<1$ if and only if $n$ is odd and $\gcd(x_i,n)=t_i=1$ ($1\leq i\leq k$). Furthermore, if these conditions are satisfied, then GRDH is $\frac{1}{p-1}$-almost-$\Delta$-universal, where $p$ is the smallest prime divisor of $n$. Finally, as an application of our results, we propose an authentication code with secrecy scheme which strongly generalizes the scheme studied by Alomair et al. [J. Math. Cryptol. 4 (2010), 121--148] and [J.UCS 15 (2009), 2937--2956]. Comment: International Journal of Foundations of Computer Science, to appear
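
    For concreteness, here is a toy sketch of the hash families being discussed (parameter sizes are illustrative, not cryptographic): MMH^* hashes a message $m \in \mathbb{Z}_p^k$ with key $x \in \mathbb{Z}_p^k$ to their inner product modulo the prime $p$, and GRDH replaces $p$ with an arbitrary modulus $n>1$ while requiring $\gcd(x_i,n)=t_i$ for given divisors $t_i$ of $n$.

    from math import gcd

    # MMH*: for a prime p and key/message vectors in Z_p^k, hash to the inner
    # product mod p. GRDH replaces the prime p with an arbitrary modulus n > 1
    # and requires gcd(x_i, n) = t_i for prescribed divisors t_i of n.

    def mmh_star(x, m, p):
        return sum(xi * mi for xi, mi in zip(x, m)) % p

    def grdh(x, m, n, t):
        assert n > 1 and all(gcd(xi, n) == ti for xi, ti in zip(x, t)), \
            "key must satisfy the gcd conditions gcd(x_i, n) = t_i"
        return sum(xi * mi for xi, mi in zip(x, m)) % n

    # Toy example with k = 3. Per the abstract, the family is
    # epsilon-almost-Delta-universal (epsilon < 1) iff n is odd and all t_i = 1.
    print(mmh_star([3, 5, 2], [7, 1, 4], p=11))           # (21 + 5 + 8) mod 11 = 1
    print(grdh([2, 4, 8], [7, 1, 4], n=9, t=[1, 1, 1]))   # odd composite modulus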