An Efficient Platform for Large-Scale MapReduce Processing

Abstract

In this thesis we proposed and implemented MMR, a new open-source MapReduce model built on MPI for parallel and distributed programming. MMR combines Pthreads, MPI, and Google's MapReduce processing model to support multi-threaded as well as distributed parallelism. Experiments show that our model significantly outperforms the leading open-source solution, Hadoop. It demonstrates linear scaling for CPU-intensive processing and even super-linear scaling for indexing-related workloads. In addition, we designed an MMR live DVD that automates the installation and configuration of a Linux cluster with an integrated MMR library, enabling the development and execution of MMR applications.
